[BUG] RESOURCE_DOES_NOT_EXIST when calling mlflow.start_run()
Willingness to contribute
The MLflow Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the MLflow code base?
- Yes. I can contribute a fix for this bug independently.
- Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
- No. I cannot contribute a bug fix at this time.
System information
- Have I written custom code (as opposed to using a stock example script provided in MLflow):
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04 - AWS EC2
- MLflow installed from (source or binary): conda
- MLflow version (run mlflow --version): mlflow, version 1.20.2
- Python version: 3.6.9
- npm version, if running the dev UI:
- Exact command to reproduce: mlflow.start_run()
Describe the problem
I have a remote tracking server (I believe the access policies between the EC2 instance and the server are set correctly, but I am not 100% sure).
I have a main (parent) run, and under that parent I also have a few child runs. The issue concerns the first start_run() call (the parent run). When the script reaches with mlflow.start_run(), it crashes.
The response from the server is RESOURCE_DOES_NOT_EXIST when looking up the run_id.
Code to reproduce issue
remote_server_uri = "http://x.x.x.x:xxxx"  # set to your server URI
mlflow.set_tracking_uri(remote_server_uri)
mlflow.set_experiment('/cargo_movement')
# You can get the path at the root of the MLflow project with this:
root_path = os.path.abspath('.')
# Check which steps we need to execute
if isinstance(config["main"]["execute_steps"], str):
    # This was passed on the command line as a comma-separated list of steps
    steps_to_execute = config["main"]["execute_steps"].split(",")
else:
    steps_to_execute = list(config["main"]["execute_steps"])
with mlflow.start_run() as parent_run:
    # Download step
    if "1_download" in steps_to_execute:
        _ = mlflow.run(
            os.path.join(root_path, "1_download"),
            "main",
            parameters={
                "parent_run_id": parent_run.info.run_id,
            }
        )
    ...
Other info / logs
$ mlflow run .
2021/09/18 13:30:47 INFO mlflow.projects.utils: === Created directory /tmp/tmpy661fhzb for downloading remote URIs passed to arguments of type 'path' ===
2021/09/18 13:30:47 INFO mlflow.projects.backend.local: === Running command 'source /home/ubuntu/anaconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-167823303a9c0913bc4240ea63b3cb92329b0538 1>&2 && python main.py' in run with ID 'f7b8bafb58404dcb8e27ae1b901b2524' ===
ENV VAR: f7b8bafb58404dcb8e27ae1b901b2524
Traceback (most recent call last):
  File "/home/ubuntu/fchardnet/main.py", line 109, in <module>
    go(config)
  File "/home/ubuntu/fchardnet/main.py", line 25, in go
    with mlflow.start_run() as parent_run:
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/tracking/fluent.py", line 204, in start_run
    active_run_obj = client.get_run(existing_run_id)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/tracking/client.py", line 150, in get_run
    return self._tracking_client.get_run(run_id)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/tracking/_tracking_service/client.py", line 65, in get_run
    return self.store.get_run(run_id)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/store/tracking/rest_store.py", line 132, in get_run
    response_proto = self._call_endpoint(GetRun, req_body)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/store/tracking/rest_store.py", line 56, in _call_endpoint
    return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/utils/rest_utils.py", line 217, in call_endpoint
    response = verify_rest_response(response, endpoint)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/utils/rest_utils.py", line 169, in verify_rest_response
    raise RestException(json.loads(response.text))
mlflow.exceptions.RestException: RESOURCE_DOES_NOT_EXIST: Run with id=f7b8bafb58404dcb8e27ae1b901b2524 not found
2021/09/18 13:30:48 ERROR mlflow.cli: === Run (ID 'f7b8bafb58404dcb8e27ae1b901b2524') failed ===
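In the log above, start_run() fails inside client.get_run(existing_run_id), and the printed ENV VAR value is the same run ID that mlflow run created. One possible reading, offered as an assumption and not a confirmed diagnosis, is that the script inherits that run ID and then looks it up on a different tracking server than the one where the run was created. A minimal sketch for checking this from inside main.py follows; the helper name describe_inherited_run is hypothetical, and MLFLOW_RUN_ID is assumed to be the environment variable that carries the inherited run ID.

```python
import os


def describe_inherited_run(get_run, env=None):
    """Report whether the run ID inherited from `mlflow run` is visible.

    `get_run` is any callable that takes a run ID and raises when the run
    cannot be found, for example MlflowClient().get_run.
    `env` defaults to os.environ; pass a dict to test without touching it.
    """
    env = os.environ if env is None else env
    run_id = env.get("MLFLOW_RUN_ID")  # assumed name of the inherited run ID
    if run_id is None:
        return "no inherited run: start_run() will create a new one"
    try:
        get_run(run_id)
    except Exception as exc:  # e.g. mlflow.exceptions.RestException
        return f"inherited run {run_id} NOT found on this server: {exc}"
    return f"inherited run {run_id} found on this server"


# Usage inside main.py, before start_run() (requires mlflow):
#   import mlflow
#   from mlflow.tracking import MlflowClient
#   print("tracking URI:", mlflow.get_tracking_uri())
#   print(describe_inherited_run(MlflowClient().get_run))
```

If the helper reports that the run is not found while the tracking URI printed next to it is the remote server, that would be consistent with the run having been created in a different store before the script called set_tracking_uri().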
What component(s), interfaces, languages, and integrations does this bug affect?
Components
- area/artifacts: Artifact stores and artifact logging
- area/build: Build and test infrastructure for MLflow
- area/docs: MLflow documentation pages
- area/examples: Example code
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/projects: MLproject format, project running backends
- area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- area/server-infra: MLflow Tracking server backend
- area/tracking: Tracking Service, tracking client APIs, autologging
Interface
- area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
- area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- area/windows: Windows support
Language
- language/r: R APIs and clients
- language/java: Java APIs and clients
- language/new: Proposals for new client languages
Integrations
- integrations/azure: Azure and Azure ML integrations
- integrations/sagemaker: SageMaker integrations
- integrations/databricks: Databricks integrations
Issue Analytics
- State:
- Created 2 years ago
- Reactions:3
- Comments:14 (1 by maintainers)
Well, you can try this step.
Notes:
mlflow.set_tracking_uri(), because it is already set in your environment variables.
Hope it will work for you!
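The note above relies on the tracking URI already being present in the environment of the launched script. A minimal sketch of that idea follows, assuming the standard MLFLOW_TRACKING_URI environment variable; the helper name resolve_tracking_uri is hypothetical. It prefers the value from the environment and falls back to a hard-coded URI only when none is set, for example when main.py is run directly.

```python
import os


def resolve_tracking_uri(default_uri, env=None):
    """Prefer the tracking URI from the environment over a hard-coded one.

    `default_uri` is used only when MLFLOW_TRACKING_URI is unset or empty.
    `env` defaults to os.environ; pass a dict to test without touching it.
    """
    env = os.environ if env is None else env
    return env.get("MLFLOW_TRACKING_URI") or default_uri


# Usage in main.py (requires mlflow):
#   import mlflow
#   mlflow.set_tracking_uri(resolve_tracking_uri("http://x.x.x.x:xxxx"))
```

With this approach, the remote server would be selected by setting MLFLOW_TRACKING_URI in the shell before invoking mlflow run ., so that the launcher and the script agree on where the run lives. Whether this resolves the error in this setup is untested here.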
When I call the script without mlflow.set_tracking_uri(remote_server_uri), I get: