
[BUG] RESOURCE_DOES_NOT_EXIST when calling mlflow.start_run()


Willingness to contribute

The MLflow Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the MLflow code base?

Yes. I can contribute a fix for this bug independently.
Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
No. I cannot contribute a bug fix at this time.

System information

Have I written custom code (as opposed to using a stock example script provided in MLflow):
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04 - AWS EC2
MLflow installed from (source or binary): conda
MLflow version (run mlflow --version): mlflow, version 1.20.2
Python version: 3.6.9
npm version, if running the dev UI:
Exact command to reproduce: mlflow.start_run()

Describe the problem

I have a remote tracking server (I believe the access policies from the EC2 instance to the server are set correctly, but I’m not 100% sure). I have a main (parent) run, and under that parent I also have a few child runs. The issue concerns the first start_run() call, the one for the parent run. When the script reaches with mlflow.start_run(), it crashes.

The response from the server is: RESOURCE_DOES_NOT_EXIST when looking up the run_id

Code to reproduce issue

import os

import mlflow


# go(config) is the function named in the traceback under "Other info / logs"
def go(config):
    remote_server_uri = "http://x.x.x.x:xxxx"  # set to your server URI
    mlflow.set_tracking_uri(remote_server_uri)
    mlflow.set_experiment('/cargo_movement')
    # You can get the path at the root of the MLflow project with this:
    root_path = os.path.abspath('.')

    # Check which steps we need to execute
    if isinstance(config["main"]["execute_steps"], str):
        # This was passed on the command line as a comma-separated list of steps
        steps_to_execute = config["main"]["execute_steps"].split(",")
    else:
        # Otherwise it is already a list-like collection of step names
        steps_to_execute = list(config["main"]["execute_steps"])

    with mlflow.start_run() as parent_run:
        # Download step
        if "1_download" in steps_to_execute:

            _ = mlflow.run(
                os.path.join(root_path, "1_download"),
                "main",
                parameters={
                    "parent_run_id": parent_run.info.run_id,
                }
            )
        ...

Other info / logs

$ mlflow run .
2021/09/18 13:30:47 INFO mlflow.projects.utils: === Created directory /tmp/tmpy661fhzb for downloading remote URIs passed to arguments of type 'path' ===
2021/09/18 13:30:47 INFO mlflow.projects.backend.local: === Running command 'source /home/ubuntu/anaconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-167823303a9c0913bc4240ea63b3cb92329b0538 1>&2 && python main.py' in run with ID 'f7b8bafb58404dcb8e27ae1b901b2524' === 
ENV VAR: f7b8bafb58404dcb8e27ae1b901b2524
Traceback (most recent call last):
  File "/home/ubuntu/fchardnet/main.py", line 109, in <module>
    go(config)
  File "/home/ubuntu/fchardnet/main.py", line 25, in go
    with mlflow.start_run() as parent_run:
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/tracking/fluent.py", line 204, in start_run
    active_run_obj = client.get_run(existing_run_id)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/tracking/client.py", line 150, in get_run
    return self._tracking_client.get_run(run_id)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/tracking/_tracking_service/client.py", line 65, in get_run
    return self.store.get_run(run_id)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/store/tracking/rest_store.py", line 132, in get_run
    response_proto = self._call_endpoint(GetRun, req_body)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/store/tracking/rest_store.py", line 56, in _call_endpoint
    return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/utils/rest_utils.py", line 217, in call_endpoint
    response = verify_rest_response(response, endpoint)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/utils/rest_utils.py", line 169, in verify_rest_response
    raise RestException(json.loads(response.text))
mlflow.exceptions.RestException: RESOURCE_DOES_NOT_EXIST: Run with id=f7b8bafb58404dcb8e27ae1b901b2524 not found
2021/09/18 13:30:48 ERROR mlflow.cli: === Run (ID 'f7b8bafb58404dcb8e27ae1b901b2524') failed ===
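
The traceback shows start_run() calling client.get_run(existing_run_id), which suggests the script is trying to resume the run that mlflow run created rather than start a fresh one. The sketch below is a small diagnostic, assuming the launcher hands its context to the script through the environment variables MLFLOW_RUN_ID, MLFLOW_TRACKING_URI and MLFLOW_EXPERIMENT_ID (the "ENV VAR" line in the log appears to print the first of these). The helper name mlflow_env_snapshot is made up for illustration.

```python
import os

# Variables that `mlflow run` is assumed to export to the entry-point process.
MLFLOW_ENV_KEYS = ("MLFLOW_RUN_ID", "MLFLOW_TRACKING_URI", "MLFLOW_EXPERIMENT_ID")


def mlflow_env_snapshot(env=None):
    """Return the MLflow-related variables found in env (default: os.environ).

    Missing variables are reported as None so they stand out when printed.
    """
    env = os.environ if env is None else env
    return {key: env.get(key) for key in MLFLOW_ENV_KEYS}


if __name__ == "__main__":
    # Call this at the top of main.py, before any mlflow.* call.
    for key, value in mlflow_env_snapshot().items():
        print(f"{key}={value}")
```

If the MLFLOW_TRACKING_URI printed here differs from the URI passed to mlflow.set_tracking_uri(), the inherited run ID would be looked up on a server that never created it, which would be consistent with the RESOURCE_DOES_NOT_EXIST response.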

What component(s), interfaces, languages, and integrations does this bug affect?

Components

area/artifacts: Artifact stores and artifact logging
area/build: Build and test infrastructure for MLflow
area/docs: MLflow documentation pages
area/examples: Example code
area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
area/models: MLmodel format, model serialization/deserialization, flavors
area/projects: MLproject format, project running backends
area/scoring: MLflow Model server, model deployment tools, Spark UDFs
area/server-infra: MLflow Tracking server backend
area/tracking: Tracking Service, tracking client APIs, autologging

Interface

area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

Language

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

Integrations

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations

Issue Analytics

State:
Created 2 years ago
Reactions: 3
Comments: 14 (1 by maintainers)

Top GitHub Comments

2 reactions

Clayrisee commented, May 9, 2022

mlflow run . --experiment_name="some-experiment-name" --tracking_uri="some-tracking-uri"

There is no parameter called --tracking_uri, and the parameter --experiment_name should be --experiment-name. Unfortunately this does not work for me. I tried removing with mlflow.start_run() and keeping mlflow.set_tracking_uri() in the code.

Well, you can try the following steps.

Export the MLflow tracking server variables like this:

export MLFLOW_TRACKING_URI=your_tracking_uri
export MLFLOW_EXPERIMENT_NAME="your_experiment_name"

Run your MLflow Project with this command:

mlflow run [your/where/MLproject Folder] --no-conda # if you don't want to use a conda env

Notes:

You must remove mlflow.start_run() from your Python code. If you don’t remove this line, it will create two running experiments and cause errors.
You don’t have to call mlflow.set_tracking_uri(), because the tracking URI is already set in your environment variables.

Hope it will work for you!
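
The steps in the comment above can also be driven from Python. This is a minimal sketch, assuming the mlflow CLI is on PATH. The helper names build_mlflow_run and launch are hypothetical, and the URI and experiment name are placeholders. It uses only the two environment variables and the --no-conda flag named in the comment.

```python
import os
import subprocess


def build_mlflow_run(project_dir, tracking_uri, experiment_name,
                     use_conda=False, base_env=None):
    """Return (argv, env) for launching an MLflow project.

    The tracking URI and experiment name travel through the environment,
    so the entry-point script does not need mlflow.set_tracking_uri().
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MLFLOW_TRACKING_URI"] = tracking_uri
    env["MLFLOW_EXPERIMENT_NAME"] = experiment_name

    argv = ["mlflow", "run", project_dir]
    if not use_conda:
        argv.append("--no-conda")  # flag taken from the comment above
    return argv, env


def launch(project_dir, tracking_uri, experiment_name):
    """Run the project and raise CalledProcessError if it fails."""
    argv, env = build_mlflow_run(project_dir, tracking_uri, experiment_name)
    return subprocess.run(argv, env=env, check=True)
```

For example, launch(".", "your_tracking_uri", "your_experiment_name") would correspond to the two export lines followed by mlflow run . --no-conda.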

2 reactions

Jakubelo commented, Sep 18, 2021

When I call the script without mlflow.set_tracking_uri(remote_server_uri), I get:

mlflow run .
2021/09/18 14:06:55 INFO mlflow.projects.utils: === Created directory /tmp/tmphpullqjs for downloading remote URIs passed to arguments of type 'path' ===
2021/09/18 14:06:55 INFO mlflow.projects.backend.local: === Running command 'source /home/ubuntu/anaconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-167823303a9c0913bc4240ea63b3cb92329b0538 1>&2 && python main.py' in run with ID 'ac0582aec6a44f19899f5dfcba02cc39' === 
INFO: 'cargo_movement' does not exist. Creating a new experiment
ENV VAR: ac0582aec6a44f19899f5dfcba02cc39
Traceback (most recent call last):
  File "/home/ubuntu/fchardnet/main.py", line 109, in <module>
    go(config)
  File "/home/ubuntu/fchardnet/main.py", line 25, in go
    with mlflow.start_run() as parent_run:
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/tracking/fluent.py", line 210, in start_run
    raise MlflowException(
mlflow.exceptions.MlflowException: Cannot start run with ID ac0582aec6a44f19899f5dfcba02cc39 because active run ID does not match environment run ID. Make sure --experiment-name or --experiment-id matches experiment set with set_experiment(), or just use command-line arguments
2021/09/18 14:06:56 ERROR mlflow.cli: === Run (ID 'ac0582aec6a44f19899f5dfcba02cc39') failed ===
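
The message above suggests that the experiment selected by set_experiment() differs from the one the launched run belongs to. A minimal pre-flight sketch follows, assuming the launcher exposes its experiment through MLFLOW_EXPERIMENT_ID and that the client object offers get_experiment_by_name() returning an object with an experiment_id attribute (as mlflow.tracking.MlflowClient does). The helper experiment_matches_env is hypothetical, and this is not the check MLflow itself performs.

```python
import os


def experiment_matches_env(client, experiment_name, env=None):
    """Compare a named experiment with the one inherited from the launcher.

    Returns True or False when a comparison is possible, and None when
    MLFLOW_EXPERIMENT_ID is not set (assumed to mean: not launched by
    `mlflow run`).
    """
    env = os.environ if env is None else env
    env_experiment_id = env.get("MLFLOW_EXPERIMENT_ID")
    if env_experiment_id is None:
        return None

    experiment = client.get_experiment_by_name(experiment_name)
    if experiment is None:
        # The experiment does not exist yet, so it cannot be the one
        # the launcher already assigned the run to.
        return False
    return str(experiment.experiment_id) == str(env_experiment_id)
```

With MLflow installed, this could be called as experiment_matches_env(MlflowClient(), "/cargo_movement") before mlflow.set_experiment(). A False result would point to the mismatch that the error message describes.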
