[BUG] RESOURCE_DOES_NOT_EXIST when mlflow call start_run()

Willingness to contribute

The MLflow Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the MLflow code base?

  • Yes. I can contribute a fix for this bug independently.
  • Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
  • No. I cannot contribute a bug fix at this time.

System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow):
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04 - AWS EC2
  • MLflow installed from (source or binary): conda
  • MLflow version (run mlflow --version): mlflow, version 1.20.2
  • Python version: 3.6.9
  • npm version, if running the dev UI:
  • Exact command to reproduce: mlflow.start_run()

Describe the problem

I have a remote tracking server (the access policies from the EC2 instance to the server should be set correctly, but I’m not 100% sure). I have a main (parent) run, and under that parent I also have a few child runs. The issue occurs on the first start_run() call, the one for the parent run: when the script reaches with mlflow.start_run(), it crashes.

The response from the server is RESOURCE_DOES_NOT_EXIST when it looks up the run_id.

Code to reproduce issue

import os
import mlflow

def go(config):
    remote_server_uri = "http://x.x.x.x:xxxx" # set to your server URI
    mlflow.set_tracking_uri(remote_server_uri)
    mlflow.set_experiment('/cargo_movement')
    # You can get the path at the root of the MLflow project with this:
    root_path = os.path.abspath('.')

    # Check which steps we need to execute
    if isinstance(config["main"]["execute_steps"], str):
        # This was passed on the command line as a comma-separated list of steps
        steps_to_execute = config["main"]["execute_steps"].split(",")
    else:
        steps_to_execute = list(config["main"]["execute_steps"])
    
    with mlflow.start_run() as parent_run:
        # Download step
        if "1_download" in steps_to_execute:
            _ = mlflow.run(
                os.path.join(root_path, "1_download"),
                "main",
                parameters={
                    "parent_run_id": parent_run.info.run_id,
                }
            )
        ...

Other info / logs

$ mlflow run .
2021/09/18 13:30:47 INFO mlflow.projects.utils: === Created directory /tmp/tmpy661fhzb for downloading remote URIs passed to arguments of type 'path' ===
2021/09/18 13:30:47 INFO mlflow.projects.backend.local: === Running command 'source /home/ubuntu/anaconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-167823303a9c0913bc4240ea63b3cb92329b0538 1>&2 && python main.py' in run with ID 'f7b8bafb58404dcb8e27ae1b901b2524' === 
ENV VAR: f7b8bafb58404dcb8e27ae1b901b2524
Traceback (most recent call last):
  File "/home/ubuntu/fchardnet/main.py", line 109, in <module>
    go(config)
  File "/home/ubuntu/fchardnet/main.py", line 25, in go
    with mlflow.start_run() as parent_run:
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/tracking/fluent.py", line 204, in start_run
    active_run_obj = client.get_run(existing_run_id)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/tracking/client.py", line 150, in get_run
    return self._tracking_client.get_run(run_id)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/tracking/_tracking_service/client.py", line 65, in get_run
    return self.store.get_run(run_id)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/store/tracking/rest_store.py", line 132, in get_run
    response_proto = self._call_endpoint(GetRun, req_body)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/store/tracking/rest_store.py", line 56, in _call_endpoint
    return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/utils/rest_utils.py", line 217, in call_endpoint
    response = verify_rest_response(response, endpoint)
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/utils/rest_utils.py", line 169, in verify_rest_response
    raise RestException(json.loads(response.text))
mlflow.exceptions.RestException: RESOURCE_DOES_NOT_EXIST: Run with id=f7b8bafb58404dcb8e27ae1b901b2524 not found
2021/09/18 13:30:48 ERROR mlflow.cli: === Run (ID 'f7b8bafb58404dcb8e27ae1b901b2524') failed ===
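
One plausible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: `mlflow run` creates the run in whatever tracking store the CLI itself sees (the local default if MLFLOW_TRACKING_URI is not exported) and hands the run ID to main.py through the MLFLOW_RUN_ID environment variable, while the script then points the client at the remote server and asks for that ID there. The toy model below uses plain dictionaries, not MLflow's real store classes; `ToyStore` and `resume_run` are hypothetical names.

```python
class ToyStore:
    """Stand-in for a tracking backend; NOT MLflow's real store classes."""

    def __init__(self, name):
        self.name = name
        self.runs = {}

    def create_run(self, run_id):
        self.runs[run_id] = {"run_id": run_id}

    def get_run(self, run_id):
        if run_id not in self.runs:
            raise LookupError(
                f"RESOURCE_DOES_NOT_EXIST: Run with id={run_id} not found"
            )
        return self.runs[run_id]


def resume_run(store, env):
    """Mimic the lookup in the traceback: fetch the run named in the environment."""
    return store.get_run(env["MLFLOW_RUN_ID"])


local_store = ToyStore("local default store")   # assumed: what the CLI used
remote_store = ToyStore("remote tracking server")  # what the script points at

run_id = "f7b8bafb58404dcb8e27ae1b901b2524"
local_store.create_run(run_id)      # the CLI creates the run before main.py starts
env = {"MLFLOW_RUN_ID": run_id}     # and passes its ID to the child process

try:
    resume_run(remote_store, env)   # the script looks the ID up on the other store
    outcome = "found"
except LookupError as exc:
    outcome = str(exc)

print(outcome)
```

In this model the lookup succeeds only when the CLI and the script agree on the store, which is why exporting the tracking URI before calling `mlflow run` is worth trying.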

What component(s), interfaces, languages, and integrations does this bug affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

Issue Analytics

  • State: open
  • Created 2 years ago
  • Reactions: 3
  • Comments: 14 (1 by maintainers)

Top GitHub Comments

Clayrisee commented, May 9, 2022 (2 reactions)

mlflow run . --experiment_name="some-experiment-name" --tracking_uri="some-tracking-uri"

There is no parameter called --tracking_uri. The parameter --experiment_name should be --experiment-name. Unfortunately, this does not work for me: I tried removing with mlflow.start_run() and keeping mlflow.set_tracking_uri() in the code.

Well, you can try these steps.

  1. Export the MLflow tracking server variables, as in the code below.
export MLFLOW_TRACKING_URI=your_tracking_uri
export MLFLOW_EXPERIMENT_NAME="your_experiment_name"
  2. Run your MLflow Project with this command line.
mlflow run [your/where/MLproject Folder] --no-conda # if you don't want to use a conda env
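
The two steps above can also be driven from Python. This is a minimal sketch that only assembles the command and the environment; `build_mlflow_run` and `launch` are hypothetical helper names, the URI and experiment name are placeholders taken from the issue, and `--no-conda` is copied from the comment as-is (it may not be available in every MLflow version).

```python
import os
import subprocess


def build_mlflow_run(project_dir, tracking_uri, experiment_name, use_conda=False):
    """Return (command, environment) equivalent to the two shell steps."""
    env = dict(os.environ)  # copy, so the caller's environment is untouched
    env["MLFLOW_TRACKING_URI"] = tracking_uri
    env["MLFLOW_EXPERIMENT_NAME"] = experiment_name
    cmd = ["mlflow", "run", project_dir]
    if not use_conda:
        cmd.append("--no-conda")
    return cmd, env


def launch(cmd, env):
    """Start the project; raises CalledProcessError if the run fails."""
    return subprocess.run(cmd, env=env, check=True)


# Build the invocation without executing it, so it can be inspected first.
cmd, env = build_mlflow_run(".", "http://x.x.x.x:xxxx", "/cargo_movement")
print(" ".join(cmd))
```

Calling `launch(cmd, env)` would then start the project with both variables visible to the CLI and to the script it spawns.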

Notes:

  • You must remove mlflow.start_run() from your Python code; if you don’t remove this line, it will create 2 running experiments and cause errors.
  • You don’t have to use mlflow.set_tracking_uri(), because the tracking URI is already set in your environment variables.
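
To see what these notes rely on, a script can inspect the tracking settings it inherits before touching the MLflow API. The sketch below uses only the standard library; the helper names are hypothetical, and treating an unset MLFLOW_TRACKING_URI as a conflict with a hard-coded URI is an assumption based on the failure reported in this issue.

```python
import os

# Environment variables that MLflow reads for tracking configuration.
TRACKING_VARS = (
    "MLFLOW_TRACKING_URI",
    "MLFLOW_EXPERIMENT_NAME",
    "MLFLOW_EXPERIMENT_ID",
    "MLFLOW_RUN_ID",
)


def inherited_tracking_config(environ=None):
    """Collect the tracking-related variables visible to this process."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name) for name in TRACKING_VARS}


def uri_conflict(environ, uri_in_code):
    """True when a hard-coded URI differs from the one the launcher saw."""
    if uri_in_code is None:
        return False  # the script does not override anything
    return environ.get("MLFLOW_TRACKING_URI") != uri_in_code


for name, value in inherited_tracking_config().items():
    print(f"{name}={value}")
```

If `uri_conflict` reports a mismatch, the script and the `mlflow run` launcher are likely talking to different stores.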

Hope it will work for you!

Jakubelo commented, Sep 18, 2021 (2 reactions)

When I run the script without mlflow.set_tracking_uri(remote_server_uri), I get:

$ mlflow run .
2021/09/18 14:06:55 INFO mlflow.projects.utils: === Created directory /tmp/tmphpullqjs for downloading remote URIs passed to arguments of type 'path' ===
2021/09/18 14:06:55 INFO mlflow.projects.backend.local: === Running command 'source /home/ubuntu/anaconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-167823303a9c0913bc4240ea63b3cb92329b0538 1>&2 && python main.py' in run with ID 'ac0582aec6a44f19899f5dfcba02cc39' === 
INFO: 'cargo_movement' does not exist. Creating a new experiment
ENV VAR: ac0582aec6a44f19899f5dfcba02cc39
Traceback (most recent call last):
  File "/home/ubuntu/fchardnet/main.py", line 109, in <module>
    go(config)
  File "/home/ubuntu/fchardnet/main.py", line 25, in go
    with mlflow.start_run() as parent_run:
  File "/home/ubuntu/anaconda3/envs/mlflow-167823303a9c0913bc4240ea63b3cb92329b0538/lib/python3.9/site-packages/mlflow/tracking/fluent.py", line 210, in start_run
    raise MlflowException(
mlflow.exceptions.MlflowException: Cannot start run with ID ac0582aec6a44f19899f5dfcba02cc39 because active run ID does not match environment run ID. Make sure --experiment-name or --experiment-id matches experiment set with set_experiment(), or just use command-line arguments
2021/09/18 14:06:56 ERROR mlflow.cli: === Run (ID 'ac0582aec6a44f19899f5dfcba02cc39') failed ===
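
The error message suggests that start_run() compares the experiment chosen with set_experiment() against the experiment of the run named in the environment. The toy sketch below is a simplified paraphrase of that check: the function and exception are hypothetical stand-ins, not MLflow's implementation, and the experiment IDs "0" and "1" are invented for illustration.

```python
class ToyMlflowError(Exception):
    """Hypothetical stand-in for the exception seen in the log."""


def check_resumed_run(env_run, active_experiment_id):
    """Reject a resumed run whose experiment differs from the one set in code."""
    if (
        active_experiment_id is not None
        and active_experiment_id != env_run["experiment_id"]
    ):
        raise ToyMlflowError(
            f"Cannot start run with ID {env_run['run_id']} because active run ID "
            "does not match environment run ID."
        )
    return env_run


# Assumed: the CLI created the run in one experiment ...
env_run = {"run_id": "ac0582aec6a44f19899f5dfcba02cc39", "experiment_id": "0"}

# ... while set_experiment() in the script created and selected another one.
try:
    check_resumed_run(env_run, active_experiment_id="1")
    mismatch = False
except ToyMlflowError:
    mismatch = True

print("experiment mismatch:", mismatch)
```

Under this reading, passing the experiment on the command line (or exporting MLFLOW_EXPERIMENT_NAME) so that the launcher and the script agree would avoid the mismatch.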