[BUG] MLflow with ray starts another experiment

Issues Policy acknowledgement

  • I have read and agree to submit bug reports in accordance with the issues policy

Willingness to contribute

No. I cannot contribute a bug fix at this time.

MLflow version

1.30.0

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 22.04.1 LTS
  • Python version: 3.10.6
  • yarn version, if running the dev UI:

Describe the problem

Ray opens another MLflow experiment.

I am using pytorch_lightning together with Ray for distributed training and MLflow to track my experiments.

I first start an experiment to log several things:

def _log_parameters(**kwargs):
    for key, value in kwargs.items():
        mlflow.log_param(str(key), value)

#... snip ...#
def main():
    if FLAGS.mlflow_server_uri is not None:
        mlflow.set_tracking_uri(FLAGS.mlflow_server_uri)
    mlflow.start_run()
    _log_parameters(**some_parameters)  # some_parameters: dict of params (details elided in the report)

Then I start ray as follows:

ray.init(address='auto')
plugin = RayStrategy(num_workers=FLAGS.num_workers,
                     num_cpus_per_worker=FLAGS.num_cpus_per_worker,
                     use_gpu=FLAGS.use_gpu)
trainer = pl.Trainer(max_epochs=FLAGS.max_epochs,
                     strategy=plugin,
                     logger=False,
                     callbacks=all_callbacks,
                     precision=int(FLAGS.precision))
trainer.fit()

Then I notice that all training metrics are being logged to the local mlruns directory instead of the run created by mlflow.start_run().
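
A quick way to confirm what is going on (a hedged diagnostic sketch, not code from this report; the server URI below is a placeholder): each Ray worker is a separate Python process, so it does not see the mlflow.set_tracking_uri() call made on the driver and falls back to MLflow's default local ./mlruns file store.

import mlflow
import ray

@ray.remote
def worker_tracking_uri():
    # A fresh worker process has no tracking URI configured, so this returns
    # MLflow's default local ./mlruns store unless the URI is set here or via
    # the MLFLOW_TRACKING_URI environment variable.
    return mlflow.get_tracking_uri()

ray.init()  # or ray.init(address='auto') when attaching to a running cluster
mlflow.set_tracking_uri("http://tracking-server:5000")  # placeholder URI

print("driver:", mlflow.get_tracking_uri())
print("worker:", ray.get(worker_tracking_uri.remote()))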

Tracking information

No response

Code to reproduce issue

I cannot provide code to reproduce the issue at the moment.

Stack trace

There is no error log; everything is simply logged somewhere else than expected.

Other info / logs

No response

What component(s) does this bug affect?

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/pipelines: Pipelines, Pipeline APIs, Pipeline configs, Pipeline Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

What language(s) does this bug affect?

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 8

Top GitHub Comments

1 reaction
BenWilson2 commented, Dec 6, 2022

@MakGulati Does your remote machine have a concept of what the tracking server’s uri is? Since these are isolated processes, you’re going to have to set the uri within the remote function to get it to log to the correct location.
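
For illustration, a minimal sketch of that suggestion (hedged; the function name, metric, and URI below are assumptions, not code from this issue). The tracking URI has to be set again inside the function that Ray runs on the worker:

import mlflow
import ray

@ray.remote
def train_task(tracking_uri):
    # Repeat the tracking-URI setup in the worker process; the driver's
    # mlflow.set_tracking_uri() call does not carry over automatically.
    mlflow.set_tracking_uri(tracking_uri)
    with mlflow.start_run():
        mlflow.log_metric("loss", 0.1)  # placeholder metric

ray.init(address='auto')
ray.get(train_task.remote("http://tracking-server:5000"))  # placeholder URI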

0 reactions
MakGulati commented, Dec 7, 2022

@MakGulati Does your remote machine have a concept of what the tracking server’s uri is? Since these are isolated processes, you’re going to have to set the uri within the remote function to get it to log to the correct location.

Thanks @BenWilson2, it worked when I set the URI inside the remote function, i.e. the task.
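
If the goal is to log into the very run the driver opened, rather than a second run, the driver's run_id can also be passed into the task and resumed there (again a hedged sketch under the same assumptions, not the poster's actual code):

import mlflow
import ray

@ray.remote
def train_task(tracking_uri, run_id):
    mlflow.set_tracking_uri(tracking_uri)
    # Resume the driver's run instead of creating a new one.
    with mlflow.start_run(run_id=run_id):
        mlflow.log_metric("loss", 0.1)  # placeholder metric

ray.init(address='auto')
with mlflow.start_run() as run:
    ray.get(train_task.remote(mlflow.get_tracking_uri(), run.info.run_id))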
