[BUG] MLflow with ray starts another experiment
See original GitHub issue
Issues Policy acknowledgement
- I have read and agree to submit bug reports in accordance with the issues policy
Willingness to contribute
No. I cannot contribute a bug fix at this time.
MLflow version
1.30.0
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 22.04.1 LTS
- Python version: 3.10.6
- yarn version, if running the dev UI:
Describe the problem
Ray opens another MLflow experiment.
I am using pytorch_lightning together with Ray for distributed training and MLflow to track my experiments.
I first start an experiment to log several things:
import mlflow

def _log_parameters(**kwargs):
    for key, value in kwargs.items():
        mlflow.log_param(str(key), value)

# ... snip ...

def main():
    if FLAGS.mlflow_server_uri is not None:
        mlflow.set_tracking_uri(FLAGS.mlflow_server_uri)
    mlflow.start_run()
    _log_parameters(**some_parameters)
Then I start Ray as follows:

import ray
import pytorch_lightning as pl
from ray_lightning import RayStrategy

ray.init(address='auto')
plugin = RayStrategy(num_workers=FLAGS.num_workers,
                     num_cpus_per_worker=FLAGS.num_cpus_per_worker,
                     use_gpu=FLAGS.use_gpu)
trainer = pl.Trainer(max_epochs=FLAGS.max_epochs,
                     strategy=plugin,
                     logger=False,
                     callbacks=all_callbacks,
                     precision=int(FLAGS.precision))
trainer.fit()
Then I notice that all metrics from training are being logged to the local ./mlruns directory
instead of the run created by mlflow.start_run().
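What seems to be happening is that the Ray workers run in separate processes: mlflow.set_tracking_uri() only configures the driver process, so anything logged from a worker falls back to MLflow's default local ./mlruns file store. A minimal sketch that illustrates this (the tracking URI is a placeholder and a reachable Ray cluster is assumed):

import mlflow
import ray

ray.init(address='auto')

# This only affects the driver process; it is an in-process setting.
mlflow.set_tracking_uri('http://my-tracking-server:5000')  # placeholder URI

@ray.remote
def worker_tracking_uri():
    # Ray workers are separate processes, so unless the URI (or the
    # MLFLOW_TRACKING_URI environment variable) is also set here,
    # MLflow falls back to the default local ./mlruns file store.
    return mlflow.get_tracking_uri()

print('driver:', mlflow.get_tracking_uri())
print('worker:', ray.get(worker_tracking_uri.remote()))

The driver prints the server URI, while the worker will typically print the default local store, which matches the behaviour described above.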
Tracking information
No response
Code to reproduce issue
I cannot provide code to reproduce the issue at the moment.
Stack trace
There is no error log, but all metrics are logged to a different location (the local ./mlruns directory).
Other info / logs
No response
What component(s) does this bug affect?
- area/artifacts: Artifact stores and artifact logging
- area/build: Build and test infrastructure for MLflow
- area/docs: MLflow documentation pages
- area/examples: Example code
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/pipelines: Pipelines, Pipeline APIs, Pipeline configs, Pipeline Templates
- area/projects: MLproject format, project running backends
- area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- area/server-infra: MLflow Tracking server backend
- area/tracking: Tracking Service, tracking client APIs, autologging
What interface(s) does this bug affect?
- area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- area/windows: Windows support
What language(s) does this bug affect?
- language/r: R APIs and clients
- language/java: Java APIs and clients
- language/new: Proposals for new client languages
What integration(s) does this bug affect?
- integrations/azure: Azure and Azure ML integrations
- integrations/sagemaker: SageMaker integrations
- integrations/databricks: Databricks integrations
Issue Analytics
- Created a year ago
- Comments: 8
@MakGulati Does your remote machine have a concept of what the tracking server’s uri is? Since these are isolated processes, you’re going to have to set the uri within the remote function to get it to log to the correct location.
Thanks @BenWilson2, it worked when I set the URI inside the remote function, i.e. inside the Ray task.
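For anyone hitting the same issue, here is a minimal sketch of the pattern @BenWilson2 describes (the function name train_task, the metric name, and the tracking URI are illustrative placeholders, not the actual code from this thread): set the tracking URI inside the Ray task, and pass the driver's run ID along if the worker metrics should end up in the same run.

import mlflow
import ray

ray.init(address='auto')

@ray.remote
def train_task(tracking_uri, run_id):
    # Workers are separate processes, so point each one at the tracking
    # server explicitly before logging anything.
    mlflow.set_tracking_uri(tracking_uri)
    # Resuming the driver's run keeps worker metrics under the same run.
    with mlflow.start_run(run_id=run_id):
        mlflow.log_metric('worker_metric', 1.0)

# Driver side:
mlflow.set_tracking_uri('http://my-tracking-server:5000')  # placeholder URI
with mlflow.start_run() as run:
    ray.get(train_task.remote(mlflow.get_tracking_uri(), run.info.run_id))

With PyTorch Lightning and RayStrategy the training callbacks also execute inside the worker processes, so they would presumably need the same treatment.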