How to make mlflow tracking usable on shared file systems?
Hello, first-time user here.
I am failing to use the mlflow ui because it is just awfully slow for even very low numbers of runs:

`[CRITICAL] WORKER TIMEOUT`
What I tried:
- Changed the file store to a shared file system for parallel training runs (using `mlflow.set_tracking_uri()`), but noticed that mlflow is awfully slow this way by default. Simply running `mlflow.search_runs()` for 120 runs and 9 metrics takes 30s.
- Tried to change to a sqlite URI on the shared filesystem, but this causes artifacts to land in the local folder again (why??).
- Looked for an `mlflow.set_artifact_uri()`, but cannot find one.
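For reference, a minimal sketch of what I'm doing — the mount path is made up, only the `set_tracking_uri()` and `search_runs()` calls are from my actual setup:

```python
import mlflow

# File-based tracking store on a shared filesystem (e.g. an NFS mount).
mlflow.set_tracking_uri("file:///mnt/shared/mlruns")

# With ~120 runs and 9 metrics this takes ~30s, presumably because each
# run is a directory of many small files read over the network mount.
runs = mlflow.search_runs()
print(len(runs))
```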
TL;DR:
- How do I make mlflow work with a shared filesystem?
- How can I store the artifacts in the same folder as the `mlflow.db` file from sqlite?
Thanks for your answers 😃 This issue somehow slipped my attention, sorry for that…

So what I wanted to do with mlflow was basically serverless offline tracking with a SQLite3 DB (`mlflow.db`). This failed because there is no possibility to specify the `artifact_uri` inside a python script.
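To make the failure concrete, a minimal sketch of the serverless SQLite attempt — the paths and metric name are examples, not from the original report:

```python
from pathlib import Path
import mlflow

# Tracking store as a SQLite DB on the shared filesystem.
# Note the four slashes for an absolute path: sqlite:////<abs-path>
mlflow.set_tracking_uri("sqlite:////mnt/shared/mlflow.db")

Path("model.pt").write_bytes(b"...")  # stand-in artifact file

with mlflow.start_run():
    mlflow.log_metric("loss", 0.42)
    # The run metadata lands in mlflow.db on the shared filesystem, but
    # the artifact is written under ./mlruns in the *local* working
    # directory: the default artifact location is local, and the fluent
    # API has no mlflow.set_artifact_uri() to override it.
    mlflow.log_artifact("model.pt")
```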
Just in case it's useful (I'm just another user):

`[CRITICAL] WORKER TIMEOUT` => I was getting this at the start, when my mlruns was on a shared nfs file system. I changed to use postgres on a local file system, just on the server, and now my mlflow ui zips along smoothly 😃 The artifacts are stored by the client, not by the server, so a shared file system might work well for them. I'm pretty sure the ui is slow because of the tracking db, i.e. mlruns, not because of the artifacts.
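For concreteness, a sketch of the kind of setup I mean — tracking DB local to the server, artifacts on the shared filesystem. Hostnames, credentials and paths are placeholders, not from this thread:

```python
import mlflow

# On the server (shell, not python) — postgres backend on a local disk,
# artifact root on the shared filesystem:
#
#   mlflow server \
#       --backend-store-uri postgresql://mlflow:secret@localhost/mlflow \
#       --default-artifact-root /mnt/shared/mlflow-artifacts \
#       --host 0.0.0.0 --port 5000

# On each client — talk to the tracking server over HTTP; artifacts are
# written by the client directly to the shared artifact root.
mlflow.set_tracking_uri("http://tracking-host:5000")

with mlflow.start_run():
    mlflow.log_metric("loss", 0.42)
```

The point of the split is that ui queries then hit postgres on a local disk instead of crawling many small files over nfs, while the (larger but rarely listed) artifacts can stay on the shared mount.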