How to make mlflow tracking useable on shared file systems?


Hello, first-time user here. I cannot use the MLflow UI because it is awfully slow even for very low numbers of runs; the server fails with: [CRITICAL] WORKER TIMEOUT

What I tried:

  1. Changed the file store to a shared file system for parallel training runs (using mlflow.set_tracking_uri()), but realized that mlflow is awfully slow this way by default. Simply running mlflow.search_runs() for 120 runs and 9 metrics takes 30 s.
  2. Tried to switch to a sqlite URI on the shared file system, but this causes artifacts to land in the local folder again (why??).
  3. Tried to find an mlflow.set_artifact_uri() function, but there does not seem to be one.

TL;DR: How do I make mlflow work with a shared filesystem? How can I store the artifacts in the same folder as the mlflow.db file from sqlite?

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 2
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

3 reactions
Hoeze commented, Feb 20, 2021

Thanks for your answers 😃 This issue somehow slipped my attention, sorry for that…

So what I wanted to do with mlflow was basically serverless offline tracking with SQLite3 DB:

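import mlflow

project_dir = "/proj/myfancymodel"

# Tracking URI: sqlite:////proj/myfancymodel/mlflow.db (SQLAlchemy uses the "sqlite" scheme)
mlflow.set_tracking_uri("sqlite:///" + project_dir + "/mlflow.db")
# Wished-for call, artifact URI: file:///proj/myfancymodel/artifacts/
# (mlflow.set_artifact_uri() does not exist, see below)
mlflow.set_artifact_uri("file://" + project_dir + "/artifacts/")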
  • sqlite3 should be more than fast enough to handle < 50,000 rows
  • artifacts + database are stored at the same location
  • no fancy database connection and user authentication setup
  • simple file access permission is enough
  • database concurrency is handled through file locking of mlflow.db

This failed because I found no way to specify the artifact_uri from inside a Python script.
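
For readers with the same goal, one possible workaround is to set the artifact location per experiment rather than globally. The sketch below assumes that mlflow.create_experiment() accepts an artifact_location argument and that the project directory is an absolute POSIX path. The helper names build_uris and use_project_store are hypothetical and not part of MLflow. Note also that SQLite file locking is often unreliable on network file systems, so concurrent writers may still cause trouble.

```python
from pathlib import PurePosixPath


def build_uris(project_dir):
    """Return (tracking_uri, artifact_uri) for an absolute project directory.

    Hypothetical helper: it only builds strings and does not touch MLflow.
    """
    root = PurePosixPath(project_dir)
    if not root.is_absolute():
        raise ValueError("project_dir must be an absolute path")
    # SQLAlchemy-style URL: "sqlite:///" followed by the absolute file path.
    tracking_uri = f"sqlite:///{root / 'mlflow.db'}"
    # Artifacts go into a sibling folder next to the database file.
    artifact_uri = f"file://{root / 'artifacts'}"
    return tracking_uri, artifact_uri


def use_project_store(project_dir, experiment_name):
    """Point MLflow at the project's SQLite DB and artifact folder."""
    # Imported lazily so build_uris() works without MLflow installed.
    import mlflow

    tracking_uri, artifact_uri = build_uris(project_dir)
    mlflow.set_tracking_uri(tracking_uri)
    # The artifact location is fixed when an experiment is created,
    # so it is only passed if the experiment does not exist yet.
    if mlflow.get_experiment_by_name(experiment_name) is None:
        mlflow.create_experiment(experiment_name, artifact_location=artifact_uri)
    mlflow.set_experiment(experiment_name)
```

Calling use_project_store("/proj/myfancymodel", "my-experiment") before mlflow.start_run() should then keep both the database and the artifacts under the project directory. Experiments that already exist keep whatever artifact location they were created with.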

2 reactions
hughperkins commented, Feb 20, 2021

Just in case it’s useful (I’m just another user): I was getting [CRITICAL] WORKER TIMEOUT at the start, when my mlruns was on a shared NFS file system. I changed to Postgres on a local file system, just on the server, and now my mlflow ui zips along smoothly 😃 The artifacts are stored by the client, not by the server, so a shared file system might work well for them. I’m pretty sure the UI is slow because of the tracking DB, i.e. mlruns, not because of the artifacts.


