Avoid problem with length limit of 5000 in MLflow

Motivation

When using the MLflowCallback, a user may have added user_attrs (such as lists of floats) that are longer than 5000 characters when converted to a str. This causes problems with MLflow, which limits the length to 5000. See the example below:

[...]
  File "/home/smay/miniconda3/envs/py38/lib/python3.8/site-packages/mlflow/utils/validation.py", line 136, in _validate_length_limit
    raise MlflowException(
mlflow.exceptions.MlflowException: Tag value '[0.8562690322984875, 0.8544098885636596, 0.8544098885636596, 0.8544098885636596, 0.8544098885636596, 0.859181214773054, 0.86273086038245, 0.86273086038245, 0.86273086038245, 0.86273086038245, 0.86273086038245, 0.8562690322984875, 0.8544098885636596, ' had length 5276, which exceeded length limit of 5000
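
A quick way to see how easily a list of floats crosses the limit is to measure its string form. The sketch below uses made-up data (300 copies of one score from the traceback) and a hypothetical constant name, `MAX_TAG_LENGTH`, for the 5000 limit quoted in the error message; it does not call MLflow at all.

```python
# Made-up data: 300 float scores, similar to the values in the traceback.
scores = [0.8562690322984875] * 300

# Hypothetical name for the limit quoted in the MLflow error message.
MAX_TAG_LENGTH = 5000

# The attribute is stored as text, so the string length is what gets checked.
as_text = str(scores)

print(f"{len(scores)} floats -> {len(as_text)} characters")
print("exceeds limit:", len(as_text) > MAX_TAG_LENGTH)
```

Each float takes roughly 20 characters including the separator, so a few hundred values are already enough to exceed the limit.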

Description

I suggest checking whether the strings are longer than 5000 characters (len() > 5000) and truncating them if needed. When this happens, a warning should be printed.

I can provide a PR if wanted. Please just give me feedback on my proposal.
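
The proposal above could look roughly like the following. This is only a sketch of the idea, not the code that was eventually merged: the helper name `truncate_for_mlflow` and the constant `MAX_TAG_LENGTH` are hypothetical, and the limit is hard-coded from the error message rather than read from MLflow.

```python
import warnings

# Assumption: limit hard-coded from the MLflow error message, not read from MLflow.
MAX_TAG_LENGTH = 5000


def truncate_for_mlflow(key, value, limit=MAX_TAG_LENGTH):
    """Return str(value), cut to `limit` characters, warning if it was cut."""
    text = str(value)
    if len(text) > limit:
        warnings.warn(
            f"Value of '{key}' has length {len(text)} and was truncated "
            f"to {limit} characters before logging."
        )
        text = text[:limit]
    return text


# Example with made-up data: a long list of floats stored as a user attribute.
user_attrs = {"cv_scores": [0.8562690322984875] * 300, "note": "short value"}
tags = {key: truncate_for_mlflow(key, value) for key, value in user_attrs.items()}
```

Short values pass through unchanged, so the helper can be applied to every attribute before it is handed to MLflow. The truncated string is no longer a complete list, which is why the warning matters.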

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 7 (6 by maintainers)

Top GitHub Comments

PhilipMay commented, Jun 14, 2020

I asked the MLflow guys if they want to fix this issue on their side: https://github.com/mlflow/mlflow/issues/2931

PhilipMay commented, Jun 9, 2020

> It’d be good to know if this number 5000 can be retrieved via some public API but I guess it’s not a must.

I will have a look into the MLflow code to see whether the 5000 limit can be retrieved.

> Would you mind briefly explaining why you need more than 5000 characters, since I suppose this is beyond what’s expected by MLflow?

Sometimes I log a list of feature importances (SHAP values) to the user attrs of Optuna. Since I have 50000 features (bioinformatics), it is a list of 50000 floats. That is longer than 5000 characters when converted to a string. But there are also other, less extreme cases where I want to log the individual results of a nested cross-validation.
