[FR] Improve performance by reducing the number of calls needed to retrieve a model
Thank you for submitting a feature request. Before proceeding, please review MLflow's Issue Policy for feature requests and the MLflow Contributing Guide.
Please fill in this feature request template to ensure a timely and thorough response.
Willingness to contribute
The MLflow Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature (either as an MLflow Plugin or an enhancement to the MLflow code base)?
- Yes. I can contribute this feature independently.
- Yes. I would be willing to contribute this feature with guidance from the MLflow community.
- No. I cannot contribute this feature at this time.
Proposal Summary
Retrieve models more efficiently by reducing the number of required requests.
Currently, retrieving a model requires 3 requests:

```python
import os

import mlflow
import mlflow.sklearn

experiment_name = "energy_forecast_10001_Amsterdam"
experiment = mlflow.get_experiment_by_name(experiment_name)
run = mlflow.search_runs(experiment.experiment_id, max_results=1)
model = mlflow.sklearn.load_model(os.path.join(run.artifact_uri[0], "model/"))
```
It would be nice if this could be sped up by getting the model in only 1 request:

```python
model = mlflow.sklearn.load_latest_model(experiment_name)
```

or 2 requests:

```python
run = mlflow.search_runs(experiment_name, max_results=1)
model = mlflow.sklearn.load_model(os.path.join(run.artifact_uri[0], "model/"))
```
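Until such an API exists, a small wrapper can at least provide the one-call ergonomics on top of the current APIs. A minimal sketch, where `load_latest_model` is the proposed (hypothetical) name and the helper still performs three requests under the hood:

```python
import os

import mlflow
import mlflow.sklearn


def load_latest_model(experiment_name: str):
    """Hypothetical helper: load the model of the most recent run in an experiment.

    Hides the boilerplate, but still issues three requests to the tracking server.
    """
    experiment = mlflow.get_experiment_by_name(experiment_name)
    if experiment is None:
        raise ValueError(f"No experiment named {experiment_name!r}")
    runs = mlflow.search_runs([experiment.experiment_id], max_results=1)
    if runs.empty:
        raise ValueError(f"No runs found in experiment {experiment_name!r}")
    return mlflow.sklearn.load_model(os.path.join(runs.artifact_uri[0], "model"))


model = load_latest_model("energy_forecast_10001_Amsterdam")
```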
Motivation
- What is the use case for this feature? Performance: loading a model currently requires several round trips to the tracking server.
- Why is this use case valuable to support for MLflow users in general? Faster model loading benefits all users who retrieve models programmatically.
- Why is this use case valuable to support for your project(s) or organization? Performance.
- Why is it currently difficult to achieve this use case? (please be as specific as possible about why related MLflow features and components are insufficient) It is difficult, or impossible, to improve performance at a higher level when the lower-level calls are not performant.
What component(s), interfaces, languages, and integrations does this feature affect?
Components

- area/artifacts: Artifact stores and artifact logging
- area/build: Build and test infrastructure for MLflow
- area/docs: MLflow documentation pages
- area/examples: Example code
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/projects: MLproject format, project running backends
- area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- area/server-infra: MLflow Tracking server backend
- area/tracking: Tracking Service, tracking client APIs, autologging

Interfaces

- area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- area/windows: Windows support

Languages

- language/r: R APIs and clients
- language/java: Java APIs and clients
- language/new: Proposals for new client languages

Integrations

- integrations/azure: Azure and Azure ML integrations
- integrations/sagemaker: SageMaker integrations
- integrations/databricks: Databricks integrations
Details
(Use this section to include any additional information about the feature. If you have a proposal for how to implement this feature, please include it here. For implementation guidelines, please refer to the Contributing Guide.)
I’ll give this a try
This issue was resolved by this PR (https://github.com/mlflow/mlflow/pull/5564) and the MLflow 1.25.0 release. I did a small test on mlflow==1.25.0 with a SQLite database. Performance did improve! It varied quite a bit compared to before, probably due to the environment (local vs. Kubernetes cluster, and file-based vs. SQLite) and also how many runs/models were stored.
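For reference, a minimal sketch of the two-request pattern this enables, assuming the `experiment_names` argument that `mlflow.search_runs` gained around that release:

```python
import os

import mlflow
import mlflow.sklearn

# One call to find the latest run by experiment name, one to fetch the model.
runs = mlflow.search_runs(
    experiment_names=["energy_forecast_10001_Amsterdam"], max_results=1
)
model = mlflow.sklearn.load_model(os.path.join(runs.artifact_uri[0], "model"))
```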
Summary of the performance check for model retrieval, per code chunk:

| Retrieval method | Average over 10 calls |
| --- | --- |
| Tracking registry: model via name + experiment + run | 1.48 s |
| Tracking registry: model via name + run | 1.46 s |
| Model registry: model via version + model registry | 1.62 s |
| Model registry: multiple models via stage None + model registry | 1.46 s |
| Model registry: single model via stage Production + model registry | 1.49 s |
Tracking registry model retrieval
Retrieve model via name + experiment + run (1.48 s ± 44.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each))
Retrieve model via name + run (1.46 s ± 28.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each))
Model registry model retrieval
Retrieve model via version + model registry (1.62 s ± 171 ms per loop (mean ± std. dev. of 7 runs, 10 loops each))
Retrieve model via stage None + model registry (1.46 s ± 43.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)). Ten models are in stage None, but the most recently trained model will be retrieved.
Retrieve model via stage Production + model registry (1.49 s ± 66.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)). A single model is in Production.
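The model-registry variants above load models through `models:/` URIs. A minimal sketch, assuming a registered model named `energy_forecast` (the name is illustrative):

```python
import mlflow.sklearn

# Load a specific registered model version.
model_v3 = mlflow.sklearn.load_model("models:/energy_forecast/3")

# Load the latest version in the Production stage.
model_prod = mlflow.sklearn.load_model("models:/energy_forecast/Production")
```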
Thanks restless for implementing this. It's much neater to be able to get a run based on experiment_name directly from the tracking registry 😃