
[BUG] Using MlflowClient.get_latest_versions with an older server instance causes 404


Willingness to contribute

The MLflow Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the MLflow code base?

  • Yes. I can contribute a fix for this bug independently.
  • Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
  • No. I cannot contribute a bug fix at this time.

System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): SUSE Linux Enterprise Server 12 SP2
  • MLflow installed from (source or binary): Binary
  • MLflow version (run mlflow --version): 1.22 for the client, 1.11.0 for the server
  • Python version: 3.6.12
  • npm version, if running the dev UI:
  • Exact command to reproduce: MlflowClient().get_latest_versions("SOME_REGISTERED_MODEL")

Describe the problem

We have MLflow 1.11.0 installed on a server and 1.22 on the client. When the client is connected to the server, calling get_latest_versions fails with the following error:

MlflowException: API request to endpoint /api/2.0/mlflow/registered-models/get-latest-versions failed with error code 404 != 200. Response body:
'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>404 Not Found</title>
<h1>Not Found</h1>
<p>The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.</p>'

The expected result would have been the latest version information of the model.

Code to reproduce issue

Prerequisites:

  1. Run mlflow version 1.11.0 on a server in a conda environment
  2. Connect a client to the mlflow server using mlflow.set_tracking_uri()
  3. Register a model on the server using MlflowClient.create_registered_model

Now run:

import mlflow
from mlflow.tracking import MlflowClient

# Point the client (1.22) at the older (1.11.0) tracking server.
mlflow.set_tracking_uri("TRACKING_URI")

client = MlflowClient()
# Fails with the 404 shown above instead of returning the latest versions.
client.get_latest_versions("SOME_REGISTERED_MODEL")

Other info / logs

This problem seems to have been introduced with #4999. That PR added support for POST calls on get-latest-versions. By itself this is not a problem, since the PR author handles the case where the POST call is not available on the server: the ENDPOINT_NOT_FOUND exception is caught and a GET call is tried instead.

The problem arises because the POST call goes to /mlflow/registered-models/get-latest-versions instead of the usual /preview/mlflow/registered-models/get-latest-versions. An older server does not know this path at all, so instead of an ENDPOINT_NOT_FOUND exception it returns a plain 404 Not Found. That exception is not caught, so the client never falls back to the GET call and crashes instead.

It seems to me that the omission of preview in the URL is the root cause of the bug (see: https://github.com/stevenchen-db/mlflow/blob/9bbbb0c28d285476e0f3e2a81ecfbf577d1b03ca/mlflow/protos/model_registry.proto#L133), but I’m not sure whether that is intentional. If the omission of preview is intentional, some other form of exception handling could be implemented to avoid 404 errors when working with older servers that do not support the POST call, for example along the lines of the sketch below.
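
To illustrate the kind of fallback I mean, here is a rough sketch only. It is not MLflow's implementation: it talks to the REST API directly with requests instead of going through MLflow's internal REST utilities, the two paths are the ones discussed above, and the model_versions response field name is my assumption.

# Sketch only: treat a plain HTTP 404/405 from an old server the same way as
# ENDPOINT_NOT_FOUND and fall back to the preview GET endpoint.
import requests

NEW_PATH = "/api/2.0/mlflow/registered-models/get-latest-versions"          # POST, newer servers
OLD_PATH = "/api/2.0/preview/mlflow/registered-models/get-latest-versions"  # GET, older servers

def get_latest_versions(tracking_uri, name, stages=None):
    payload = {"name": name, "stages": stages or []}
    resp = requests.post(tracking_uri + NEW_PATH, json=payload)
    if resp.status_code in (404, 405):
        # The old server answers the unknown non-preview path with a plain HTML
        # error page instead of an MLflow error code, so fall back explicitly.
        resp = requests.get(tracking_uri + OLD_PATH, params=payload)
    resp.raise_for_status()
    return resp.json().get("model_versions", [])  # assumed response field name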

Since the reason behind the missing preview part is not entirely clear to me, I did not want to provide a bug fix immediately. If the maintainers could provide some guidance on which approach to fix this bug would be best for this project, I’d be happy to help implement the change.

What component(s), interfaces, languages, and integrations does this bug affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 8 (1 by maintainers)

Top GitHub Comments

krstp commented on Feb 2, 2022 (3 reactions)

Reporting the same issue with 1.23.1. To be more precise, it yields an HTTP 405 response code: Method Not Allowed.

A solution to the issue is to keep both the Python and MLflow versions of the client and the server the same.
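
For example, pinning the client to the server's release (1.11.0 is just the server version from this report; adjust to whatever the server runs):

pip install mlflow==1.11.0   # match the server release
mlflow --version             # verify the installed client version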

It would be good if, in the future, a mismatch in versions were allowed; this would facilitate MLflow interaction across different projects. Otherwise it is quite troublesome to keep all projects, maintained by different developers, in sync at the same time.

Needless to say, the versioning issue also affects cloudpickle, so either way, in the current state of things, the version overlap appears to be required.

tahesse commented on Mar 31, 2022 (1 reaction)

I came across a similar issue for the `get-latest-versions` endpoint.
MLflow server version: 1.21.0 (docs: https://www.mlflow.org/docs/1.21.0/rest-api.html#get-latest-modelversions)
MLflow client version: 1.24.0 (docs: https://www.mlflow.org/docs/1.24.0/rest-api.html#get-latest-modelversions)

The obvious change first: the request method changed from GET to POST. Apparently, the MLflow team tries to accommodate exactly these changes here: https://github.com/mlflow/mlflow/blob/e78d6e90b0011b4ad33aa9cda84e8e0c7d202349/mlflow/utils/rest_utils.py#L265-L270, but this does not work: if the first method (POST) fails, the code breaks out of the loop and never tries the GET request.

While debugging, I don’t get past the first entry and the exception is re-raised. (screenshot omitted)

A current workaround is to downgrade or upgrade either the client or the server until the REST API matches. To the MLflow team (not meant to be offensive):

  • It looks like you already have API versioning in place; please increment the API version when introducing breaking changes. Do not try to accommodate different API versions in the code; it will get harder and harder to read (and frankly, the code is fairly convoluted at this point).
  • IMHO your semantic versioning is off, because changing the API is a major (= breaking) change and not a minor one.
  • As a quick fix to this issue, do not throw exceptions until the for loop is exhausted (see the sketch below).
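
Roughly what I mean by the last point, as a simplified sketch (the names below are placeholders, not the actual rest_utils code):

def try_endpoints(endpoints, call_endpoint):
    # Try each (method, path) candidate in order, e.g. [("POST", new_path), ("GET", old_path)].
    last_error = None
    for method, path in endpoints:
        try:
            return call_endpoint(method, path)
        except Exception as exc:
            last_error = exc          # remember the failure and keep trying
    raise last_error                  # only raise once every candidate has failed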