[BUG] No basic auth in MlflowArtifactsRepository

Willingness to contribute

The MLflow Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the MLflow code base?

  • Yes. I can contribute a fix for this bug independently.
  • Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
  • No. I cannot contribute a bug fix at this time.

System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow): no
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux
  • MLflow installed from (source or binary): binary
  • MLflow version (run mlflow --version): 1.23.0
  • Python version: 3.9
  • npm version, if running the dev UI:
  • Exact command to reproduce:

Describe the problem

We are trying to use the MLflow server as a proxy to push artifacts to S3 using the --serve-artifacts flag. Our MLflow server sits behind a reverse proxy that requires basic auth. We use the default artifacts URI mlflow-artifacts:/ without specifying a host, which means clients assume the host is the tracking server URI. To support basic auth for the tracking server, clients can set MLFLOW_TRACKING_USERNAME and MLFLOW_TRACKING_PASSWORD; however, these variables are not included in calls to the artifact proxy (HttpArtifactRepository constructs a Session without any authentication). There are also no dedicated variables for the artifacts proxy. Providing the username and password directly in the URL is not possible either, since they are stripped from the tracking URI when the artifacts proxy URI is constructed.
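To illustrate why embedding credentials in the URL does not help, here is a minimal sketch of what stripping the user-info part from a URI looks like. It uses only the standard library; strip_credentials is a hypothetical helper written for this illustration and is not MLflow's actual code, and the example URI with embedded credentials is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_credentials(uri: str) -> str:
    """Hypothetical helper: drop any 'user:password@' prefix from the host part."""
    parts = urlsplit(uri)
    host = parts.hostname or ""
    # Rebuild the network location from host and port only, leaving out user info.
    netloc = f"{host}:{parts.port}" if parts.port else host
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Assumed example: a tracking URI with credentials embedded in it.
tracking_uri = "https://username:password@my.mlflow.server:443/"
stripped = strip_credentials(tracking_uri)
```

Once the credentials are gone from the derived URI, a client that reads authentication only from the URL has nothing left to send.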

When the tracking server is used as the artifacts proxy we would expect calls to the artifacts proxy to include the same authentication headers as specified for the tracking server.

A pragmatic fix would be something like https://github.com/mlflow/mlflow/compare/master...TimNooren:mlflow_artifact_repo_basic_auth, but maybe the implementation could rely more on what is already provided in mlflow.utils.rest_utils (more similar to mlflow.store.tracking.rest_store.RestStore). Some guidance here would be great:)
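As a rough sketch of the idea, and not the code in the linked branch, the artifact client could derive a basic auth header from the same environment variables the tracking client already reads. basic_auth_headers is a hypothetical helper name; only the two MLFLOW_TRACKING_* variable names come from the issue itself.

```python
import base64
import os


def basic_auth_headers(env=os.environ) -> dict:
    """Hypothetical helper: build an HTTP Basic Authorization header from the
    tracking credentials, or return an empty dict when no username is set."""
    username = env.get("MLFLOW_TRACKING_USERNAME")
    password = env.get("MLFLOW_TRACKING_PASSWORD", "")
    if not username:
        return {}
    # HTTP Basic auth is base64("username:password") prefixed with "Basic ".
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Assumed usage: these headers would be merged into each artifact proxy request.
headers = basic_auth_headers(
    {"MLFLOW_TRACKING_USERNAME": "username", "MLFLOW_TRACKING_PASSWORD": "password"}
)
```

Returning an empty dict when no username is configured keeps unauthenticated setups working unchanged.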

Code to reproduce issue

import os

from mlflow.store.artifact.mlflow_artifacts_repo import MlflowArtifactsRepository
from mlflow.store.tracking import DEFAULT_ARTIFACTS_URI

os.environ["MLFLOW_TRACKING_URI"] = "https://my.mlflow.server:443/"  # Using basic auth
os.environ["MLFLOW_TRACKING_USERNAME"] = "username"
os.environ["MLFLOW_TRACKING_PASSWORD"] = "password"

MlflowArtifactsRepository(DEFAULT_ARTIFACTS_URI).list_artifacts()  # Unauthorized

Other info / logs

What component(s), interfaces, languages, and integrations does this bug affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

TimNooren commented, Feb 23, 2022 (1 reaction)

@cgebe you’re right, this PR does not address your issue:) But I believe this was fixed in https://github.com/mlflow/mlflow/pull/5385.

dbczumar commented, Feb 11, 2022 (1 reaction)

Thank you, @TimNooren !
