[BUG] MLFLOW_S3_ENDPOINT_URL is ignored

Issues Policy acknowledgement

  • I have read and agree to submit bug reports in accordance with the issues policy

Willingness to contribute

No. I cannot contribute a bug fix at this time.

MLflow version

mlflow, version 1.29.0

System information

  • Arch Linux
  • Python 3.10.8

Describe the problem

I’m trying to start the MLflow tracking server with a bucket and a PostgreSQL database attached. The object store is not hosted by AWS but implements the S3 API.

When adding an artifact to the database, MLflow throws an error saying that it cannot connect to the bucket. It complains about not being able to connect to AWS.

I have set all the necessary environment variables.

Tracking information

No notebook

Code to reproduce issue

MLFLOW_TRACKING_URI=http://0.0.0.0:5000
MLFLOW_BACKEND_STORE_URI=postgresql://<user>:<password>@<host>:<port>
MLFLOW_S3_ENDPOINT_URL=https://<redacted>.<redacted>.cloud
AWS_ACCESS_KEY_ID=<redacted>
AWS_SECRET_ACCESS_KEY=<redacted>
mlflow server --default-artifact-root s3://mlflow/ --host 0.0.0.0

Stack trace

Traceback (most recent call last):
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/urllib3/util/connection.py", line 72, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/usr/lib/python3.10/socket.py", line 955, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/botocore/httpsession.py", line 455, in send
    urllib_response = conn.urlopen(
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/urllib3/util/retry.py", line 525, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/urllib3/packages/six.py", line 770, in reraise
    raise value
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._validate_conn(conn)
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1042, in _validate_conn
    conn.connect()
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/urllib3/connection.py", line 358, in connect
    self.sock = conn = self._new_conn()
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/urllib3/connection.py", line 186, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <botocore.awsrequest.AWSHTTPSConnection object at 0x7f9e5b7b9e70>: Failed to establish a new connection: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/flask/app.py", line 2525, in wsgi_app
    response = self.full_dispatch_request()
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/flask/app.py", line 1822, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/flask/app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/flask/app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/mlflow/server/handlers.py", line 456, in wrapper
    return func(*args, **kwargs)
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/mlflow/server/handlers.py", line 526, in wrapper
    return func(*args, **kwargs)
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/mlflow/server/handlers.py", line 909, in _list_artifacts
    artifact_entities = _get_artifact_repo(run).list_artifacts(path)
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/mlflow/store/artifact/s3_artifact_repo.py", line 159, in list_artifacts
    for result in results:
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/botocore/paginate.py", line 269, in __iter__
    response = self._make_request(current_kwargs)
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/botocore/paginate.py", line 357, in _make_request
    return self._method(**current_kwargs)
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/botocore/client.py", line 514, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/botocore/client.py", line 921, in _make_api_call
    http, parsed_response = self._make_request(
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/botocore/client.py", line 944, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/botocore/endpoint.py", line 119, in make_request
    return self._send_request(request_dict, operation_model)
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/botocore/endpoint.py", line 202, in _send_request
    while self._needs_retry(
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/botocore/endpoint.py", line 354, in _needs_retry
    responses = self._event_emitter.emit(
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/botocore/hooks.py", line 412, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/botocore/hooks.py", line 256, in emit
    return self._emit(event_name, kwargs)
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/botocore/hooks.py", line 239, in _emit
    response = handler(**kwargs)
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/botocore/retryhandler.py", line 207, in __call__
    if self._checker(**checker_kwargs):
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/botocore/retryhandler.py", line 284, in __call__
    should_retry = self._should_retry(
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/botocore/retryhandler.py", line 320, in _should_retry
    return self._checker(attempt_number, response, caught_exception)
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/botocore/retryhandler.py", line 363, in __call__
    checker_response = checker(
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/botocore/retryhandler.py", line 247, in __call__
    return self._check_caught_exception(
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/botocore/retryhandler.py", line 416, in _check_caught_exception
    raise caught_exception
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/botocore/endpoint.py", line 281, in _do_get_response
    http_response = self._send(request)
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/botocore/endpoint.py", line 377, in _send
    return self.http_session.send(request)
  File "$HOME/.local/share/virtualenvs/ai-panoptes-O4NUaGdJ/lib/python3.10/site-packages/botocore/httpsession.py", line 484, in send
    raise EndpointConnectionError(endpoint_url=request.url, error=e)
botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "https://s3.nl-ams.amazonaws.com/mlflow?list-type=2&prefix=1%2F6c3af048e4a74ded9dd43ddd898c0d3c%2Fartifacts%2F&delimiter=%2F&encoding-type=url"
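
One way to read the final exception: the host of the failed URL ends in amazonaws.com, which suggests the request was built for a default AWS endpoint rather than the custom one. The snippet below is a minimal sketch of that check using only the standard library. The helper name is_aws_host is hypothetical and is not part of MLflow or botocore.

```python
from urllib.parse import urlsplit


def is_aws_host(url: str) -> bool:
    """Return True if the URL's host is amazonaws.com or a subdomain of it."""
    host = (urlsplit(url).hostname or "").lower()
    return host == "amazonaws.com" or host.endswith(".amazonaws.com")


# URL taken from the EndpointConnectionError above (query string shortened).
failed_url = "https://s3.nl-ams.amazonaws.com/mlflow?list-type=2"

if is_aws_host(failed_url):
    print("Request targeted an AWS host; the custom endpoint was not applied.")
```

If the check is true for a store that is not hosted by AWS, the endpoint setting most likely never reached the process that made the request.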

Other info / logs

No response

What component(s) does this bug affect?

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/pipelines: Pipelines, Pipeline APIs, Pipeline configs, Pipeline Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

What language(s) does this bug affect?

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 19 (10 by maintainers)

Top GitHub Comments

1 reaction
zingbert commented, Oct 25, 2022

I don’t know if this helps, but I had a similar issue with remote tracking of artifacts (with mlflow==1.29.0 and mlflow==1.30.0) on DigitalOcean Spaces, which is S3-compatible, and resolved it with this harupy comment (https://github.com/mlflow/mlflow/issues/5439). Before that, I was not setting MLFLOW_S3_ENDPOINT_URL=https://<region-name>.digitaloceanspaces.com on the client side as well (only the S3 credentials), and I was getting an error like:

Could not connect to the endpoint URL: "https://<bucket-name>.s3.<region-name>.amazonaws.com/5/190cbf0a10734f308d070059f0dd8698/artifacts/model/model.pkl"

This came from a mlflow.sklearn.log_model call in a plain JupyterLab session running the MLflow sklearn_logistic_regression example, so the client was clearly building a malformed URL instead of the correct one, https://<bucket-name>.<region-name>.digitaloceanspaces.com.

IMHO this issue clashes with the official documentation, where a warning box says:

… For example, providing --default-artifact-root $MLFLOW_S3_ENDPOINT_URL on the server side and MLFLOW_S3_ENDPOINT_URL on the client side will create a client path resolution issue for the artifact storage location

continuing with

… To prevent path parsing issues, ensure that reserved environment variables are removed (unset) from client environments.

Actually the opposite is true. If this is the intended way to make S3-compatible storage work for remote artifact storage, the official documentation may need to be updated (temporarily, if this is a bug).
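
The comment above reports that the endpoint URL had to be set on the client as well as the S3 credentials. A rough sketch of a pre-flight check a client script could run before logging artifacts is shown below. The helper missing_client_vars and the tuple REQUIRED_CLIENT_VARS are hypothetical names, and the list of variables is taken from this thread rather than from an official MLflow requirement.

```python
import os
from typing import List, Mapping

# Variables this thread reports needing on the client side (an assumption
# drawn from the comment above, not an official MLflow list).
REQUIRED_CLIENT_VARS = (
    "MLFLOW_S3_ENDPOINT_URL",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def missing_client_vars(env: Mapping[str, str]) -> List[str]:
    """Return the required variables that are unset or empty in env."""
    return [name for name in REQUIRED_CLIENT_VARS if not env.get(name)]


missing = missing_client_vars(os.environ)
if missing:
    print("Missing on the client:", ", ".join(missing))
```

Running this in the same process or notebook kernel that calls MLflow matters, since variables exported in another shell are not visible there.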

0 reactions
harupy commented, Oct 26, 2022

Fixed by the following changes:

  1. Fix the bucket name and endpoint URL.
  2. Remove MLFLOW_DEFAULT_ARTIFACT_ROOT in the client environment.

Todos:

  • We should improve the docs to clarify which environment variables should be (and should not be) set in each scenario.
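
The second change in the fix above, removing MLFLOW_DEFAULT_ARTIFACT_ROOT from the client environment, could be sketched as follows. This is an illustration only: client_environment and SERVER_ONLY_VARS are hypothetical names, and the set of variables to drop contains just the one named in the comment.

```python
import os
from typing import Dict, Mapping

# The variable named in the fix above as one to remove on the client.
SERVER_ONLY_VARS = ("MLFLOW_DEFAULT_ARTIFACT_ROOT",)


def client_environment(env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of env without the server-only variables."""
    return {key: value for key, value in env.items() if key not in SERVER_ONLY_VARS}


# Build a cleaned copy; the current process environment is left untouched.
clean_env = client_environment(os.environ)
```

The cleaned mapping can be passed as the env argument of subprocess.run when launching a client script, which avoids editing the shell profile.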

