[BUG] MLflow quickstart reporting failure

Thank you for submitting an issue. Please refer to our issue policy for additional information about bug reports. For help with debugging your code, please refer to Stack Overflow.

Please fill in this bug report template to ensure a timely and thorough response.

Willingness to contribute

The MLflow Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the MLflow code base?

Yes. I can contribute a fix for this bug independently.
Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
No. I cannot contribute a bug fix at this time.

System information

Have I written custom code (as opposed to using a stock example script provided in MLflow): No
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Manjaro Linux 5.16.14-1-MANJARO
MLflow installed from (source or binary): pip install mlflow
MLflow version (run mlflow --version): 1.24.0
Python version: Python 3.9.12
npm version, if running the dev UI: N/A
Exact command to reproduce: mlflow run mlflow/examples/pytorch/ --no-conda --storage-dir $(pwd)/mlruns --experiment-name Test-Exp

Describe the problem

Describe the problem clearly here. Include descriptions of the expected behavior and the actual behavior.

I’m finding that the PyTorch MNIST example ends with a failure for no apparent reason. I’d love to get more debug output, but I don’t see an obvious way to do this.

$ mlflow run mlflow/examples/pytorch/ --no-conda --storage-dir $(pwd)/mlruns --experiment-name Test-Exp
2022/04/08 12:19:38 INFO mlflow.projects.utils: === Created directory [removed]/mlruns/tmprbb_ti6y for downloading remote URIs passed to arguments of type 'path' ===
2022/04/08 12:19:38 INFO mlflow.projects.backend.local: === Running command 'python mnist_tensorboard_artifact.py \
  --batch-size 64 \
  --test-batch-size 1000 \
  --epochs 10 \
  --lr 0.01 \
  --momentum 0.5 \
  --enable-cuda True \
  --seed 5 \
  --log-interval 100
' in run with ID 'ccdf036cada144fe9c91d48c15aa26d8' === 
...
Uploading TensorBoard events as a run artifact...
...
Sample predictions
Sample 0 : Ground truth is "3", model prediction is "5"
Sample 1 : Ground truth is "4", model prediction is "4"
Sample 2 : Ground truth is "3", model prediction is "3"
Sample 3 : Ground truth is "1", model prediction is "1"
Sample 4 : Ground truth is "2", model prediction is "2"
2022/04/08 12:22:12 ERROR mlflow.cli: === Run (ID 'ccdf036cada144fe9c91d48c15aa26d8') failed ===

Code to reproduce issue

Provide a reproducible test case that is the bare minimum necessary to generate the problem.

Server: mlflow server --backend-store-uri sqlite:///mlflow.sqlite --default-artifact-root $(pwd)/mlruns --host 0.0.0.0

Runner: mlflow run mlflow/examples/pytorch/ --no-conda --storage-dir $(pwd)/mlruns --experiment-name Test-Exp

Code at mlflow…mnist_tensorboard_artifact.py gives me this error, and the Tracking UI also shows a failure. Artifacts are logged correctly once the directories point to the right locations.

Other info / logs

Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

NA – request what is needed

What component(s), interfaces, languages, and integrations does this bug affect?

Components

area/artifacts: Artifact stores and artifact logging
area/build: Build and test infrastructure for MLflow
area/docs: MLflow documentation pages
area/examples: Example code
area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
area/models: MLmodel format, model serialization/deserialization, flavors
area/projects: MLproject format, project running backends
area/scoring: MLflow Model server, model deployment tools, Spark UDFs
area/server-infra: MLflow Tracking server backend
area/tracking: Tracking Service, tracking client APIs, autologging

Interface

area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

Language

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

Integrations

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations

Issue Analytics

Created a year ago
Comments: 7 (2 by maintainers)

Top GitHub Comments

harupy commented, Apr 10, 2022

Hi @jeinstei, can you try running python mnist_tensorboard_artifact.py directly? This might give us more detailed error logs.
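
One way to act on this suggestion is sketched below: run the script as a plain subprocess, outside mlflow run, and keep the exit code plus the last lines of stderr. This is only an illustrative sketch. The helper name run_and_capture and its tail_lines parameter are hypothetical, not part of MLflow, and it assumes the script is started from the example directory used in the report.

```python
import subprocess
import sys
from typing import List, Optional, Tuple


def run_and_capture(
    cmd: List[str], cwd: Optional[str] = None, tail_lines: int = 20
) -> Tuple[int, str]:
    """Run a command and return (exit code, last `tail_lines` lines of stderr).

    Hypothetical helper for illustration only; it is not an MLflow API.
    Output is buffered until the process exits, so nothing is shown live.
    """
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    # Keep only the end of stderr, where a Python traceback normally appears.
    stderr_tail = "\n".join(result.stderr.splitlines()[-tail_lines:])
    return result.returncode, stderr_tail
```

For example, run_and_capture([sys.executable, "mnist_tensorboard_artifact.py"], cwd="mlflow/examples/pytorch") would run the example without the MLflow project wrapper. A non-zero exit code together with the stderr tail may show why the run is marked as failed even though training and the sample predictions completed.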

jeinstei commented, Aug 12, 2022

“Hi folks, I’m going to go ahead and close this issue because it does not seem to stem from MLflow code. Feel free to reopen it or post on the MLflow community Slack if the issue can, in fact, be traced to MLflow code.”

Will do – I’ll see if it comes up again once we spin up again
