[BUG] mlflow run local filedirectory is broken in v1.27.0


Willingness to contribute

Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.

MLflow version

1.27.0

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): MacOS Monterey 12.3.1
  • Python version: 3.7
  • yarn version, if running the dev UI: N/A

Describe the problem

The following command, which ran without issues in v1.26.1, fails in v1.27.0:

mlflow run yolov5 --experiment-id <experiment-id> -b <backend> --backend-config yolov5/backend-spec.json

The traceback suggests that MLflow treats yolov5 as a Git repository URI rather than as a local directory.

I checked the relevant code in mlflow.projects.utils:

def _is_git_repo(path):
    """Returns True if passed-in path is a valid git repository"""
    import git

    try:
        git.Repo(path)
        return True
    except git.exc.InvalidGitRepositoryError:
        return False


def _is_local_uri(uri):
    """Returns True if passed-in URI should be interpreted as a path on the local filesystem."""
    if _GIT_URI_REGEX.match(uri):
        return False

    parsed_uri = urllib.parse.urlparse(uri)
    drive = pathlib.Path(uri).drive

    if drive != "" and drive.lower()[0] == parsed_uri.scheme:
        return not _is_git_repo(uri)
    elif parsed_uri.scheme in ("file", ""):
        return not _is_git_repo(parsed_uri.path)
    else:
        return False

Here are some quick checks:

>>> from pathlib import Path
>>> drive = Path("yolov5").drive
>>> drive
''
>>> import urllib
>>> urllib.parse.urlparse("yolov5")
ParseResult(scheme='', netloc='', path='yolov5', params='', query='', fragment='')
>>> import git
>>> git.Repo("yolov5")
<git.repo.base.Repo '/Users/cjagadeesan/workspace/python/yolov5/.git'>

Both conditions hold here: parsed_uri.scheme in ("file", "") is True and _is_git_repo(parsed_uri.path) is also True, so the elif branch of _is_local_uri(uri) returns not True, i.e. False.
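
The effect can be reproduced without MLflow or GitPython installed. The sketch below mirrors the branch structure of the excerpt above. It is an illustration under assumptions, not MLflow's code: looks_like_git_repo is a hypothetical stand-in for _is_git_repo (it only checks for a .git entry, whereas git.Repo validates the repository), and GIT_URI_PATTERN is a hypothetical stand-in for _GIT_URI_REGEX, whose real definition is not shown in this issue.

```python
import pathlib
import re
import tempfile
import urllib.parse

# Hypothetical stand-in for MLflow's _GIT_URI_REGEX; the real pattern may differ.
GIT_URI_PATTERN = re.compile(r"^(git@|https?://|ssh://)")


def looks_like_git_repo(path):
    """Hypothetical stand-in for _is_git_repo: any directory with a .git entry."""
    return (pathlib.Path(path) / ".git").exists()


def is_local_uri_as_quoted(uri):
    """Same branches as the quoted _is_local_uri, using the stand-ins above."""
    if GIT_URI_PATTERN.match(uri):
        return False

    parsed_uri = urllib.parse.urlparse(uri)
    drive = pathlib.Path(uri).drive

    if drive != "" and drive.lower()[0] == parsed_uri.scheme:
        return not looks_like_git_repo(uri)
    elif parsed_uri.scheme in ("file", ""):
        return not looks_like_git_repo(parsed_uri.path)
    else:
        return False


def demo():
    """Classify the same directory before and after it gains a .git entry."""
    with tempfile.TemporaryDirectory() as project_dir:
        before = is_local_uri_as_quoted(project_dir)
        (pathlib.Path(project_dir) / ".git").mkdir()  # simulates `git init`
        after = is_local_uri_as_quoted(project_dir)
    return before, after


print(*demo())  # prints: True False
```

The directory itself does not change between the two calls; only the presence of .git flips the result from "local" to "not local", which matches the behaviour described in this report.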

I am unsure why the check is written this way: Path(uri).exists() would evaluate to True for local directories on both Unix-based filesystems and Windows.
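
As a rough illustration of that suggestion, an existence-based check might look like the sketch below. The function name is_local_uri_by_existence is hypothetical, the handling of file:// URIs is an assumption that only suits POSIX-style paths, and this is not necessarily the fix MLflow adopted.

```python
import pathlib
import tempfile
import urllib.parse


def is_local_uri_by_existence(uri):
    """Hypothetical alternative: a URI is local if it names an existing path."""
    parsed_uri = urllib.parse.urlparse(uri)
    # For file:// URIs the filesystem path is the parsed path component
    # (assumption: POSIX-style paths; Windows drive letters need extra care).
    candidate = parsed_uri.path if parsed_uri.scheme == "file" else uri
    return pathlib.Path(candidate).exists()


def demo():
    """A directory containing .git is still classified as local."""
    with tempfile.TemporaryDirectory() as project_dir:
        (pathlib.Path(project_dir) / ".git").mkdir()  # simulates `git init`
        return is_local_uri_by_existence(project_dir)


print(demo())  # prints: True
```

One consequence of this design is that a local directory is treated as local whether or not it is also a git repository, so whatever the git check was meant to distinguish would have to be handled somewhere else.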

If git init has been run inside the folder, the bug is triggered because git.Repo(uri) succeeds. The two runs below show the same project before and after git init:

mlflow run sample-mlflow -b local --experiment-name "sample-mlflow"

2022/07/05 08:53:39 INFO mlflow.utils.conda: Conda environment mlflow-da39a3ee5e6b4b0d3255bfef95601890afd80709 already exists.
2022/07/05 08:53:39 INFO mlflow.projects.utils: === Created directory /var/folders/yv/k41cfcdj5l12xbyjqwv7tz_80000gr/T/tmph5qx0jfd for downloading remote URIs passed to arguments of type 'path' ===
2022/07/05 08:53:39 INFO mlflow.projects.backend.local: === Running command 'source /opt/miniconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-da39a3ee5e6b4b0d3255bfef95601890afd80709 1>&2 && scripts/run.sh' in run with ID '5fb3639ad60f4abd8a96ccd7252d34b0' ===
Hey this ran!
2022/07/05 08:53:39 INFO mlflow.projects: === Run (ID '5fb3639ad60f4abd8a96ccd7252d34b0') succeeded ===

cd sample-mlflow
git init .

Initialized empty Git repository in /Users/cjagadeesan/workspace/python/sample-mlflow/.git/

cd ..
mlflow run sample-mlflow -b local --experiment-name "sample-mlflow"

2022/07/05 08:54:04 INFO mlflow.projects.utils: === Fetching project from sample-mlflow into /var/folders/yv/k41cfcdj5l12xbyjqwv7tz_80000gr/T/tmpj9o_m95d ===
Traceback (most recent call last):
  File "/Users/cjagadeesan/.pyenv/versions/3.7.10/bin/mlflow", line 8, in <module>
    sys.exit(cli())
  File "/Users/cjagadeesan/.pyenv/versions/3.7.10/lib/python3.7/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/Users/cjagadeesan/.pyenv/versions/3.7.10/lib/python3.7/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/Users/cjagadeesan/.pyenv/versions/3.7.10/lib/python3.7/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/cjagadeesan/.pyenv/versions/3.7.10/lib/python3.7/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/cjagadeesan/.pyenv/versions/3.7.10/lib/python3.7/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/Users/cjagadeesan/.pyenv/versions/3.7.10/lib/python3.7/site-packages/mlflow/cli.py", line 195, in run
    run_name=run_name,
  File "/Users/cjagadeesan/.pyenv/versions/3.7.10/lib/python3.7/site-packages/mlflow/projects/__init__.py", line 343, in run
    run_name=run_name,
  File "/Users/cjagadeesan/.pyenv/versions/3.7.10/lib/python3.7/site-packages/mlflow/projects/__init__.py", line 102, in _run
    experiment_id,
  File "/Users/cjagadeesan/.pyenv/versions/3.7.10/lib/python3.7/site-packages/mlflow/projects/backend/local.py", line 64, in run
    work_dir = fetch_and_validate_project(project_uri, version, entry_point, params)
  File "/Users/cjagadeesan/.pyenv/versions/3.7.10/lib/python3.7/site-packages/mlflow/projects/utils.py", line 137, in fetch_and_validate_project
    work_dir = _fetch_project(uri=uri, version=version)
  File "/Users/cjagadeesan/.pyenv/versions/3.7.10/lib/python3.7/site-packages/mlflow/projects/utils.py", line 170, in _fetch_project
    _fetch_git_repo(parsed_uri, version, dst_dir)
  File "/Users/cjagadeesan/.pyenv/versions/3.7.10/lib/python3.7/site-packages/mlflow/projects/utils.py", line 220, in _fetch_git_repo
    output = g.execute(cmd)
  File "/Users/cjagadeesan/.pyenv/versions/3.7.10/lib/python3.7/site-packages/git/cmd.py", line 983, in execute
    raise GitCommandError(redacted_command, status, stderr_value, stdout_value)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
  cmdline: git remote show origin
  stderr: 'fatal: 'sample-mlflow' does not appear to be a git repository
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.'

Tracking information

MLflow version: 1.27.0
Tracking URI: databricks
Artifact URI: databricks

Code to reproduce issue

import mlflow
mlflow.projects.run(uri="nmt-char", backend="local", experiment_id="2315913529922077")

To test, clone any repository locally so that git.Repo(uri) does not raise an error.

Other info / logs

No response

What component(s) does this bug affect?

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/pipelines: Pipelines, Pipeline APIs, Pipeline configs, Pipeline Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

What language(s) does this bug affect?

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

harupy commented, Jul 7, 2022 (1 reaction)

@ElefHead Thanks! I’ll take a look 😃

