
[FR] Allow argument for 'local_dst_path' when loading Pyfunc Models

See original GitHub issue

Willingness to contribute

The MLflow Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature (either as an MLflow Plugin or an enhancement to the MLflow code base)?

  • Yes. I can contribute this feature independently.
  • Yes. I would be willing to contribute this feature with guidance from the MLflow community.
  • No. I cannot contribute this feature at this time.

Proposal Summary

Currently, mlflow.pyfunc.load_model(MODEL_URI) accepts only the model URI (an S3 URI in our case) when loading a Python-flavored MLflow model and its artifacts. As part of this call, the load_model method downloads the artifacts registered when logging the MLflow model to a temporary directory on the local filesystem for serving. This feature request proposes allowing a local destination path to be specified when calling load_model, which would let users download the model and its artifacts to a known location for reuse and further analysis (see the Details section below for the proposed call).

Motivation

  • What is the use case for this feature? - Enables downloading the remote model and its artifacts to a specified location, which reduces model loading times: the model is loaded directly from a local file path and can be reused by other programs if required.
  • Why is this use case valuable to support for MLflow users in general? - The feature would reduce overall model-serving time for large models, since the artifacts and the model itself would be available at a local filesystem path for multiple programs to use.
  • Why is this use case valuable to support for your project(s) or organization? - It would help us tackle long loading times when initializing the model-serving framework, as our production deployment consists of an ensemble of models rather than a single model.
  • Why is it currently difficult to achieve this use case? - Our current workaround uses shutil to copy the models and their artifacts from the temporary directories to a different location every time a program initializes them; this is inefficient and error-prone (see the sketch after this list).
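
For illustration, here is a minimal sketch of that copy-based workaround. It assumes MLflow's internal helper mlflow.tracking.artifact_utils._download_artifact_from_uri (the same helper load_model calls today); the cache directory and model URI are placeholders, not part of any API.

import os
import shutil

import mlflow
from mlflow.tracking.artifact_utils import _download_artifact_from_uri

MODEL_URI = "s3://bucket/path/to/the/model/artifacts"  # placeholder URI
CACHE_DIR = os.path.join(os.path.expanduser("~"), "model")  # illustrative cache location

# First run: download to an MLflow-managed temporary directory, then copy
# the model and its artifacts to a stable location for reuse.
if not os.path.exists(CACHE_DIR):
    tmp_path = _download_artifact_from_uri(artifact_uri=MODEL_URI)
    shutil.copytree(tmp_path, CACHE_DIR)

# Subsequent runs load directly from the local copy, skipping the download.
model = mlflow.pyfunc.load_model(CACHE_DIR)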

What component(s), interfaces, languages, and integrations does this feature affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: Local serving, model deployment tools, spark UDFs
  • area/server-infra: MLflow server, JavaScript dev server
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interfaces

  • area/uiux: Front-end, user experience, JavaScript, plotting
  • area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Languages

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

Details

import os

import mlflow

LOGGED_MODEL_S3_URI = "s3://bucket/path/to/the/model/artifacts"

# Current API call
model = mlflow.pyfunc.load_model(LOGGED_MODEL_S3_URI)

# Proposed API call
model = mlflow.pyfunc.load_model(
    LOGGED_MODEL_S3_URI,
    local_dst_path=os.path.join(os.path.expanduser("~"), "model"),
)
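
With local_dst_path supplied, subsequent loads could simply point load_model at that local directory (load_model already accepts local filesystem paths), avoiding the remote download entirely.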

Proposed solution

Changes in pyfunc/__init__.py
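
Note that the internal _download_artifact_from_uri helper already accepts an output_path argument, so the proposed change only needs to plumb the new parameter through to it; the rest of load_model is unchanged.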


# Current load_model implementation

def load_model(model_uri: str, suppress_warnings: bool = True) -> PyFuncModel:
    """
    Load a model stored in Python function format.

    :param model_uri: The location, in URI format, of the MLflow model. For example:

                      - ``/Users/me/path/to/local/model``
                      - ``relative/path/to/local/model``
                      - ``s3://my_bucket/path/to/model``
                      - ``runs:/<mlflow_run_id>/run-relative/path/to/model``
                      - ``models:/<model_name>/<model_version>``
                      - ``models:/<model_name>/<stage>``

                      For more information about supported URI schemes, see
                      `Referencing Artifacts <https://www.mlflow.org/docs/latest/concepts.html#
                      artifact-locations>`_.
    :param suppress_warnings: If ``True``, non-fatal warning messages associated with the model
                              loading process will be suppressed. If ``False``, these warning
                              messages will be emitted.
    """
    local_path = _download_artifact_from_uri(artifact_uri=model_uri)
    model_meta = Model.load(os.path.join(local_path, MLMODEL_FILE_NAME))

    conf = model_meta.flavors.get(FLAVOR_NAME)
    if conf is None:
        raise MlflowException(
            'Model does not have the "{flavor_name}" flavor'.format(flavor_name=FLAVOR_NAME),
            RESOURCE_DOES_NOT_EXIST,
        )
    model_py_version = conf.get(PY_VERSION)
    if not suppress_warnings:
        _warn_potentially_incompatible_py_version_if_necessary(model_py_version=model_py_version)
    if CODE in conf and conf[CODE]:
        code_path = os.path.join(local_path, conf[CODE])
        mlflow.pyfunc.utils._add_code_to_system_path(code_path=code_path)
    data_path = os.path.join(local_path, conf[DATA]) if (DATA in conf) else local_path
    model_impl = importlib.import_module(conf[MAIN])._load_pyfunc(data_path)
    return PyFuncModel(model_meta=model_meta, model_impl=model_impl)


# Proposed load_model implementation

def load_model(
    model_uri: str, local_dst_path: str = None, suppress_warnings: bool = True
) -> PyFuncModel:
    """
    Load a model stored in Python function format.

    :param model_uri: The location, in URI format, of the MLflow model. For example:

                      - ``/Users/me/path/to/local/model``
                      - ``relative/path/to/local/model``
                      - ``s3://my_bucket/path/to/model``
                      - ``runs:/<mlflow_run_id>/run-relative/path/to/model``
                      - ``models:/<model_name>/<model_version>``
                      - ``models:/<model_name>/<stage>``

                      For more information about supported URI schemes, see
                      `Referencing Artifacts <https://www.mlflow.org/docs/latest/concepts.html#
                      artifact-locations>`_.
    :param local_dst_path: The local filesystem path to which the model and its artifacts
                           are downloaded. If ``None`` (the default), a temporary
                           directory is used, matching the current behavior.
    :param suppress_warnings: If ``True``, non-fatal warning messages associated with the model
                              loading process will be suppressed. If ``False``, these warning
                              messages will be emitted.
    """
    local_path = _download_artifact_from_uri(artifact_uri=model_uri, output_path=local_dst_path)
    model_meta = Model.load(os.path.join(local_path, MLMODEL_FILE_NAME))

    conf = model_meta.flavors.get(FLAVOR_NAME)
    if conf is None:
        raise MlflowException(
            'Model does not have the "{flavor_name}" flavor'.format(flavor_name=FLAVOR_NAME),
            RESOURCE_DOES_NOT_EXIST,
        )
    model_py_version = conf.get(PY_VERSION)
    if not suppress_warnings:
        _warn_potentially_incompatible_py_version_if_necessary(model_py_version=model_py_version)
    if CODE in conf and conf[CODE]:
        code_path = os.path.join(local_path, conf[CODE])
        mlflow.pyfunc.utils._add_code_to_system_path(code_path=code_path)
    data_path = os.path.join(local_path, conf[DATA]) if (DATA in conf) else local_path
    model_impl = importlib.import_module(conf[MAIN])._load_pyfunc(data_path)
    return PyFuncModel(model_meta=model_meta, model_impl=model_impl)
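
If the feature lands as proposed, usage could look like the following sketch. The destination path is illustrative, and the final check simply verifies that the model metadata file (MLMODEL_FILE_NAME above) persists at the chosen location.

import os

import mlflow

# Hypothetical usage of the proposed local_dst_path argument.
dst = os.path.join(os.path.expanduser("~"), "model")
model = mlflow.pyfunc.load_model(
    "s3://bucket/path/to/the/model/artifacts",
    local_dst_path=dst,
)

# The model metadata and artifacts should now persist under dst, ready to
# be reused by other programs via mlflow.pyfunc.load_model(dst).
assert os.path.isfile(os.path.join(dst, "MLmodel"))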

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 14 (9 by maintainers)

Top GitHub Comments

1 reaction
ameya-parab commented, Apr 2, 2021

@dmatrix, I was planning on working on the issue and would create a PR in a couple of weeks. Thanks!

1 reaction
ankh6 commented, Mar 30, 2021

Hey @dmatrix, I’d like to make my first contribution if the issue is free. If that’s the case, I’ll work on it 😃


Top Results From Across the Web

  • mlflow.pyfunc — MLflow 2.0.1 documentation
  • MLflow Custom Pyfunc for Saving and Loading Model - Medium
  • python - mlflow.pyfunc.load_model / mlflow.pyfunc.save_model
  • MLflow Model Registry example | Databricks on AWS
