[FR] Make the AzureML entry script accept more data types
Thank you for submitting a feature request. Before proceeding, please review MLflow’s Issue Policy for feature requests and the MLflow Contributing Guide.
Please fill in this feature request template to ensure a timely and thorough response.
Willingness to contribute
The MLflow Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature (either as an MLflow Plugin or an enhancement to the MLflow code base)?
- Yes. I can contribute this feature independently.
- Yes. I would be willing to contribute this feature with guidance from the MLflow community.
- No. I cannot contribute this feature at this time.
Proposal Summary
Currently, the entry script required to deploy models to the AzureML ecosystem from MLflow is hard-coded as a text string and is configured through the mlflow.pyfunc functions for each model flavour. However, the function currently used to decode JSON input data only supports Schema types associated with Pandas DataFrame tabular data; it does not, for example, support TFServing-style tensor structures, which are important for image-based applications.
The string currently hard-coded in mlflow.azureml.__init__.py is as follows:
import pandas as pd
from azureml.core.model import Model
from mlflow.pyfunc import load_model
from mlflow.pyfunc.scoring_server import parse_json_input, _get_jsonable_obj

def init():
    global model
    # {model_name} and {model_version} are template placeholders that
    # mlflow.azureml fills in at deployment time.
    model_path = Model.get_model_path(model_name="{model_name}", version={model_version})
    model = load_model(model_path)

def run(json_input):
    input_df = parse_json_input(json_input=json_input, orient="split")
    return _get_jsonable_obj(model.predict(input_df), pandas_orient="records")
The parse_json_input function, which comes from the mlflow.pyfunc.scoring_server module, can only handle Pandas DataFrame tabular data and therefore does not convert input JSON content to tensor (numpy ndarray) objects. In particular, it does not support the TFServing tensor format, which would allow these structures to be decoded.
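To make the gap concrete, here is a short sketch contrasting the two payload formats; the field names follow the pandas "split" orientation and the TFServing request format, and the shapes and values are illustrative assumptions rather than anything from the issue:

import json
import numpy as np

# Tabular payload in pandas "split" orientation -- the only format that
# parse_json_input(..., orient="split") can decode today.
tabular_payload = json.dumps({
    "columns": ["feature_a", "feature_b"],
    "index": [0, 1],
    "data": [[1.0, 2.0], [3.0, 4.0]],
})

# TFServing-style tensor payload -- not decodable by parse_json_input.
image = np.zeros((3, 800, 600), dtype=np.float32)  # illustrative image tensor
tensor_payload = json.dumps({"instances": [image.tolist()]})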
Motivation
- What is the use case for this feature?
A simple extension of the current use case is the ability to accept tensor-type inputs as defined by the TensorSpec Schema type, and to support the deserialization of numpy ndarrays such as images. This opens up the potential for any application that uses ndarray objects as inputs, such as CNN applications.
- Why is this use case valuable to support for MLflow users in general?
It broadens the scope of the types of models that can be deployed on AzureML using the model-agnostic infrastructure developed through the mlflow.pyfunc flavours.
- Why is this use case valuable to support for your project(s) or organization?
It opens the door to many different types of models, not just those that take Pandas DataFrame tabular data structures as input.
- Why is it currently difficult to achieve this use case? (please be as specific as possible about why related MLflow features and components are insufficient)
The current implementation of the entry script is defined as a hard-coded string in the mlflow.azureml module and cannot be overwritten or modified through any function call. The limitations on which data types the JSON deserialization function can handle therefore cannot be circumvented without changes to the module itself.
What component(s), interfaces, languages, and integrations does this feature affect?
Components
- area/artifacts: Artifact stores and artifact logging
- area/build: Build and test infrastructure for MLflow
- area/docs: MLflow documentation pages
- area/examples: Example code
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/projects: MLproject format, project running backends
- area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- area/server-infra: MLflow Tracking server backend
- area/tracking: Tracking Service, tracking client APIs, autologging
Interfaces
- area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
- area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- area/windows: Windows support
Languages
- language/r: R APIs and clients
- language/java: Java APIs and clients
- language/new: Proposals for new client languages
Integrations
- integrations/azure: Azure and Azure ML integrations
- integrations/sagemaker: SageMaker integrations
- integrations/databricks: Databricks integrations
Details
(Use this section to include any additional information about the feature. If you have a proposal for how to implement this feature, please include it here. For implementation guidelines, please refer to the Contributing Guide.)
One potential solution is to reuse functionality already developed for other deployment types in the mlflow package. The mlflow.pyfunc.scoring_server module contains a number of JSON deserialization functions that can decode TFServing-style tensor formats and convert them to numpy ndarrays according to the specified mlflow.types.Schema.
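As a hedged sketch of the schema side (the dtype and shape here are illustrative assumptions, not values from the issue), a tensor input can be declared with mlflow.types.TensorSpec:

import numpy as np
from mlflow.types import Schema, TensorSpec

# Schema for a batch of 3-channel images; -1 marks the variable batch dimension.
input_schema = Schema([TensorSpec(np.dtype(np.float32), (-1, 3, 800, 600))])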
If a numpy ndarray representing an image (e.g. an array of shape (3,800,600)) is encapsulated and serialized in the following way (using the TFServing definition):
import json

# Wrap the image in a TFServing-style "instances" payload and serialize to bytes.
payload = {
    'instances': [
        image.tolist()
    ]
}
payload = str.encode(json.dumps(payload))
Then using the mlflow.pyfunc.scoring_server module function infer_and_parse_json_input() in the entry script successfully decodes the JSON-serialized numpy ndarray back to a correctly sized object, which can then be passed to the loaded model for inference.
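A minimal round-trip sketch of that decoding step, reusing the payload built above (variable names are illustrative):

from mlflow.pyfunc.scoring_server import infer_and_parse_json_input

# Decode the TFServing-style JSON back into a numpy ndarray.
decoded = infer_and_parse_json_input(payload.decode("utf-8"))
print(type(decoded), decoded.shape)  # expect an ndarray of shape (1, 3, 800, 600)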
A notional modification to the entry script could potentially be as simple as:
import pandas as pd
from azureml.core.model import Model
from mlflow.pyfunc import load_model
from mlflow.pyfunc.scoring_server import infer_and_parse_json_input, _get_jsonable_obj

def init():
    global model
    model_path = Model.get_model_path(model_name="{model_name}", version={model_version})
    model = load_model(model_path)

def run(json_input):
    # infer_and_parse_json_input detects whether the payload is tabular or a
    # TFServing-style tensor; a schema can optionally be passed as a second argument.
    input_data = infer_and_parse_json_input(json_input)
    return _get_jsonable_obj(model.predict(input_data), pandas_orient="records")
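For local smoke testing outside AzureML, one can skip init() and load a pyfunc model directly; this is a hedged sketch, and the model URI is a placeholder:

from mlflow.pyfunc import load_model

model = load_model("runs:/<run_id>/model")  # placeholder URI for a logged pyfunc model
print(run(payload.decode("utf-8")))  # reuses the TFServing-style payload from above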
Local testing of this functionality appears to be successful: the JSON-serialized image is decoded back to a numpy ndarray, which was then accepted by a PyTorch ONNX model that takes numpy ndarray image representations as input.
There are a few issues to iron out, including the additional dimension returned by the infer_and_parse_json_input() function: the result has shape (1,3,nx,ny) instead of the transmitted (3,nx,ny). A simple numpy.squeeze() application solves this; however, the parsed input may not always be a numpy ndarray, so some digging into the infer_and_parse_json_input() function is required to understand why.
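A sketch of that workaround, applied defensively since the parsed input may not be an ndarray (building on the decoded value above):

import numpy as np

# Drop the leading batch axis only when it is present and of size 1.
if isinstance(decoded, np.ndarray) and decoded.ndim == 4 and decoded.shape[0] == 1:
    decoded = np.squeeze(decoded, axis=0)  # (1, 3, nx, ny) -> (3, nx, ny)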
Issue Analytics
- Created 2 years ago
- Comments: 5 (4 by maintainers)
Hi @ecm200, we’re getting a few people to take a look at the design implications of this, and we hope to have an evaluation of its feasibility in the next few sprints. Thank you for the idea, and I’ll keep you posted on what the team comes back with!
Hi @ecm200! Just a quick update on this thread: the scenario you mentioned is now supported in the latest version of our integration with MLflow. In June we introduced a number of improvements to the integration, along with brand-new documentation and samples.