[FR] Record model signatures and input examples for TensorFlow & PySpark ML autologging integrations
MLflow Roadmap Item
This is an MLflow Roadmap item that has been prioritized by the MLflow maintainers. We're seeking help with the implementation of roadmap items tagged with the "help wanted" label.
For requirements clarifications and implementation questions, or to request a PR review, please tag @BenWilson2 in your communications related to this issue.
Proposal Summary
Include model signature and input example information with MLflow Models that are logged during autologging. This functionality is currently present in only a few autologging integrations: mlflow.xgboost.autolog(), mlflow.lightgbm.autolog(), and mlflow.sklearn.autolog(). The many other integrations listed at https://mlflow.org/docs/latest/tracking.html#automatic-logging do not have it. Most prominently, we should add support for TensorFlow and PySpark ML.
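For context, here is a minimal sketch of the behavior this proposal would extend to TensorFlow and PySpark ML, using the existing scikit-learn integration. The log_model_signatures / log_input_examples flags exist on mlflow.sklearn.autolog() in recent MLflow releases; availability may vary by version.

```python
import mlflow
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Enable sklearn autologging with signature / input-example capture.
mlflow.sklearn.autolog(log_model_signatures=True, log_input_examples=True)

X, y = load_iris(return_X_y=True, as_frame=True)

with mlflow.start_run():
    # The patched fit() logs the model along with an inferred signature
    # and a small slice of X as the input example -- the same metadata
    # this issue asks the TensorFlow and PySpark ML integrations to record.
    LogisticRegression(max_iter=200).fit(X, y)
```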
Motivation
- What is the use case for this feature? Model signatures and input examples make it easier to incorporate ML models into inference workflows.
- Why is this use case valuable to support for MLflow users in general? ^
- Why is this use case valuable to support for your project(s) or organization? ^
- Why is it currently difficult to achieve this use case? Users must currently compute signatures / input examples manually and invoke mlflow.*.log_model() to record this information with a persisted MLflow Model (a sketch of this manual workflow follows this list). This complicates the autologging experience.
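A minimal sketch of that manual workflow, using scikit-learn as a stand-in model for brevity; the same pattern applies to the TensorFlow/Keras and Spark flavors via their respective log_model() APIs. infer_signature() and the signature / input_example parameters are part of MLflow's public API.

```python
import mlflow
from mlflow.models.signature import infer_signature
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True, as_frame=True)
model = LogisticRegression(max_iter=200).fit(X, y)

with mlflow.start_run():
    # Compute the signature by hand from sample inputs and predictions...
    signature = infer_signature(X, model.predict(X))
    # ...then log the model explicitly instead of relying on autologging.
    mlflow.sklearn.log_model(
        model,
        "model",
        signature=signature,
        input_example=X.head(5),
    )
```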
What component(s), interfaces, languages, and integrations does this feature affect?
Components
- area/artifacts: Artifact stores and artifact logging
- area/build: Build and test infrastructure for MLflow
- area/docs: MLflow documentation pages
- area/examples: Example code
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/projects: MLproject format, project running backends
- area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- area/server-infra: MLflow Tracking server backend
- area/tracking: Tracking Service, tracking client APIs, autologging
Interfaces
- area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- area/windows: Windows support
Languages
- language/r: R APIs and clients
- language/java: Java APIs and clients
- language/new: Proposals for new client languages
Integrations
- integrations/azure: Azure and Azure ML integrations
- integrations/sagemaker: SageMaker integrations
- integrations/databricks: Databricks integrations
Comments
Hi @bipinKrishnan, have you submitted your PR yet? Even if the test isn't passing, feel free to push your branch so that we can provide some guidance / another set of eyes on that failing test.
@BenWilson2 After my chat with @dbczumar, I will be taking on the pyspark.ml autologging integration, FYI.