[FR] Record model signatures and input examples for TensorFlow & PySpark ML autologging integrations
MLflow Roadmap Item
This is an MLflow Roadmap item that has been prioritized by the MLflow maintainers. We're seeking help with the implementation of roadmap items tagged with the "help wanted" label.
For requirements clarifications and implementation questions, or to request a PR review, please tag @BenWilson2 in your communications related to this issue.
Proposal Summary
Include model signature and input example information with MLflow Models that are logged during autologging. This functionality is currently present in only a few autologging integrations: mlflow.xgboost.autolog(), mlflow.lightgbm.autolog(), and mlflow.sklearn.autolog(). The many other integrations listed at https://mlflow.org/docs/latest/tracking.html#automatic-logging do not have it. Most prominently, we should add support for TensorFlow and PySpark ML.
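For context, here is a minimal sketch of the behavior this proposal would extend to TensorFlow and PySpark ML, using the existing scikit-learn integration. The log_model_signatures / log_input_examples flags exist on mlflow.sklearn.autolog() in recent MLflow releases; availability may vary by version.

```python
import mlflow
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Enable sklearn autologging with signature / input-example capture.
mlflow.sklearn.autolog(log_model_signatures=True, log_input_examples=True)

X, y = load_iris(return_X_y=True, as_frame=True)

with mlflow.start_run():
    # The patched fit() logs the model along with an inferred signature
    # and a small slice of X as the input example -- the same metadata
    # this issue asks the TensorFlow and PySpark ML integrations to record.
    LogisticRegression(max_iter=200).fit(X, y)
```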
Motivation
- What is the use case for this feature? Model signatures and input examples make it easier to incorporate ML models into inference workflows.
- Why is this use case valuable to support for MLflow users in general? ^
- Why is this use case valuable to support for your project(s) or organization? ^
- Why is it currently difficult to achieve this use case? Users must currently compute signatures / input examples manually and invoke mlflow.*.log_model() to record this information with a persisted MLflow Model (a sketch of this manual workflow follows this list). This complicates the autologging experience.
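A minimal sketch of that manual workflow, using scikit-learn as a stand-in model for brevity; the same pattern applies to the TensorFlow/Keras and Spark flavors via their respective log_model() APIs. infer_signature() and the signature / input_example parameters are part of MLflow's public API.

```python
import mlflow
from mlflow.models.signature import infer_signature
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True, as_frame=True)
model = LogisticRegression(max_iter=200).fit(X, y)

with mlflow.start_run():
    # Compute the signature by hand from sample inputs and predictions...
    signature = infer_signature(X, model.predict(X))
    # ...then log the model explicitly instead of relying on autologging.
    mlflow.sklearn.log_model(
        model,
        "model",
        signature=signature,
        input_example=X.head(5),
    )
```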
What component(s), interfaces, languages, and integrations does this feature affect?
Components
- area/artifacts: Artifact stores and artifact logging
- area/build: Build and test infrastructure for MLflow
- area/docs: MLflow documentation pages
- area/examples: Example code
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/projects: MLproject format, project running backends
- area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- area/server-infra: MLflow Tracking server backend
- area/tracking: Tracking Service, tracking client APIs, autologging
Interfaces
- area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- area/windows: Windows support
Languages
- language/r: R APIs and clients
- language/java: Java APIs and clients
- language/new: Proposals for new client languages
Integrations
- integrations/azure: Azure and Azure ML integrations
- integrations/sagemaker: SageMaker integrations
- integrations/databricks: Databricks integrations
Comments
Hi @bipinKrishnan, have you submitted your PR yet? Even if the test isn't passing, feel free to push your branch so that we can provide some guidance / another set of eyes on that failing test.
@BenWilson2 After my chat with @dbczumar, I will be taking on the pyspark.ml autologging integration, FYI.