Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

[FR] Record model signatures and input examples for TensorFlow & PySpark ML autologging integrations

See original GitHub issue

MLflow Roadmap Item

This is an MLflow Roadmap item that has been prioritized by the MLflow maintainers. We’re seeking help with the implementation of roadmap items tagged with the help wanted label.

For requirements clarifications and implementation questions, or to request a PR review, please tag @BenWilson2 in your communications related to this issue.

Proposal Summary

Include model signature and input example information with MLflow Models that are logged during autologging. This functionality is currently present in only a few autologging integrations (mlflow.xgboost.autolog(), mlflow.lightgbm.autolog(), and mlflow.sklearn.autolog()); the many other integrations listed at https://mlflow.org/docs/latest/tracking.html#automatic-logging do not have it. Most prominently, we should add support for TensorFlow and PySpark ML.

Motivation

  • What is the use case for this feature? Model signatures and input examples make it easier to incorporate ML models into inference workflows.
  • Why is this use case valuable to support for MLflow users in general? See above.
  • Why is this use case valuable to support for your project(s) or organization? See above.
  • Why is it currently difficult to achieve this use case? Users must currently compute signatures and input examples manually and invoke mlflow.*.log_model() to record this information with a persisted MLflow Model. This complicates the autologging experience.

What component(s), interfaces, languages, and integrations does this feature affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interfaces

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Languages

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 16

Top GitHub Comments

1 reaction
BenWilson2 commented, Oct 19, 2021

Hi @bipinKrishnan have you submitted your PR yet? Even if the test isn’t passing, feel free to push your branch so that we can provide some guidance / another set of eyes on that failing test.

1 reaction
bali0019 commented, Oct 5, 2021

@BenWilson2 After my chat with @dbczumar, I will be taking on pyspark.ml autologging integration…fyi.

Read more comments on GitHub >

Top Results From Across the Web

mlflow.tensorflow — MLflow 2.0.1 documentation
The mlflow.tensorflow module provides an API for logging and loading TensorFlow models. This module exports TensorFlow models with the following flavors.
Read more >
Signatures in TensorFlow Lite
The input/output specifications are called "signatures". Signatures can be specified when building a SavedModel or creating concrete functions.
Read more >
Log, load, register, and deploy MLflow models
Deploy models for online serving. Log and load models. With Databricks Runtime 8.4 ML and above, when you log a model, MLflow automatically...
Read more >
You Can Blend Apache Spark And Tensorflow To Build ...
An example of a deep learning machine learning (ML) technique is artificial neural networks. They take a complex input, such as an image...
Read more >
Extracting, transforming and selecting features - Apache Spark
Assume that we have a DataFrame with 4 input columns real , bool , stringNum , and string . These different data types...
Read more >
