
[BUG]: Autologging doesn't work with `tensorflow.data.Dataset` objects


Willingness to contribute

Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.

MLflow version

1.26.1

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Mac OS 10.15.7
Python version: 3.9.10
yarn version, if running the dev UI: -

Describe the problem

When training on tensorflow.data.Dataset objects, mlflow.tensorflow.autolog() emits a warning and no metrics are logged (see Example 1 below). When NumPy arrays are passed to fit() directly instead, there is no warning and logging works as expected (see Example 2 below). Both runs appear in the MLflow UI, but only the second one has metrics.

There is no Python stack trace, just a warning in the terminal:

WARNING mlflow.utils.autologging_utils: Encountered unexpected error during tensorflow autologging: 'PrefetchDataset' object has no attribute '_batch_size'
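The warning suggests that autologging reads a private `_batch_size` attribute from the object passed to fit(). In Example 1, `batch()` produces the batched dataset, but the subsequent `cache()` and `prefetch()` calls wrap it in new dataset objects, so the outermost object is a `PrefetchDataset`. The pure-Python sketch below mimics that wrapping with hypothetical stand-in classes (they are not TensorFlow's real classes, and the `_input_dataset` link between wrappers is an assumption for illustration). It shows why reading the attribute from the outermost object fails, and how a defensive lookup that walks inward and falls back to `None` avoids aborting. It is not MLflow's actual code or the change made in the linked fix.

```python
from typing import Optional


class FakeDataset:
    """Hypothetical stand-in for a tf.data dataset; not a TensorFlow class."""

    def __init__(self, input_dataset: Optional["FakeDataset"] = None):
        # Assumption: each wrapper keeps a reference to the dataset it wraps.
        self._input_dataset = input_dataset


class FakeBatchDataset(FakeDataset):
    """Stand-in for the result of batch(); it records the batch size."""

    def __init__(self, input_dataset: FakeDataset, batch_size: int):
        super().__init__(input_dataset)
        self._batch_size = batch_size


class FakeCacheDataset(FakeDataset):
    """Stand-in for the result of cache(); it has no _batch_size attribute."""


class FakePrefetchDataset(FakeDataset):
    """Stand-in for the result of prefetch(); it has no _batch_size attribute."""


def naive_batch_size(dataset: FakeDataset) -> int:
    # Reads the attribute from the outermost object only, like the failing code path.
    return dataset._batch_size


def find_batch_size(dataset: Optional[FakeDataset]) -> Optional[int]:
    """Walk from the outermost wrapper inward; return the first batch size found."""
    while dataset is not None:
        batch_size = getattr(dataset, "_batch_size", None)
        if batch_size is not None:
            return batch_size
        dataset = getattr(dataset, "_input_dataset", None)
    # No batching step found: the caller can skip logging the parameter.
    return None


# Mirror the pipeline from Example 1: slices -> batch(50) -> cache() -> prefetch().
source = FakeDataset()
pipeline = FakePrefetchDataset(FakeCacheDataset(FakeBatchDataset(source, batch_size=50)))

try:
    naive_batch_size(pipeline)
except AttributeError as exc:
    print(f"naive lookup failed: {exc}")

print(find_batch_size(pipeline))  # 50
print(find_batch_size(source))    # None
```

The design choice in this sketch is that an unknown batch size becomes `None` rather than an exception, so a logger built this way could skip that one parameter and still record per-epoch metrics.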

Tracking information

MLflow version: 1.26.1
Tracking URI: sqlite:////Users/…/data/mlflow/tracking.db
Artifact URI: ./mlruns/1/33fb057bcdd249d8ada7cdb64047e963/artifacts

Code to reproduce issue

Example 1: Emits a warning; metrics are not logged.

import tensorflow as tf
import mlflow
import numpy as np

mlflow.set_experiment("bug")
mlflow.start_run(run_name="error")

X_train = np.random.rand(100, 100)
y_train = np.random.randint(0, 10, 100)

# Batch, cache, and prefetch the data; prefetch() makes the outermost object a PrefetchDataset
train_ds = tf.data.Dataset.from_tensor_slices((X_train, y_train))
train_ds = train_ds.batch(50)
train_ds = train_ds.cache().prefetch(buffer_size=tf.data.AUTOTUNE)

model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(100,)))
model.add(tf.keras.layers.Dense(256, activation='relu'))
model.add(tf.keras.layers.Dropout(rate=.4))
# softmax yields class probabilities, matching SparseCategoricalCrossentropy(from_logits=False)
model.add(tf.keras.layers.Dense(10, activation='softmax'))
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              optimizer='Adam',
              metrics=['accuracy'])

mlflow.tensorflow.autolog()
history = model.fit(train_ds, epochs=100)
mlflow.end_run()

Example 2: No warning; metrics are logged (NumPy arrays are passed to fit() directly instead of a `tf.data.Dataset`)

import tensorflow as tf
import mlflow
import numpy as np

mlflow.set_experiment("bug")
mlflow.start_run(run_name="working")

X_train = np.random.rand(100, 100)
y_train = np.random.randint(0, 10, 100)

model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(100,)))
model.add(tf.keras.layers.Dense(256, activation='relu'))
model.add(tf.keras.layers.Dropout(rate=.4))
# softmax yields class probabilities, matching SparseCategoricalCrossentropy(from_logits=False)
model.add(tf.keras.layers.Dense(10, activation='softmax'))
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              optimizer='Adam',
              metrics=['accuracy'])

mlflow.tensorflow.autolog()
history = model.fit(X_train, y_train, epochs=100)
mlflow.end_run()

Other info / logs

2022/06/07 10:47:22 WARNING mlflow.utils.autologging_utils: Encountered unexpected error during tensorflow autologging: 'PrefetchDataset' object has no attribute '_batch_size'

What component(s) does this bug affect?

area/artifacts: Artifact stores and artifact logging
area/build: Build and test infrastructure for MLflow
area/docs: MLflow documentation pages
area/examples: Example code
area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
area/models: MLmodel format, model serialization/deserialization, flavors
area/projects: MLproject format, project running backends
area/scoring: MLflow Model server, model deployment tools, Spark UDFs
area/server-infra: MLflow Tracking server backend
area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

What language(s) does this bug affect?

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

What integration(s) does this bug affect?

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations


Top GitHub Comments

redfrexx commented, Jun 14, 2022

Great, thanks a lot!

dbczumar commented, Jun 14, 2022

@redfrexx @jim-brown Thank you for reporting this issue and providing a minimal, reproducible example. I’ve filed https://github.com/mlflow/mlflow/pull/6061 to fix the issue. Once it is merged, installing MLflow from master via pip install git+https://github.com/mlflow/mlflow@master should resolve the issue. This fix will also be included in the next MLflow release.
