[BUG]: Autologging doesn't work with `tensorflow.data.Dataset` objects

Willingness to contribute

Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.

MLflow version

1.26.1

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Mac OS 10.15.7
  • Python version: 3.9.10
  • yarn version, if running the dev UI: -

Describe the problem

When training on tensorflow.data.Dataset objects, mlflow.tensorflow.autolog() emits a warning and no metrics are logged (see example 1 below). When the same model is trained on NumPy arrays instead, there is no warning and logging works as expected (see example 2 below). Both runs show up in the MLflow UI, but only the second one has metrics.

There is no Python stack trace, just a warning in the terminal:

WARNING mlflow.utils.autologging_utils: Encountered unexpected error during tensorflow autologging: 'PrefetchDataset' object has no attribute '_batch_size'
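The warning points at a private `_batch_size` attribute being read from the object passed to `fit()`. A likely explanation, sketched below under stated assumptions, is that each tf.data transformation returns a new wrapper object: in this pipeline only the dataset produced by `.batch(50)` records a batch size, while `.cache()` and `.prefetch()` wrap it without copying that attribute. The classes are plain-Python stand-ins named after the TensorFlow types (hypothetical, not TensorFlow's real implementation), and the `_input_dataset` link between them is an assumption about tf.data internals.

```python
# Hypothetical plain-Python stand-ins mimicking how tf.data chains
# transformations: each wrapper keeps a reference to its input dataset.
class TensorSliceDataset:
    pass


class BatchDataset:
    def __init__(self, input_dataset, batch_size):
        self._input_dataset = input_dataset
        self._batch_size = batch_size  # only the batching step records this


class CacheDataset:
    def __init__(self, input_dataset):
        self._input_dataset = input_dataset  # no _batch_size copied over


class PrefetchDataset:
    def __init__(self, input_dataset):
        self._input_dataset = input_dataset  # no _batch_size copied over


# Mirrors from_tensor_slices(...).batch(50).cache().prefetch(...)
train_ds = PrefetchDataset(CacheDataset(BatchDataset(TensorSliceDataset(), 50)))

try:
    # A naive lookup on the outermost dataset, as the warning suggests
    batch_size = train_ds._batch_size
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

Reading the attribute on the outermost wrapper produces the same message as the autologging warning, even though the batch size is still reachable further down the chain.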

Tracking information

MLflow version: 1.26.1
Tracking URI: sqlite:////Users/…/data/mlflow/tracking.db
Artifact URI: ./mlruns/1/33fb057bcdd249d8ada7cdb64047e963/artifacts

Code to reproduce issue

Example 1: Emits a warning; metrics are not logged.

import tensorflow as tf
import mlflow
import numpy as np

mlflow.set_experiment("bug")
mlflow.start_run(run_name="error")

X_train = np.random.rand(100, 100)
y_train = np.random.randint(0, 10, 100)

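# Build the input pipeline: batch first, then cache and prefetch.
# The object passed to fit() is therefore a PrefetchDataset.
train_ds = tf.data.Dataset.from_tensor_slices((X_train, y_train))
train_ds = train_ds.batch(50)
train_ds = train_ds.cache().prefetch(buffer_size=tf.data.AUTOTUNE)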

model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(100,)))
model.add(tf.keras.layers.Dense(256, activation='relu'))
model.add(tf.keras.layers.Dropout(rate=.4))
model.add(tf.keras.layers.Dense(10, activation='sigmoid'))
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              optimizer='Adam',
              metrics=['accuracy'])

mlflow.tensorflow.autolog()
history = model.fit(train_ds, epochs=100)
mlflow.end_run()

Example 2: Works as expected; metrics are logged (NumPy arrays instead of tf.data.Dataset objects).

import tensorflow as tf
import mlflow
import numpy as np

mlflow.set_experiment("bug")
mlflow.start_run(run_name="working")

X_train = np.random.rand(100, 100)
y_train = np.random.randint(0, 10, 100)

model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(100,)))
model.add(tf.keras.layers.Dense(256, activation='relu'))
model.add(tf.keras.layers.Dropout(rate=.4))
model.add(tf.keras.layers.Dense(10, activation='sigmoid'))
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              optimizer='Adam',
              metrics=['accuracy'])

mlflow.tensorflow.autolog()
history = model.fit(X_train, y_train, epochs=100)
mlflow.end_run()

Other info / logs

2022/06/07 10:47:22 WARNING mlflow.utils.autologging_utils: Encountered unexpected error during tensorflow autologging: 'PrefetchDataset' object has no attribute '_batch_size'

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 9

Top GitHub Comments

redfrexx commented, Jun 14, 2022 (1 reaction)

Great, thanks a lot!

dbczumar commented, Jun 14, 2022 (1 reaction)

@redfrexx @jim-brown Thank you for reporting this issue and providing a minimal, reproducible example. I’ve filed https://github.com/mlflow/mlflow/pull/6061 to fix the issue. Once it is merged, installing MLflow from master via pip install git+https://github.com/mlflow/mlflow@master should resolve the issue. This fix will also be included in the next MLflow release.
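One defensive lookup of the kind such a fix might use is sketched below: walk back through each wrapper's input until a dataset that records a batch size is found, and return None (so autologging could skip the parameter instead of aborting) if none is found. This is an illustration only, not necessarily what PR 6061 does. `_batch_size` comes from the warning in the report, `_input_dataset` is an assumption about tf.data internals, and both are private attributes that may change between TensorFlow versions; `find_batch_size` is a hypothetical helper name.

```python
from types import SimpleNamespace
from typing import Any, Optional


def find_batch_size(dataset: Any, max_depth: int = 100) -> Optional[Any]:
    """Return the first `_batch_size` found while following `_input_dataset` links.

    Hypothetical helper: it relies on private attribute names, so it returns
    None instead of raising when the structure is not what it expects.
    """
    current = dataset
    for _ in range(max_depth):  # guard against unexpectedly long or cyclic chains
        if current is None:
            return None
        batch_size = getattr(current, "_batch_size", None)
        if batch_size is not None:
            return batch_size
        current = getattr(current, "_input_dataset", None)
    return None


# Stand-ins for batch(50) -> cache() -> prefetch(), built with SimpleNamespace
batched = SimpleNamespace(_batch_size=50, _input_dataset=SimpleNamespace())
pipeline = SimpleNamespace(_input_dataset=SimpleNamespace(_input_dataset=batched))

print(find_batch_size(pipeline))           # 50
print(find_batch_size(SimpleNamespace()))  # None
```

With real TensorFlow objects the value found this way is likely a tensor rather than a Python int, so a caller would still need to convert it (for example with `.numpy()` in eager mode) before logging it as a parameter.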
