[BUG]: Autologging doesn't work with `tensorflow.data.Dataset` objects
Willingness to contribute
Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
MLflow version
1.26.1
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Mac OS 10.15.7
- Python version: 3.9.10
- yarn version, if running the dev UI: -
Describe the problem
When using tensorflow.data.Dataset objects, mlflow.tensorflow.autolog() emits a warning and no metrics are logged (see Example 1 below). Without tensorflow.data.Dataset objects, there is no warning and logging works fine (see Example 2 below). Both runs show up in the MLflow UI, but only one of them has metrics.
There is no Python stack trace, just a warning in the terminal:
WARNING mlflow.utils.autologging_utils: Encountered unexpected error during tensorflow autologging: 'PrefetchDataset' object has no attribute '_batch_size'
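For context (not part of the original report): _batch_size is a private TensorFlow attribute, so its presence can vary across TF versions, but a quick check like the following is consistent with the warning above — the batched dataset exposes the attribute, while the prefetch wrapper around it does not.
import numpy as np
import tensorflow as tf
X = np.random.rand(100, 100)
y = np.random.randint(0, 10, 100)
batched = tf.data.Dataset.from_tensor_slices((X, y)).batch(50)
prefetched = batched.cache().prefetch(buffer_size=tf.data.AUTOTUNE)
# _batch_size is internal TF API; behavior observed on TF 2.x.
print(hasattr(batched, "_batch_size"))     # True  -> autologging can read the batch size
print(hasattr(prefetched, "_batch_size"))  # False -> triggers the warning above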
Tracking information
MLflow version: 1.26.1
Tracking URI: sqlite:////Users/…/data/mlflow/tracking.db
Artifact URI: ./mlruns/1/33fb057bcdd249d8ada7cdb64047e963/artifacts
Code to reproduce issue
Example 1: Throws a warning, metrics are not logged.
import tensorflow as tf
import mlflow
import numpy as np
mlflow.set_experiment("bug")
mlflow.start_run(run_name="error")
X_train = np.random.rand(100, 100)
y_train = np.random.randint(0, 10, 100)
train_ds = tf.data.Dataset.from_tensor_slices((X_train, y_train))
train_ds = train_ds.batch(50)
train_ds = train_ds.cache().prefetch(buffer_size=tf.data.AUTOTUNE)
model = tf.keras.Sequential()
model.add(tf.keras.Input(100, ))
model.add(tf.keras.layers.Dense(256, activation='relu'))
model.add(tf.keras.layers.Dropout(rate=.4))
model.add(tf.keras.layers.Dense(10, activation='sigmoid'))
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              optimizer='Adam',
              metrics=['accuracy'])
mlflow.tensorflow.autolog()
history = model.fit(train_ds, epochs=100)
mlflow.end_run()
Example 2: Works fine, metrics are logged (without tf.data.Dataset objects)
import tensorflow as tf
import mlflow
import numpy as np
mlflow.set_experiment("bug")
mlflow.start_run(run_name="working")
X_train = np.random.rand(100, 100)
y_train = np.random.randint(0, 10, 100)
model = tf.keras.Sequential()
model.add(tf.keras.Input(100, ))
model.add(tf.keras.layers.Dense(256, activation='relu'))
model.add(tf.keras.layers.Dropout(rate=.4))
model.add(tf.keras.layers.Dense(10, activation='sigmoid'))
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              optimizer='Adam',
              metrics=['accuracy'])
mlflow.tensorflow.autolog()
history = model.fit(X_train, y_train, epochs=100)
mlflow.end_run()
Other info / logs
2022/06/07 10:47:22 WARNING mlflow.utils.autologging_utils: Encountered unexpected error during tensorflow autologging: 'PrefetchDataset' object has no attribute '_batch_size'
What component(s) does this bug affect?
- area/artifacts: Artifact stores and artifact logging
- area/build: Build and test infrastructure for MLflow
- area/docs: MLflow documentation pages
- area/examples: Example code
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/projects: MLproject format, project running backends
- area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- area/server-infra: MLflow Tracking server backend
- area/tracking: Tracking Service, tracking client APIs, autologging
What interface(s) does this bug affect?
- area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
- area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- area/windows: Windows support
What language(s) does this bug affect?
- language/r: R APIs and clients
- language/java: Java APIs and clients
- language/new: Proposals for new client languages
What integration(s) does this bug affect?
- integrations/azure: Azure and Azure ML integrations
- integrations/sagemaker: SageMaker integrations
- integrations/databricks: Databricks integrations
Great, thanks a lot!
@redfrexx @jim-brown Thank you for reporting this issue and providing a minimal, reproducible example. I’ve filed https://github.com/mlflow/mlflow/pull/6061 to fix the issue. Once it is merged, installing MLflow from master via pip install git+https://github.com/mlflow/mlflow@master should resolve the issue. This fix will also be included in the next MLflow release.
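Until that release is out, one possible stopgap (not from the thread; MlflowEpochLogger is a hypothetical helper name) is to log the per-epoch Keras metrics to the active run yourself with a small callback, since mlflow.log_metrics works regardless of how the training data is wrapped.
import mlflow
import tensorflow as tf
class MlflowEpochLogger(tf.keras.callbacks.Callback):
    # Push every metric Keras reports at the end of an epoch to the active MLflow run.
    def on_epoch_end(self, epoch, logs=None):
        if logs:
            mlflow.log_metrics({k: float(v) for k, v in logs.items()}, step=epoch)
# e.g. in Example 1: history = model.fit(train_ds, epochs=100, callbacks=[MlflowEpochLogger()])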