[BUG] INVALID_PARAMETER_VALUE: Changing param values is not allowed.
Issues Policy acknowledgement
- I have read and agree to submit bug reports in accordance with the issues policy
Willingness to contribute
No. I cannot contribute a bug fix at this time.
MLflow version
1.30.0
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): WSL Ubuntu 20.04
- Python version: 3.10.6
- yarn version, if running the dev UI: N/A
Describe the problem
Autologging for TensorFlow (tf.keras) works when I run python train.py directly, but not when I launch it with mlflow run on the MLproject (which uses the same train.py script).
It appears that the autologger logs the state of the model during creation and this prevents it from updating the log values after training.
Here’s the error I get:
2022/10/24 19:46:15 WARNING mlflow.utils.autologging_utils: Encountered unexpected error during tensorflow autologging: INVALID_PARAMETER_VALUE: Changing param values is not allowed. Params were already logged='[{'key': 'validation_split', 'old_value': None, 'new_value': '0.0'}, {'key': 'shuffle', 'old_value': None, 'new_value': 'True'}, {'key': 'class_weight', 'old_value': None, 'new_value': 'None'}, {'key': 'sample_weight', 'old_value': None, 'new_value': 'None'}, {'key': 'initial_epoch', 'old_value': None, 'new_value': '0'}, {'key': 'steps_per_epoch', 'old_value': None, 'new_value': 'None'}, {'key': 'validation_steps', 'old_value': None, 'new_value': 'None'}, {'key': 'validation_batch_size', 'old_value': None, 'new_value': 'None'}, {'key': 'validation_freq', 'old_value': None, 'new_value': '1'}, {'key': 'max_queue_size', 'old_value': None, 'new_value': '10'}, {'key': 'workers', 'old_value': None, 'new_value': '1'}, {'key': 'use_multiprocessing', 'old_value': None, 'new_value': 'False'}]' for run ID='402712a4625a43bca38c0bce38fa4ed1'.
As you can see, autologging apparently logged None for all of these values.
Again, the same script works fine when I run it directly, outside of the MLproject.
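The warning comes down to a rule the error message itself states: once a parameter has been logged to a run, its value cannot be changed. The sketch below is a hypothetical, in-memory model of that rule (WriteOnceParams, find_conflicts, and ParamConflictError are illustrative names, not MLflow's API or its implementation). It compares values as strings, as the 'None' strings in the warning suggest, and treats re-logging an identical value as harmless.

```python
from typing import Dict, List, Optional

Conflict = Dict[str, Optional[str]]


class ParamConflictError(Exception):
    """Raised when a batch would overwrite an already-logged param (toy model)."""

    def __init__(self, conflicts: List[Conflict]) -> None:
        super().__init__(f"Changing param values is not allowed: {conflicts}")
        self.conflicts = conflicts


class WriteOnceParams:
    """Hypothetical in-memory model of one run's params: each key is set at most once."""

    def __init__(self) -> None:
        self._params: Dict[str, str] = {}

    def find_conflicts(self, params: Dict[str, object]) -> List[Conflict]:
        # Values are compared as strings, so None is stored as the string "None".
        return [
            {"key": key, "old_value": self._params[key], "new_value": str(value)}
            for key, value in params.items()
            if key in self._params and self._params[key] != str(value)
        ]

    def log_batch(self, params: Dict[str, object]) -> None:
        conflicts = self.find_conflicts(params)
        if conflicts:
            # This model rejects the whole batch so the run is never half-updated.
            raise ParamConflictError(conflicts)
        # Re-logging an identical value is accepted as a no-op.
        self._params.update({key: str(value) for key, value in params.items()})

    def get(self, key: str) -> Optional[str]:
        return self._params.get(key)
```

In this model a genuine conflict always reports the previously stored string as old_value, whereas every entry in the warning above has old_value None.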
Tracking information
No response
Code to reproduce issue
"""
TF/Keras Training script for MLFlow
"""
# mlflow run -e train_entry --env-manager=local --experiment-name=tony-reina-experiments .
from datetime import datetime
import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
import click # pip install click
import tensorflow as tf # pip install tensorflow
# The following import and function call,
# are the only additions to code required
# to automatically log
# metrics and parameters to MLflow.
import mlflow # pip install mlflow
EXPERIMENT_NAME = "tony-reina-experiments"
def load_data():
"""Load dataset and pre-process
Fashion MNIST https://github.com/zalandoresearch/fashion-mnist
28x28 grayscale images of clothes from 10 different categories
"""
fashion_mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
# Normalize the images from 0.0 to 1.0
train_images = train_images / 255.0
test_images = test_images / 255.0
# Human-readable class names
class_names = [
"T-shirt/top",
"Trouser",
"Pullover",
"Dress",
"Coat",
"Sandal",
"Shirt",
"Sneaker",
"Bag",
"Ankle boot",
]
return train_images, train_labels, test_images, test_labels, class_names
def create_model(parameters):
"""Create a simple TensorFlow Keras model
Args:
parameters(dict): Number of units,
optimizer, and metrics for model
"""
model = tf.keras.Sequential(
[
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(parameters["num_units"], activation="relu"),
tf.keras.layers.Dense(10),
]
)
model.compile(
optimizer=tf.keras.optimizers.Adam(learning_rate=parameters["learning_rate"]),
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=[parameters["metrics"]],
)
return model
@click.command(help="The base training script for MLFlow.")
@click.option(
"--num-units", default=128, type=int, help="Number of units in the dense layer"
)
@click.option("--epochs", default=3, type=int, help="Number of training epochs")
@click.option("--batch-size", default=32, type=int, help="Batch size")
@click.option("--learning-rate", default=1e-4, type=float, help="Learning rate")
@click.option("--metrics", default="accuracy", type=str, help="Model metric to track")
@click.option(
"--training-data", default=".", type=str, help="Path to the training data"
)
@click.option("--testing-data", default=".", type=str, help="Path to the testing data")
def train(
num_units,
epochs,
batch_size,
learning_rate,
metrics,
training_data,
testing_data,
):
"""Run training"""
train_images, train_labels, test_images, test_labels, class_names = load_data()
# Instead of passing lots of variables,
# we'll just pass a dictionary
parameters = {
"num_units": num_units,
"num_epochs": epochs,
"batch_size": batch_size,
"learning_rate": learning_rate,
"metrics": metrics,
"training_data": training_data,
"testing_data": testing_data,
}
click.secho(parameters)
click.secho("Setting up MLflow tracking uri...")
mlflow.tracking.set_tracking_uri(os.environ.get("MLFLOW_TRACKING_URI"))
mlflow.set_experiment(experiment_name=EXPERIMENT_NAME)
mlflow.tensorflow.autolog(
log_models=True,
silent=False,
registered_model_name="ye_olde_mnist_fashion",
)
current_time = datetime.now().strftime("%Y-%m-%d %H-%M-%S")
click.secho("Starting the MLFlow Run...")
model = create_model(parameters)
with mlflow.start_run(
#run_name=f"YeOldDemo-{current_time}",
tags={"ImageTag": "local"},
description="Ye Olde Model Xample",
):
model.fit(
train_images,
train_labels,
epochs=parameters["num_epochs"],
batch_size=parameters["batch_size"],
)
click.secho("Finished training")
test_loss, test_acc = model.evaluate(test_images, test_labels)
mlflow.log_param(key="test_loss", value=test_loss)
mlflow.log_param(key="test_acc", value=test_acc)
mlflow.log_param(key="Class names", value=class_names)
mlflow.log_param(key="TensorFlow version", value=tf.__version__)
if __name__ == "__main__":
train()
Stack trace
2022/10/24 19:49:34 WARNING mlflow.utils.autologging_utils: Encountered unexpected error during tensorflow autologging: INVALID_PARAMETER_VALUE: Changing param values is not allowed. Params were already logged='[{'key': 'validation_split', 'old_value': None, 'new_value': '0.0'}, {'key': 'shuffle', 'old_value': None, 'new_value': 'True'}, {'key': 'class_weight', 'old_value': None, 'new_value': 'None'}, {'key': 'sample_weight', 'old_value': None, 'new_value': 'None'}, {'key': 'initial_epoch', 'old_value': None, 'new_value': '0'}, {'key': 'steps_per_epoch', 'old_value': None, 'new_value': 'None'}, {'key': 'validation_steps', 'old_value': None, 'new_value': 'None'}, {'key': 'validation_batch_size', 'old_value': None, 'new_value': 'None'}, {'key': 'validation_freq', 'old_value': None, 'new_value': '1'}, {'key': 'max_queue_size', 'old_value': None, 'new_value': '10'}, {'key': 'workers', 'old_value': None, 'new_value': '1'}, {'key': 'use_multiprocessing', 'old_value': None, 'new_value': 'False'}]' for run ID='a6fc78cd0973486aa6b0ddb5f36581ae'.
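To see which keys collide without reading the long log line by eye, the payload can be parsed. The helper below is a hypothetical sketch (parse_param_conflicts and keys_without_previous_value are illustrative names, not part of MLflow) that assumes the message keeps exactly the shape shown above, with a Python-style list literal between logged=' and ' for run ID=.

```python
import ast
import re
from typing import Dict, List, Optional, Tuple

# Captures the list literal and the run ID from
# "... Params were already logged='[...]' for run ID='<hex>'".
_CONFLICT_RE = re.compile(
    r"Params were already logged='(?P<params>\[.*\])' for run ID='(?P<run_id>[0-9a-f]+)'"
)


def parse_param_conflicts(message: str) -> Tuple[str, List[Dict[str, Optional[str]]]]:
    """Return (run_id, conflicts) parsed from a param-conflict warning line."""
    match = _CONFLICT_RE.search(message)
    if match is None:
        raise ValueError("not a 'Changing param values is not allowed' message")
    # The payload is a Python literal (single quotes, None), so
    # ast.literal_eval can parse it without executing arbitrary code.
    conflicts = ast.literal_eval(match.group("params"))
    return match.group("run_id"), conflicts


def keys_without_previous_value(message: str) -> List[str]:
    """List the conflicting keys whose reported old_value is None."""
    _, conflicts = parse_param_conflicts(message)
    return [c["key"] for c in conflicts if c["old_value"] is None]
```

Applied to the stack trace above, keys_without_previous_value would return all twelve keys, from validation_split through use_multiprocessing.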
Other info / logs
What component(s) does this bug affect?
- area/artifacts: Artifact stores and artifact logging
- area/build: Build and test infrastructure for MLflow
- area/docs: MLflow documentation pages
- area/examples: Example code
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/pipelines: Pipelines, Pipeline APIs, Pipeline configs, Pipeline Templates
- area/projects: MLproject format, project running backends
- area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- area/server-infra: MLflow Tracking server backend
- area/tracking: Tracking Service, tracking client APIs, autologging
What interface(s) does this bug affect?
- area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
- area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- area/windows: Windows support
What language(s) does this bug affect?
- language/r: R APIs and clients
- language/java: Java APIs and clients
- language/new: Proposals for new client languages
What integration(s) does this bug affect?
- integrations/azure: Azure and Azure ML integrations
- integrations/sagemaker: SageMaker integrations
- integrations/databricks: Databricks integrations
@tonyreina I was able to reproduce the issue with this command:
I think you’re using mlflow 1.29.0. https://github.com/mlflow/mlflow/pull/7057 fixed the issue. mlflow 1.30.0 contains this patch.
Thanks. I finally figured it out. My MLflow client was 1.30.0, but my company's tracking server was running MLflow 1.29.0. I’ve asked them to upgrade.
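The resolution comes down to a client/server version mismatch (client 1.30.0, tracking server 1.29.0). A small guard like the hypothetical sketch below could flag that situation early. parse_version and server_is_behind are illustrative names; the sketch only handles plain numeric release strings such as 1.30.0 (no rc or dev suffixes), and it assumes the server version is supplied by hand, since how to obtain it depends on the deployment. The client side could come from mlflow.__version__.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Convert a plain release string such as '1.30.0' into (1, 30, 0)."""
    return tuple(int(part) for part in version.split("."))


def server_is_behind(client_version: str, server_version: str) -> bool:
    """Return True when the tracking server runs an older release than the client."""
    return parse_version(server_version) < parse_version(client_version)


if __name__ == "__main__":
    # Versions from this report: local client 1.30.0, company server 1.29.0.
    client, server = "1.30.0", "1.29.0"
    if server_is_behind(client, server):
        print(f"Tracking server {server} is older than client {client}; consider upgrading it.")
```

Comparing integer tuples rather than raw strings matters here: as plain strings, "1.9.0" sorts after "1.30.0".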