[BUG] INVALID_PARAMETER_VALUE: Changing param values is not allowed.

Issues Policy acknowledgement

I have read and agree to submit bug reports in accordance with the issues policy

Willingness to contribute

No. I cannot contribute a bug fix at this time.

MLflow version

1.30.0

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): WSL Ubuntu 20.04
Python version: 3.10.6
yarn version, if running the dev UI: N/A

Describe the problem

Autologging for TensorFlow (tf.keras) works when I run the script directly with python train.py, but fails when I run it through mlflow run on the MLproject (which uses the same train.py script).

It appears that the autologger logs the state of the model during creation and this prevents it from updating the log values after training.

Here’s the error I get:

2022/10/24 19:46:15 WARNING mlflow.utils.autologging_utils: Encountered unexpected error during tensorflow autologging: INVALID_PARAMETER_VALUE: Changing param values is not allowed. Params were already logged='[{'key': 'validation_split', 'old_value': None, 'new_value': '0.0'}, {'key': 'shuffle', 'old_value': None, 'new_value': 'True'}, {'key': 'class_weight', 'old_value': None, 'new_value': 'None'}, {'key': 'sample_weight', 'old_value': None, 'new_value': 'None'}, {'key': 'initial_epoch', 'old_value': None, 'new_value': '0'}, {'key': 'steps_per_epoch', 'old_value': None, 'new_value': 'None'}, {'key': 'validation_steps', 'old_value': None, 'new_value': 'None'}, {'key': 'validation_batch_size', 'old_value': None, 'new_value': 'None'}, {'key': 'validation_freq', 'old_value': None, 'new_value': '1'}, {'key': 'max_queue_size', 'old_value': None, 'new_value': '10'}, {'key': 'workers', 'old_value': None, 'new_value': '1'}, {'key': 'use_multiprocessing', 'old_value': None, 'new_value': 'False'}]' for run ID='402712a4625a43bca38c0bce38fa4ed1'.

As you can see, the autolog apparently logged None for all of these values.

Again, the same script works well when I run it outside of MLproject.

Tracking information

No response

Code to reproduce issue

"""
TF/Keras Training script for MLFlow
"""

# mlflow run -e train_entry --env-manager=local --experiment-name=tony-reina-experiments .

from datetime import datetime
import os

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

import click  # pip install click
import tensorflow as tf  # pip install tensorflow

# The mlflow import and the mlflow.tensorflow.autolog() call further down
# are the only additions needed to log metrics and parameters
# to MLflow automatically.
import mlflow  # pip install mlflow

EXPERIMENT_NAME = "tony-reina-experiments"


def load_data():
    """Load dataset and pre-process
    Fashion MNIST https://github.com/zalandoresearch/fashion-mnist
       28x28 grayscale images of clothes from 10 different categories
    """
    fashion_mnist = tf.keras.datasets.fashion_mnist

    (train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

    # Normalize the images from 0.0 to 1.0
    train_images = train_images / 255.0
    test_images = test_images / 255.0

    # Human-readable class names
    class_names = [
        "T-shirt/top",
        "Trouser",
        "Pullover",
        "Dress",
        "Coat",
        "Sandal",
        "Shirt",
        "Sneaker",
        "Bag",
        "Ankle boot",
    ]

    return train_images, train_labels, test_images, test_labels, class_names


def create_model(parameters):
    """Create a simple TensorFlow Keras model

    Args:
        parameters(dict): Number of units,
                          optimizer, and metrics for model

    """

    model = tf.keras.Sequential(
        [
            tf.keras.layers.Flatten(input_shape=(28, 28)),
            tf.keras.layers.Dense(parameters["num_units"], activation="relu"),
            tf.keras.layers.Dense(10),
        ]
    )

    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=parameters["learning_rate"]),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=[parameters["metrics"]],
    )

    return model


@click.command(help="The base training script for MLFlow.")
@click.option(
    "--num-units", default=128, type=int, help="Number of units in the dense layer"
)
@click.option("--epochs", default=3, type=int, help="Number of training epochs")
@click.option("--batch-size", default=32, type=int, help="Batch size")
@click.option("--learning-rate", default=1e-4, type=float, help="Learning rate")
@click.option("--metrics", default="accuracy", type=str, help="Model metric to track")
@click.option(
    "--training-data", default=".", type=str, help="Path to the training data"
)
@click.option("--testing-data", default=".", type=str, help="Path to the testing data")
def train(
    num_units,
    epochs,
    batch_size,
    learning_rate,
    metrics,
    training_data,
    testing_data,
):
    """Run training"""

    train_images, train_labels, test_images, test_labels, class_names = load_data()

    # Instead of passing lots of variables,
    # we'll just pass a dictionary
    parameters = {
        "num_units": num_units,
        "num_epochs": epochs,
        "batch_size": batch_size,
        "learning_rate": learning_rate,
        "metrics": metrics,
        "training_data": training_data,
        "testing_data": testing_data,
    }

    click.secho(parameters)

    click.secho("Setting up MLflow tracking uri...")
    mlflow.tracking.set_tracking_uri(os.environ.get("MLFLOW_TRACKING_URI"))
    mlflow.set_experiment(experiment_name=EXPERIMENT_NAME)

    mlflow.tensorflow.autolog(
        log_models=True,
        silent=False,
        registered_model_name="ye_olde_mnist_fashion",
    )

    current_time = datetime.now().strftime("%Y-%m-%d %H-%M-%S")
    click.secho("Starting the MLFlow Run...")

    model = create_model(parameters)

    with mlflow.start_run(
        #run_name=f"YeOldDemo-{current_time}",
        tags={"ImageTag": "local"},
        description="Ye Olde Model Xample",
    ):

        model.fit(
            train_images,
            train_labels,
            epochs=parameters["num_epochs"],
            batch_size=parameters["batch_size"],
        )

        click.secho("Finished training")

        test_loss, test_acc = model.evaluate(test_images, test_labels)

        mlflow.log_param(key="test_loss", value=test_loss)
        mlflow.log_param(key="test_acc", value=test_acc)

        mlflow.log_param(key="Class names", value=class_names)
        mlflow.log_param(key="TensorFlow version", value=tf.__version__)


if __name__ == "__main__":
    train()

Stack trace

2022/10/24 19:49:34 WARNING mlflow.utils.autologging_utils: Encountered unexpected error during tensorflow autologging: INVALID_PARAMETER_VALUE: Changing param values is not allowed. Params were already logged='[{'key': 'validation_split', 'old_value': None, 'new_value': '0.0'}, {'key': 'shuffle', 'old_value': None, 'new_value': 'True'}, {'key': 'class_weight', 'old_value': None, 'new_value': 'None'}, {'key': 'sample_weight', 'old_value': None, 'new_value': 'None'}, {'key': 'initial_epoch', 'old_value': None, 'new_value': '0'}, {'key': 'steps_per_epoch', 'old_value': None, 'new_value': 'None'}, {'key': 'validation_steps', 'old_value': None, 'new_value': 'None'}, {'key': 'validation_batch_size', 'old_value': None, 'new_value': 'None'}, {'key': 'validation_freq', 'old_value': None, 'new_value': '1'}, {'key': 'max_queue_size', 'old_value': None, 'new_value': '10'}, {'key': 'workers', 'old_value': None, 'new_value': '1'}, {'key': 'use_multiprocessing', 'old_value': None, 'new_value': 'False'}]' for run ID='a6fc78cd0973486aa6b0ddb5f36581ae'.

Top GitHub Comments

harupy commented, Oct 26, 2022

@tonyreina I was able to reproduce the issue with this command:

docker run --rm -w /workdir -v $(pwd):/workdir -e MLFLOW_TRACKING_URI=sqlite:///mlflow.db python:3.8 bash -c "pip install mlflow==1.29.0 tensorflow && mlflow run --env-manager=local -e train_entry --experiment-name=tony-reina-experiments . && rm mlflow.db"

I think you’re using mlflow 1.29.0. https://github.com/mlflow/mlflow/pull/7057 fixed the issue. mlflow 1.30.0 contains this patch.

tonyreina commented, Oct 27, 2022

Thanks. I finally figured it out. My MLflow was 1.30.0, but my company server was using MLflow 1.29.0. I’ve asked them to upgrade.
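
Since the root cause here was a version mismatch, a quick check in the environment that actually executes train.py can save some guessing. The snippet below is only a rough sketch, not part of MLflow: the helper names (parse_release, has_fix, installed_mlflow_version) are made up for illustration, and the 1.30.0 threshold is taken from the maintainer's comment above. It reads the installed package version with the standard library and compares the numeric release components, ignoring any pre-release suffix.

```python
import re
from importlib import metadata

# Assumption: 1.30.0 is the first release with the fix, per the comment above.
MIN_FIXED_VERSION = "1.30.0"


def parse_release(version):
    """Return the leading numeric components, e.g. '1.30.0' -> (1, 30, 0).

    Suffixes such as 'rc1' or '.dev0' are ignored in this simplified sketch.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def _pad(release, width):
    """Right-pad a release tuple with zeros so '1.30' compares equal to '1.30.0'."""
    return release + (0,) * (width - len(release))


def has_fix(installed, minimum=MIN_FIXED_VERSION):
    """Return True if the installed version is at least the minimum version."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    return _pad(a, width) >= _pad(b, width)


def installed_mlflow_version():
    """Return the installed mlflow version string, or None if it is not installed."""
    try:
        return metadata.version("mlflow")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_mlflow_version()
    if found is None:
        print("mlflow is not installed in this environment")
    elif has_fix(found):
        print(f"mlflow {found} should include the fix")
    else:
        print(f"mlflow {found} is older than {MIN_FIXED_VERSION}; consider upgrading")
```

Running this on both the local machine and the server (or inside the environment that mlflow run creates) would show whether the two disagree, which is what happened in this issue.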