
[BUG] 1.18 stored model incompatible with mlflow 1.19


Thank you for submitting an issue. Please refer to our issue policy for additional information about bug reports. For help with debugging your code, please refer to Stack Overflow.

Please fill in this bug report template to ensure a timely and thorough response.

Willingness to contribute

The MLflow Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the MLflow code base?

  • Yes. I can contribute a fix for this bug independently.
  • Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.
  • No. I cannot contribute a bug fix at this time.

System information

  • Have I written custom code (as opposed to using a stock example script provided in MLflow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Cluster: Driver: i3.xlarge, Workers: i3.xlarge, 8 workers, On-Demand and Spot, fall back to On-Demand, 8.3 (includes Apache Spark 3.1.1, Scala 2.12)
  • MLflow installed from (source or binary): PyPI
  • MLflow version (run mlflow --version): 1.19
  • Python version: 3.7.5
  • npm version, if running the dev UI:
  • Exact command to reproduce: model = mlflow.pytorch.load_model(model_uri=model_uri)

Describe the problem

A PyTorch model was stored with mlflow 1.18; loading it with mlflow 1.19 raises: TypeError: code() takes at most 15 arguments (16 given)

Code to reproduce issue

import mlflow

model_uri = "models:/path/Production"
model = mlflow.pytorch.load_model(model_uri)

Other info / logs

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<command-2671124357356893> in <module>
      4 model_uri=f"models:/language_detection/Production"
      5 location = model_reg[-1].source
----> 6 model = mlflow.pytorch.load_model(location)
      7 #model = mlflow.pytorch.load_model(model_uri=model_uri)

/databricks/python/lib/python3.7/site-packages/mlflow/pytorch/__init__.py in load_model(model_uri, **kwargs)
    676         )
    677     torch_model_artifacts_path = os.path.join(local_model_path, pytorch_conf["model_data"])
--> 678     return _load_model(path=torch_model_artifacts_path, **kwargs)
    679 
    680 

/databricks/python/lib/python3.7/site-packages/mlflow/pytorch/__init__.py in _load_model(path, **kwargs)
    588 
    589     if Version(torch.__version__) >= Version("1.5.0"):
--> 590         return torch.load(model_path, **kwargs)
    591     else:
    592         try:

/databricks/python/lib/python3.7/site-packages/torch/serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
    590                     opened_file.seek(orig_position)
    591                     return torch.jit.load(opened_file)
--> 592                 return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
    593         return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
    594 

/databricks/python/lib/python3.7/site-packages/torch/serialization.py in _load(zip_file, map_location, pickle_module, pickle_file, **pickle_load_args)
    849     unpickler = pickle_module.Unpickler(data_file, **pickle_load_args)
    850     unpickler.persistent_load = persistent_load
--> 851     result = unpickler.load()
    852 
    853     torch._utils._validate_loaded_sparse_tensors()

TypeError: code() takes at most 15 arguments (16 given)
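The "code() takes at most 15 arguments (16 given)" message is characteristic of an interpreter version mismatch during unpickling rather than an MLflow bug per se: Python 3.8 introduced positional-only parameters (PEP 570), which added a `co_posonlyargcount` field to code objects, so the `types.CodeType` constructor takes one more argument on 3.8 than on 3.7. A pickle that serializes code objects by value (as cloudpickle-style serialization can for some models) produced on 3.8 therefore cannot be reconstructed on 3.7. A minimal sketch illustrating the field difference:

```python
import sys

# Python 3.8 added co_posonlyargcount to code objects (PEP 570). The
# CodeType constructor grew by one argument, which matches the
# "15 vs 16 arguments" mismatch seen when a code object pickled on
# Python 3.8 is unpickled on Python 3.7.
code = (lambda: None).__code__

if sys.version_info >= (3, 8):
    print("co_posonlyargcount present:", hasattr(code, "co_posonlyargcount"))
else:
    print("pre-3.8 interpreter: code objects have no co_posonlyargcount")
```

Running this on both clusters would show the field present on the 3.8.8 training cluster and absent on the 3.7.5 loading cluster.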

What component(s), interfaces, languages, and integrations does this bug affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: Local serving, model deployment tools, spark UDFs
  • area/server-infra: MLflow server, JavaScript dev server
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, JavaScript, plotting
  • area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
fabiorangel commented, Jul 19, 2021

Hi @fabiorangel the error suggests that there is a mismatch in the torch version or in the pickling module used. Can you share the model's conda environment and how you loaded the model?

I am checking. But mlflow controls the serialization method, probably by calling torch serialization methods. I already checked the torch version, and both clusters are using the same one. The Python version is different, though: the model was trained with 3.8.8, and Python 3.7.5 is used to load it. Do you think the Python version could be the problem?

Yes. The error is due to the Python version mismatch. I saved an example with 3.8.8 and, while loading it with Python 3.7.5, I am seeing the same error.

Yeah, I tested here and that is the problem. However, a minor version difference should not affect compatibility; in this case, it affects serialization. Anyway, I will close this issue. Thank you for the help.
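Since MLflow logs the training environment alongside the model, a mismatch like this can be caught before calling `load_model` by comparing the Python version pinned in the model's conda environment file against the running interpreter. A minimal sketch, assuming the standard MLflow conda.yaml layout with a dependency line such as `- python=3.8.8` (the file content below is hypothetical):

```python
import re
import sys


def saved_python_version(conda_yaml_text):
    """Extract the (major, minor) Python version pinned in an MLflow
    model's conda.yaml, or None if no python dependency is listed."""
    match = re.search(r"python=(\d+)\.(\d+)", conda_yaml_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


# Hypothetical conda.yaml content, as a model logged on Python 3.8.8
# would carry it.
conda_yaml = """\
name: mlflow-env
dependencies:
- python=3.8.8
- pip
"""

saved = saved_python_version(conda_yaml)
running = sys.version_info[:2]
if saved and saved != running:
    print(f"warning: model saved on Python {saved}, loading on {running}")
```

This only warns on a minor-version difference; as the thread shows, that difference is exactly what breaks unpickling here, so matching the cluster runtime to the training runtime is the actual fix.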

