[BUG] Can't pickle <function r2_score at 0x7f1b594cf310>: it's not the same object as sklearn.metrics._regression.r2_score

Willingness to contribute

No. I cannot contribute a bug fix at this time.

MLflow version

mlflow, version 1.27.0

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04.3 LTS
Python version: Python 3.9.7
yarn version, if running the dev UI: N/A

Describe the problem

Root cause

An auto-sklearn model trained with mlflow.autolog() enabled cannot be pickled (serialized) to local disk.

Expected behavior

The model can be pickled (serialized) to local disk.

Error logs

for classification

got via the python command

_pickle.PicklingError: Can't pickle <function accuracy_score at 0x7efad52c2280>: it's not the same object as sklearn.metrics._classification.accuracy_score

got via the ipykernel (Notebook kernel)

PicklingError: Can't pickle <function accuracy_score at 0x7f17a5ca2a60>: it's not the same object as sklearn.metrics._classification.accuracy_score

for regression:

got via the python command

_pickle.PicklingError: Can't pickle <function r2_score at 0x7f162c7fd0d0>: it's not the same object as sklearn.metrics._regression.r2_score

got via the ipykernel (Notebook kernel)

PicklingError: Can't pickle <function r2_score at 0x7ff325880f70>: it's not the same object as sklearn.metrics._regression.r2_score

How to reproduce?

for classification

import mlflow
mlflow.autolog()

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from autosklearn.classification import AutoSklearnClassifier

X, y = load_iris(return_X_y=True)
train_X, test_X, train_y, test_y = train_test_split(X, y)

# training
#model = RandomForestClassifier() # pass
model = AutoSklearnClassifier(    # failed
    memory_limit = 1024 * 12,     # (in MB) (default: 3072 MB)
    time_left_for_this_task = 60, # 60 sec.
    per_run_time_limit = 15)      # 15 sec.
model.fit(train_X, train_y)

# inference
print('score:', model.score(test_X, test_y))

# save the model to the local disk
import pickle
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)
    print('Model saved successfully')

for regression:

import mlflow
mlflow.autolog()

from sklearn.linear_model import LinearRegression
from autosklearn.regression import AutoSklearnRegressor

train_X = [[1], [3], [5], [7], [9]]
train_y = [2, 6, 10, 14, 18]

# training
# model = LinearRegression()      # pass
model = AutoSklearnRegressor(     # failed
    memory_limit = 1024 * 12,     # (in MB) (default: 3072 MB)
    time_left_for_this_task = 60, # 60 sec.
    per_run_time_limit = 15)      # 15 sec.
model.fit(train_X, train_y)

# inference
test_X = [[2], [4], [6], [8], [10]]
test_y = [4, 8, 12, 16, 20]
print('score:', model.score(test_X, test_y))

# save the model to the local disk
import pickle
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)
    print('Model saved successfully')

Tracking information

MLflow version: 1.27.0
Tracking URI: file:///workspace/mlruns
Artifact URI: file:///workspace/mlruns/0/37326b5ea62046bcbd9b7eb8ab26cf23/artifacts

Code to reproduce issue

import mlflow
import sklearn.metrics
import pickle

cannot_pickle = True

if cannot_pickle:
    # failed
    mlflow.sklearn.autolog()
    r2_public = sklearn.metrics.r2_score
    r2_private = sklearn.metrics._regression.r2_score
else:
    # pass
    r2_public = sklearn.metrics.r2_score
    r2_private = sklearn.metrics._regression.r2_score
    mlflow.sklearn.autolog()

print('sklearn.metrics.r2_score:')
print('__hash__:', r2_public.__hash__)
print('__code__:', r2_public.__code__)
print('-' * 60)
print('sklearn.metrics._regression.r2_score:')
print('__hash__:', r2_private.__hash__)
print('__code__:', r2_private.__code__)
print('-' * 60)

with open('r2.pkl', 'wb') as f:
    pickle.dump(r2_public, f)
    print('r2.pkl saved successfully')
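
Printing __hash__ and __code__ does not show directly whether two names point at the same function object. pickle stores a function by reference (its module name plus qualified name) and refuses when looking that reference up returns a different object, so an identity check is more telling. The sketch below mirrors that lookup with a hypothetical helper, pickles_by_reference, which is not part of pickle, MLflow, or scikit-learn. json.dumps is used only as a stand-in for a function that resolves to itself.

```python
import importlib
import json


def pickles_by_reference(func):
    """Return True if looking up func.__module__ + func.__qualname__ yields func itself.

    Hypothetical helper: it imitates the identity check pickle performs
    before it stores a function by reference.
    """
    try:
        obj = importlib.import_module(func.__module__)
        for part in func.__qualname__.split("."):
            obj = getattr(obj, part)
    except (ImportError, AttributeError):
        # The recorded module or name cannot be resolved at all.
        return False
    return obj is func


# A plain module-level function resolves to itself.
print(pickles_by_reference(json.dumps))

# A lambda has no importable name, so the lookup fails.
print(pickles_by_reference(lambda x: x))
```

In the scenario above, calling the helper on sklearn.metrics.r2_score after mlflow.sklearn.autolog() would be expected to return False, given the reported error message, though that has not been verified here.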

Other info / logs

---------------------------------------------------------------------------
PicklingError                             Traceback (most recent call last)
/tmp/ipykernel_836/453514921.py in <module>
     26 
     27 with open('r2.pkl', 'wb') as f:
---> 28     pickle.dump(r2_public, f)
     29     print('r2.pkl saved successfully')

PicklingError: Can't pickle <function r2_score at 0x7f1b594cf310>: it's not the same object as sklearn.metrics._regression.r2_score
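
The "not the same object" message comes from pickle itself: a function is pickled as a module name and a qualified name, and pickle checks that this reference resolves back to the very object being pickled. One way to break that check is to wrap a function with functools.wraps, which copies __module__ and __qualname__ from the original onto the wrapper. Whether MLflow autologging patches sklearn.metrics in exactly this way is an assumption here, although it would be consistent with the ordering dependence in the script above. The sketch uses only the standard library; fake_metrics and score are hypothetical stand-ins.

```python
import functools
import pickle
import sys
import types


def score(x):
    return x


# Register score in a hypothetical module so pickle can find it by reference.
fake = types.ModuleType("fake_metrics")
score.__module__ = "fake_metrics"
score.__qualname__ = "score"
fake.score = score
sys.modules["fake_metrics"] = fake

# The original function pickles fine: fake_metrics.score is score.
original_bytes = pickle.dumps(score)


# The wrapper inherits __module__ and __qualname__ from score,
# but fake_metrics.score still resolves to the original, not the wrapper.
@functools.wraps(score)
def patched_score(*args, **kwargs):
    return score(*args, **kwargs)


try:
    pickle.dumps(patched_score)
    error = None
except pickle.PicklingError as exc:
    error = exc

print(type(error).__name__, error)
```

The wrapper claims to be fake_metrics.score, yet that name still points at the original function, which is the same mismatch the traceback reports for r2_score.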

What component(s) does this bug affect?

area/artifacts: Artifact stores and artifact logging
area/build: Build and test infrastructure for MLflow
area/docs: MLflow documentation pages
area/examples: Example code
area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
area/models: MLmodel format, model serialization/deserialization, flavors
area/pipelines: Pipelines, Pipeline APIs, Pipeline configs, Pipeline Templates
area/projects: MLproject format, project running backends
area/scoring: MLflow Model server, model deployment tools, Spark UDFs
area/server-infra: MLflow Tracking server backend
area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

What language(s) does this bug affect?

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

What integration(s) does this bug affect?

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations

Issue Analytics

State:
Created a year ago
Comments: 24 (13 by maintainers)

Top GitHub Comments

harupy commented, Jul 26, 2022

@tsungjung411 Feel free to reopen the issue if you need further help.

harupy commented, Jul 26, 2022

cloudpickle worked:

import mlflow

mlflow.autolog()  # failed

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from autosklearn.classification import AutoSklearnClassifier

# mlflow.autolog()  # pass

X, y = load_iris(return_X_y=True)
train_X, test_X, train_y, test_y = train_test_split(X, y)

# training
# model = RandomForestClassifier() # pass
model = AutoSklearnClassifier(  # failed
    memory_limit=1024 * 12,  # (in MB) (default: 3072 MB)
    time_left_for_this_task=60,  # 60 sec.
    per_run_time_limit=15,  # 15 sec.
)
model.fit(train_X, train_y)

# inference
print("score:", model.score(test_X, test_y))

# save the model to the local disk
import cloudpickle

with open("model.pkl", "wb") as f:
    cloudpickle.dump(model, f)
print("Model saved successfully")

Output:

2022/07/26 14:23:20 INFO mlflow.tracking.fluent: Autologging successfully enabled for pyspark.
2022/07/26 14:23:20 INFO mlflow.pyspark.ml: No SparkSession detected. Autologging will log pyspark.ml models contained in the default allowlist. To specify a custom allowlist, initialize a SparkSession prior to calling mlflow.pyspark.ml.autolog() and specify the path to your allowlist file via the spark.mlflow.pysparkml.autolog.logModelAllowlistFile conf.
2022/07/26 14:23:20 INFO mlflow.tracking.fluent: Autologging successfully enabled for pyspark.ml.
2022/07/26 14:23:20 INFO mlflow.tracking.fluent: Autologging successfully enabled for sklearn.
2022/07/26 14:23:24 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'ad52984e82214f91a7fd1847c7592311', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current sklearn workflow
2022/07/26 14:23:24 WARNING mlflow.sklearn: Training metrics will not be recorded because training labels were not specified. To automatically record training metrics, provide training labels as inputs to the model training function.
2022/07/26 14:23:24 WARNING mlflow.sklearn: Failed to infer model signature: 'StandardScalerComponent' object has no attribute 'predict'
2022/07/26 14:23:25 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '24d241776eb144e5b043eaaf532201b1', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current sklearn workflow
2022/07/26 14:23:25 WARNING mlflow.sklearn: Training metrics will not be recorded because training labels were not specified. To automatically record training metrics, provide training labels as inputs to the model training function.
2022/07/26 14:23:25 WARNING mlflow.sklearn: Failed to infer model signature: the trained model does not specify a `predict` function, which is required in order to infer the signature
score: 0.9736842105263158
Model saved successfully