[BUG] Can't pickle <function r2_score at 0x7f1b594cf310>: it's not the same object as sklearn.metrics._regression.r2_score

Original GitHub issue: mlflow/mlflow#6268

Willingness to contribute

No. I cannot contribute a bug fix at this time.

MLflow version

mlflow, version 1.27.0

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04.3 LTS
  • Python version: Python 3.9.7
  • yarn version, if running the dev UI: N/A

Describe the problem

Root cause

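When MLflow autologging is enabled, a fitted auto-sklearn model cannot be pickled (serialized) on the local side. The same code works with a plain scikit-learn estimator such as RandomForestClassifier or LinearRegression.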


Expected behavior

The model can be pickled (serialized) on the local side.


Error logs

For classification:

Via the python command:

_pickle.PicklingError: Can’t pickle <function accuracy_score at 0x7efad52c2280>: it’s not the same object as sklearn.metrics._classification.accuracy_score

Via ipykernel (Notebook kernel):

PicklingError: Can’t pickle <function accuracy_score at 0x7f17a5ca2a60>: it’s not the same object as sklearn.metrics._classification.accuracy_score

For regression:

Via the python command:

_pickle.PicklingError: Can’t pickle <function r2_score at 0x7f162c7fd0d0>: it’s not the same object as sklearn.metrics._regression.r2_score

Via ipykernel (Notebook kernel):

PicklingError: Can’t pickle <function r2_score at 0x7ff325880f70>: it’s not the same object as sklearn.metrics._regression.r2_score


How to reproduce?

For classification:

import mlflow
mlflow.autolog()

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from autosklearn.classification import AutoSklearnClassifier

X, y = load_iris(return_X_y=True)
train_X, test_X, train_y, test_y = train_test_split(X, y)

# training
# model = RandomForestClassifier()  # pass
model = AutoSklearnClassifier(        # failed
    memory_limit=1024 * 12,           # (in MB) (default: 3072 MB)
    time_left_for_this_task=60,       # 60 sec.
    per_run_time_limit=15)            # 15 sec.
model.fit(train_X, train_y)

# inference
print('score:', model.score(test_X, test_y))

# save the model to the local disk
import pickle
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)
    print('Model saved successfully')


For regression:

import mlflow
mlflow.autolog()

from sklearn.linear_model import LinearRegression
from autosklearn.regression import AutoSklearnRegressor

train_X = [[1], [3], [5], [7], [9]]
train_y = [2, 6, 10, 14, 18]

# training
# model = LinearRegression()        # pass
model = AutoSklearnRegressor(         # failed
    memory_limit=1024 * 12,           # (in MB) (default: 3072 MB)
    time_left_for_this_task=60,       # 60 sec.
    per_run_time_limit=15)            # 15 sec.
model.fit(train_X, train_y)

# inference
test_X = [[2], [4], [6], [8], [10]]
test_y = [4, 8, 12, 16, 20]
print('score:', model.score(test_X, test_y))

# save the model to the local disk
import pickle
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)
    print('Model saved successfully')
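Both error messages name a sklearn metric function rather than the model class, which suggests the fitted model holds a reference to that function somewhere in its attributes. A minimal, generic sketch for locating such attributes, assuming the offending object is reachable through the instance's __dict__. The helper find_unpicklable_attributes and the toy Model class are hypothetical and not part of auto-sklearn or MLflow:

```python
import pickle


def find_unpicklable_attributes(obj):
    """Return (name, error) pairs for instance attributes that pickle rejects.

    Only the top level of obj.__dict__ is inspected; a nested container that
    fails is reported as a whole rather than searched recursively.
    """
    failures = []
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception as exc:  # PicklingError, TypeError, AttributeError, ...
            failures.append((name, f"{type(exc).__name__}: {exc}"))
    return failures


# Hypothetical toy model holding a lambda, which pickle cannot store by reference.
class Model:
    def __init__(self):
        self.coef_ = [1.0, 2.0]
        self.metric = lambda y_true, y_pred: 0.0


model = Model()
for attr_name, error in find_unpicklable_attributes(model):
    print(attr_name, "->", error)
```

On a real fitted model the same call could be made right after fit(); because the metric reference may sit inside nested objects, a recursive variant of the helper might be needed to pinpoint it.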


Tracking information

  • MLflow version: 1.27.0
  • Tracking URI: file:///workspace/mlruns
  • Artifact URI: file:///workspace/mlruns/0/37326b5ea62046bcbd9b7eb8ab26cf23/artifacts


Code to reproduce issue

import mlflow
import sklearn.metrics
import pickle

cannot_pickle = True

if cannot_pickle:
    # failed
    mlflow.sklearn.autolog()
    r2_public = sklearn.metrics.r2_score
    r2_private = sklearn.metrics._regression.r2_score
else:
    # pass
    r2_public = sklearn.metrics.r2_score
    r2_private = sklearn.metrics._regression.r2_score
    mlflow.sklearn.autolog()

print('sklearn.metrics.r2_score:')
print('__hash__:', r2_public.__hash__)
print('__code__:', r2_public.__code__)
print('-' * 60)
print('sklearn.metrics._regression.r2_score:')
print('__hash__:', r2_private.__hash__)
print('__code__:', r2_private.__code__)
print('-' * 60)

with open('r2.pkl', 'wb') as f:
    pickle.dump(r2_public, f)
    print('r2.pkl saved successfully')
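The two branches above differ only in whether the metric is looked up before or after autologging is enabled. A minimal, stdlib-only sketch of the mechanism this points to, assuming (not confirmed in this thread) that autologging replaces the public sklearn.metrics function with a functools.wraps-style wrapper while the private sklearn.metrics._regression module keeps the original. The modules toy_private and toy_public and the patch() helper are hypothetical stand-ins:

```python
import functools
import pickle
import sys
import types

# Hypothetical stand-in for sklearn.metrics._regression (where r2_score is defined).
toy_private = types.ModuleType("toy_private")
sys.modules["toy_private"] = toy_private

# Hypothetical stand-in for sklearn.metrics (which re-exports r2_score).
toy_public = types.ModuleType("toy_public")
sys.modules["toy_public"] = toy_public


def r2_score(y_true, y_pred):
    """Placeholder metric; only its identity matters to pickle."""
    return 0.0


r2_score.__module__ = "toy_private"  # pretend it was defined in toy_private
toy_private.r2_score = r2_score
toy_public.r2_score = r2_score       # public re-export: the same object


def patch(fn):
    """Wrap fn like a logging hook might; wraps() copies __module__ and __qualname__."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapper


def try_pickle(fn):
    try:
        pickle.dumps(fn)
        return "ok"
    except pickle.PicklingError as exc:
        return f"PicklingError: {exc}"


r2_before = toy_public.r2_score                   # like the "pass" branch
toy_public.r2_score = patch(toy_public.r2_score)  # only the public alias is replaced
r2_after = toy_public.r2_score                    # like the "failed" branch

print("captured before patch:", try_pickle(r2_before))
print("captured after patch: ", try_pickle(r2_after))
```

If that assumption holds, it would also explain why calling mlflow.autolog() after the sklearn and autosklearn imports lets pickle succeed: modules that bind the metric at import time (as auto-sklearn presumably does) would then hold the unpatched original.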

Other info / logs

---------------------------------------------------------------------------
PicklingError                             Traceback (most recent call last)
/tmp/ipykernel_836/453514921.py in <module>
     26 
     27 with open('r2.pkl', 'wb') as f:
---> 28     pickle.dump(r2_public, f)
     29     print('r2.pkl saved successfully')

PicklingError: Can't pickle <function r2_score at 0x7f1b594cf310>: it's not the same object as sklearn.metrics._regression.r2_score
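The message comes from how pickle handles plain functions: it stores only the function's module and qualified name, then checks that importing that module and looking up that name returns the very same object. A minimal sketch that approximates this check so a function can be tested before calling pickle.dump. The helper pickles_by_reference is hypothetical, and it simplifies what CPython's pickle does (it ignores, for example, reducer_override and copyreg hooks):

```python
import functools
import importlib
import json
import pickle


def pickles_by_reference(fn):
    """Approximate pickle's check for plain functions: does
    <fn.__module__>.<fn.__qualname__> resolve back to fn itself?"""
    try:
        target = importlib.import_module(fn.__module__)
        for part in fn.__qualname__.split("."):
            target = getattr(target, part)
    except (ImportError, AttributeError):
        return False
    return target is fn


# A wraps()-style wrapper claims to be json.dumps but is a different object.
wrapped_dumps = functools.wraps(json.dumps)(lambda *a, **kw: json.dumps(*a, **kw))

print(pickles_by_reference(json.dumps))     # True
print(pickles_by_reference(wrapped_dumps))  # False

try:
    pickle.dumps(wrapped_dumps)
except pickle.PicklingError as exc:
    print("PicklingError:", exc)
```

Applied to the reproduction script, pickles_by_reference(sklearn.metrics.r2_score) would presumably return False when the reference is taken after mlflow.sklearn.autolog() and True when it is taken before, mirroring the two branches.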

What component(s) does this bug affect?

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/pipelines: Pipelines, Pipeline APIs, Pipeline configs, Pipeline Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

What language(s) does this bug affect?

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 24 (13 by maintainers)

Top GitHub Comments

1 reaction
harupy commented, Jul 26, 2022

@tsungjung411 Feel free to reopen the issue if you need further help.

1 reaction
harupy commented, Jul 26, 2022

Saving the model with cloudpickle instead of pickle worked:

import mlflow

mlflow.autolog()  # failed

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from autosklearn.classification import AutoSklearnClassifier

# mlflow.autolog()  # pass

X, y = load_iris(return_X_y=True)
train_X, test_X, train_y, test_y = train_test_split(X, y)

# training
# model = RandomForestClassifier() # pass
model = AutoSklearnClassifier(  # failed
    memory_limit=1024 * 12,  # (in MB) (default: 3072 MB)
    time_left_for_this_task=60,  # 60 sec.
    per_run_time_limit=15,  # 15 sec.
)
model.fit(train_X, train_y)

# inference
print("score:", model.score(test_X, test_y))

# save the model to the local disk
import cloudpickle

with open("model.pkl", "wb") as f:
    cloudpickle.dump(model, f)
print("Model saved successfully")

Output:

2022/07/26 14:23:20 INFO mlflow.tracking.fluent: Autologging successfully enabled for pyspark.
2022/07/26 14:23:20 INFO mlflow.pyspark.ml: No SparkSession detected. Autologging will log pyspark.ml models contained in the default allowlist. To specify a custom allowlist, initialize a SparkSession prior to calling mlflow.pyspark.ml.autolog() and specify the path to your allowlist file via the spark.mlflow.pysparkml.autolog.logModelAllowlistFile conf.
2022/07/26 14:23:20 INFO mlflow.tracking.fluent: Autologging successfully enabled for pyspark.ml.
2022/07/26 14:23:20 INFO mlflow.tracking.fluent: Autologging successfully enabled for sklearn.
2022/07/26 14:23:24 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'ad52984e82214f91a7fd1847c7592311', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current sklearn workflow
2022/07/26 14:23:24 WARNING mlflow.sklearn: Training metrics will not be recorded because training labels were not specified. To automatically record training metrics, provide training labels as inputs to the model training function.
2022/07/26 14:23:24 WARNING mlflow.sklearn: Failed to infer model signature: 'StandardScalerComponent' object has no attribute 'predict'
2022/07/26 14:23:25 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '24d241776eb144e5b043eaaf532201b1', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current sklearn workflow
2022/07/26 14:23:25 WARNING mlflow.sklearn: Training metrics will not be recorded because training labels were not specified. To automatically record training metrics, provide training labels as inputs to the model training function.
2022/07/26 14:23:25 WARNING mlflow.sklearn: Failed to infer model signature: the trained model does not specify a `predict` function, which is required in order to infer the signature
score: 0.9736842105263158
Model saved successfully