[BUG] Can't pickle <function r2_score at 0x7f1b594cf310>: it's not the same object as sklearn.metrics._regression.r2_score
Willingness to contribute
No. I cannot contribute a bug fix at this time.
MLflow version
mlflow, version 1.27.0
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04.3 LTS
- Python version: Python 3.9.7
- yarn version, if running the dev UI: N/A
Describe the problem
Root cause
The model cannot be pickled (serialized) on the local machine.
Expected behavior
The model can be pickled (serialized) on the local machine.
Error logs
for classification:
got via the python command:
_pickle.PicklingError: Can't pickle <function accuracy_score at 0x7efad52c2280>: it's not the same object as sklearn.metrics._classification.accuracy_score
got via the ipykernel (Notebook kernel):
PicklingError: Can't pickle <function accuracy_score at 0x7f17a5ca2a60>: it's not the same object as sklearn.metrics._classification.accuracy_score
for regression:
got via the python command:
_pickle.PicklingError: Can't pickle <function r2_score at 0x7f162c7fd0d0>: it's not the same object as sklearn.metrics._regression.r2_score
got via the ipykernel (Notebook kernel):
PicklingError: Can't pickle <function r2_score at 0x7ff325880f70>: it's not the same object as sklearn.metrics._regression.r2_score
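The "not the same object as" wording comes from the check pickle performs when it stores a function by reference: it records the function's module and qualified name, looks that name up again, and refuses if the lookup returns a different object. A minimal sketch of that check, using only the standard library, is below. The helper name resolves_to_itself and the traced_dumps wrapper are hypothetical and only for illustration; json.dumps stands in for the sklearn metric.

```python
import functools
import importlib
import json


def resolves_to_itself(obj):
    """Return True if looking up obj's __module__ and __qualname__ yields obj itself.

    This mirrors (in simplified form) the identity check pickle applies to
    functions that it stores by reference.
    """
    module_name = getattr(obj, "__module__", None)
    qualname = getattr(obj, "__qualname__", None)
    if module_name is None or qualname is None:
        return False
    try:
        target = importlib.import_module(module_name)
        for part in qualname.split("."):
            target = getattr(target, part)
    except (ImportError, AttributeError):
        return False
    return target is obj


# A wrapper built with functools.wraps copies __module__ and __qualname__
# from the original, so it claims to live at "json.dumps" without being
# the object stored there.
@functools.wraps(json.dumps)
def traced_dumps(*args, **kwargs):
    return json.dumps(*args, **kwargs)


original_ok = resolves_to_itself(json.dumps)
wrapper_ok = resolves_to_itself(traced_dumps)
print("original resolves to itself:", original_ok)
print("wrapper resolves to itself:", wrapper_ok)
```

In an environment showing this error, calling the same helper on sklearn.metrics.r2_score after enabling autologging should indicate whether the public name still points at the object defined in sklearn.metrics._regression.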
How to reproduce?
for classification:
import mlflow
mlflow.autolog()
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from autosklearn.classification import AutoSklearnClassifier
X, y = load_iris(return_X_y=True)
train_X, test_X, train_y, test_y = train_test_split(X, y)
# training
#model = RandomForestClassifier() # pass
model = AutoSklearnClassifier(       # failed
    memory_limit=1024 * 12,          # (in MB) (default: 3072 MB)
    time_left_for_this_task=60,      # 60 sec.
    per_run_time_limit=15)           # 15 sec.
model.fit(train_X, train_y)
# inference
print('score:', model.score(test_X, test_y))
# save the model to the local disk
import pickle
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)
print('Model saved successfully')
for regression:
import mlflow
mlflow.autolog()
from sklearn.linear_model import LinearRegression
from autosklearn.regression import AutoSklearnRegressor
train_X = [[1], [3], [5], [7], [9]]
train_y = [2, 6, 10, 14, 18]
# training
# model = LinearRegression() # pass
model = AutoSklearnRegressor(        # failed
    memory_limit=1024 * 12,          # (in MB) (default: 3072 MB)
    time_left_for_this_task=60,      # 60 sec.
    per_run_time_limit=15)           # 15 sec.
model.fit(train_X, train_y)
# inference
test_X = [[2], [4], [6], [8], [10]]
test_y = [4, 8, 12, 16, 20]
print('score:', model.score(test_X, test_y))
# save the model to the local disk
import pickle
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)
print('Model saved successfully')
Tracking information
MLflow version: 1.27.0
Tracking URI: file:///workspace/mlruns
Artifact URI: file:///workspace/mlruns/0/37326b5ea62046bcbd9b7eb8ab26cf23/artifacts
Code to reproduce issue
import mlflow
import sklearn.metrics
import pickle
cannot_pickle = True
if cannot_pickle:
    # failed: autologging is enabled before the references are taken
    mlflow.sklearn.autolog()
    r2_public = sklearn.metrics.r2_score
    r2_private = sklearn.metrics._regression.r2_score
else:
    # pass: the references are taken before autologging is enabled
    r2_public = sklearn.metrics.r2_score
    r2_private = sklearn.metrics._regression.r2_score
    mlflow.sklearn.autolog()
print('sklearn.metrics.r2_score:')
print('__hash__:', r2_public.__hash__)
print('__code__:', r2_public.__code__)
print('-' * 60)
print('sklearn.metrics._regression.r2_score:')
print('__hash__:', r2_private.__hash__)
print('__code__:', r2_private.__code__)
print('-' * 60)
with open('r2.pkl', 'wb') as f:
    pickle.dump(r2_public, f)
print('r2.pkl saved successfully')
Other info / logs
---------------------------------------------------------------------------
PicklingError Traceback (most recent call last)
/tmp/ipykernel_836/453514921.py in <module>
26
27 with open('r2.pkl', 'wb') as f:
---> 28 pickle.dump(r2_public, f)
29 print('r2.pkl saved successfully')
PicklingError: Can't pickle <function r2_score at 0x7f1b594cf310>: it's not the same object as sklearn.metrics._regression.r2_score
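The failure can be reproduced without MLflow or scikit-learn, which may help narrow down where it comes from. The sketch below assumes that the autologging patch replaces the public attribute with a wrapper that copies the original function's __module__ and __qualname__ (as functools.wraps does); that assumption is not confirmed by this report. The modules demo_private and demo_public, the score function, and the patch helper are all hypothetical stand-ins for sklearn.metrics._regression, sklearn.metrics, r2_score, and the autologging patch.

```python
import functools
import pickle
import sys
import types


def score(y_true, y_pred):
    """Toy metric: fraction of matching labels."""
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


# Hypothetical "private" module where the function is defined.
private = types.ModuleType("demo_private")
score.__module__ = "demo_private"
score.__qualname__ = "score"
private.score = score
sys.modules["demo_private"] = private

# Hypothetical "public" module that re-exports the same function object.
public = types.ModuleType("demo_public")
public.score = private.score
sys.modules["demo_public"] = public

# Before patching, the public name pickles fine (by reference).
payload_before = pickle.dumps(public.score)


def patch(module, name):
    """Replace module.name with a wrapper that keeps the original's metadata."""
    original = getattr(module, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        return original(*args, **kwargs)

    setattr(module, name, wrapper)


# Patch only the public name, leaving demo_private.score untouched.
patch(public, "score")

try:
    pickle.dumps(public.score)
    error_message = None
except pickle.PicklingError as exc:
    error_message = str(exc)

print("pickled before patch:", len(payload_before), "bytes")
print("error after patch:", error_message)
```

The wrapper still reports demo_private as its module, so pickle looks up demo_private.score, finds the unpatched original, and rejects the wrapper. This is consistent with the "pass" branch above: a reference taken before patching is still the original object.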
What component(s) does this bug affect?
- area/artifacts: Artifact stores and artifact logging
- area/build: Build and test infrastructure for MLflow
- area/docs: MLflow documentation pages
- area/examples: Example code
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/pipelines: Pipelines, Pipeline APIs, Pipeline configs, Pipeline Templates
- area/projects: MLproject format, project running backends
- area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- area/server-infra: MLflow Tracking server backend
- area/tracking: Tracking Service, tracking client APIs, autologging
What interface(s) does this bug affect?
- area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- area/windows: Windows support
What language(s) does this bug affect?
- language/r: R APIs and clients
- language/java: Java APIs and clients
- language/new: Proposals for new client languages
What integration(s) does this bug affect?
- integrations/azure: Azure and Azure ML integrations
- integrations/sagemaker: SageMaker integrations
- integrations/databricks: Databricks integrations
Issue Analytics
- State:
- Created a year ago
- Comments:24 (13 by maintainers)
@tsungjung411 Feel free to reopen the issue if you need further help.
cloudpickle worked:
Output: