Cannot pickle predict_proba function with 1.0 release

See original GitHub issue

Describe the bug

When trying to pickle a scikit-learn predict_proba function, I now see the following error with the latest release:

_pickle.PicklingError: Can't pickle <function BaseSVC.predict_proba at 0x000001F3460AAEE8>: it's not the same object as sklearn.svm._base.BaseSVC.predict_proba

This is probably due to this PR: https://github.com/scikit-learn/scikit-learn/pull/19948

Specifically, I believe this is because we now return a lambda here, and that lambda can no longer be pickled:

https://github.com/scikit-learn/scikit-learn/blob/642127806a830346886a0337fcbefedc871159c0/sklearn/utils/metaestimators.py#L162

This can be fixed easily by turning the lambda into a named function in that file.
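To illustrate why a wrapped lambda can produce this kind of error, here is a minimal standard-library sketch. It does not use scikit-learn's actual code: `make_wrapped_lambda` and the toy `predict_proba` are hypothetical stand-ins, and the assumption is that the failure comes from pickle looking a function up by its module and qualified name and finding a different object there.

```python
import functools
import pickle


def predict_proba(x):
    """Toy module-level function; pickle stores it by reference (module + name)."""
    return x


def make_wrapped_lambda():
    # Hypothetical stand-in for a helper that returns a lambda and then copies
    # the original function's name and docstring onto it.
    out = lambda x: predict_proba(x)
    functools.update_wrapper(out, predict_proba)
    return out


wrapped = make_wrapped_lambda()

# The plain function round-trips: the name lookup finds the very same object.
assert pickle.loads(pickle.dumps(predict_proba)) is predict_proba

error = None
try:
    # The lambda claims to be "predict_proba", but the object registered under
    # that name in the module is a different function, so pickle refuses.
    pickle.dumps(wrapped)
except pickle.PicklingError as exc:
    error = exc
    print(exc)
```

Under these assumptions, replacing the lambda with a function that pickle can find under its own name avoids the mismatch.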

I suppose there is a philosophical question of whether we should be able to pickle functions at all. I think we should. But it’s probably not as important as pickling models. In any case this should be a simple and easy fix.

Steps/Code to Reproduce

from joblib import dump, load
from sklearn import svm
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

def create_scikit_cancer_data():
    # Load the breast cancer dataset and hold out 20% of it for testing.
    breast_cancer_data = load_breast_cancer()
    x_train, x_test, y_train, y_test = train_test_split(
        breast_cancer_data.data, breast_cancer_data.target, test_size=0.2, random_state=0)
    feature_names = breast_cancer_data.feature_names
    classes = breast_cancer_data.target_names.tolist()
    return x_train, x_test, y_train, y_test, feature_names, classes

def create_sklearn_svm_classifier(X, y, probability=True):
    # probability=True is required for predict_proba to be available on SVC.
    clf = svm.SVC(gamma=0.001, C=100., probability=probability, random_state=777)
    model = clf.fit(X, y)
    return model

x_train, x_test, y_train, _, feature_names, target_names = create_scikit_cancer_data()
model = create_sklearn_svm_classifier(x_train, y_train)
# Dumping the bound predict_proba method (rather than the model) raises the PicklingError.
with open('pickle_model_function', 'wb') as stream:
    dump(model.predict_proba, stream)

Expected Results

We should be able to pickle the function without an error.

Actual Results

>>> from joblib import dump, load
>>> from sklearn import svm
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.model_selection import train_test_split
>>>
>>> def create_scikit_cancer_data():
...     breast_cancer_data = load_breast_cancer()
...     classes = breast_cancer_data.target_names.tolist()
...     x_train, x_test, y_train, y_test = train_test_split(breast_cancer_data.data, breast_cancer_data.target, test_size=0.2, random_state=0)
...     feature_names = breast_cancer_data.feature_names
...     classes = breast_cancer_data.target_names.tolist()
...     return x_train, x_test, y_train, y_test, feature_names, classes
...
>>> def create_sklearn_svm_classifier(X, y, probability=True):
...     clf = svm.SVC(gamma=0.001, C=100., probability=probability, random_state=777)
...     model = clf.fit(X, y)
...     return model
...
>>> x_train, x_test, y_train, _, feature_names, target_names = create_scikit_cancer_data()
>>> model = create_sklearn_svm_classifier(x_train, y_train)
>>> with open('pickle_model_function', 'wb') as stream:
...     dump(model.predict_proba, stream)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "C:\Users\ilmat\AppData\Local\Continuum\Miniconda3\envs\test\lib\site-packages\joblib\numpy_pickle.py", line 482, in dump
    NumpyPickler(filename, protocol=protocol).dump(value)
  File "C:\Users\ilmat\AppData\Local\Continuum\Miniconda3\envs\test\lib\pickle.py", line 437, in dump
    self.save(obj)
  File "C:\Users\ilmat\AppData\Local\Continuum\Miniconda3\envs\test\lib\site-packages\joblib\numpy_pickle.py", line 282, in save
    return Pickler.save(self, obj)
  File "C:\Users\ilmat\AppData\Local\Continuum\Miniconda3\envs\test\lib\pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Users\ilmat\AppData\Local\Continuum\Miniconda3\envs\test\lib\pickle.py", line 965, in save_global
    (obj, module_name, name))
_pickle.PicklingError: Can't pickle <function BaseSVC.predict_proba at 0x000002324C572CA8>: it's not the same object as sklearn.svm._base.BaseSVC.predict_proba

Versions

This only happens with the latest 1.0 release, and it broke our tests and builds. I'm trying to work around it by pickling the model instead, here: https://github.com/interpretml/interpret-community/pull/455
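The workaround mentioned above, pickling the fitted model rather than its bound predict_proba method, might look like the following minimal sketch. It reuses the estimator settings from the reproduction script but serializes in memory with the standard-library pickle module instead of joblib. That swap is an assumption made to keep the example free of file I/O, and the code in the linked PR may differ.

```python
import pickle

from sklearn import svm
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
x_train, x_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=0
)
model = svm.SVC(gamma=0.001, C=100.0, probability=True, random_state=777)
model.fit(x_train, y_train)

# Serialize the estimator itself instead of model.predict_proba.
payload = pickle.dumps(model)
restored = pickle.loads(payload)

# The method is looked up again on the restored estimator.
probabilities = restored.predict_proba(x_test)
```

The cost of this approach is that the whole estimator is stored, not just a callable, so code that expected a bare function has to call `restored.predict_proba` itself.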

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 13 (9 by maintainers)

Top GitHub Comments

1 reaction
ogrisel commented, Oct 23, 2021

"Right, but I thought lambdas should be serialized OK by joblib with cloudpickle?"

@rth joblib (actually loky) only uses cloudpickle for transient object communication between Python processes for Parallel calls, not for joblib.dump and joblib.load (i.e. longer-term storage). I don't think it's a good idea to use cloudpickle for disk-based serialization.

1 reaction
thomasjpfan commented, Oct 19, 2021

If the fix is easy to maintain, then I think it is okay to support serialized methods.
