Using joblib to dump a scikit-learn model on x86 then read on z/OS passes in Decision Tree but fails on a GradientBoostingRegressor

Description

Loading with joblib works when a Decision Tree classifier is serialized on an x86 machine (macOS) and loaded on z/OS. However, it fails for a GradientBoostingRegressor that has been serialized in the same way. I made small changes to NumpyArrayWrapper to detect endianness; when loading on a system with a different byte order, it calls array.byteswap(). However, the object returned by the fit method of the GBR causes a different code path to be taken, and the creation of the array fails before byteswap() can even be called.
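The byte-order handling described above can be sketched with plain NumPy. This is only an illustration, not the reporter's actual patch to NumpyArrayWrapper: the helper name `to_native_byte_order` is hypothetical, and the "foreign" array is simulated on the local machine rather than loaded from a file written on another platform.

```python
import sys

import numpy as np


def to_native_byte_order(arr):
    """Return arr with a native-endian dtype (hypothetical helper, not joblib API)."""
    if arr.dtype.isnative:
        # Already in this machine's byte order: nothing to do.
        return arr
    # astype() converts the values; the result has the same effect as
    # byteswap() followed by relabeling the dtype as native.
    return arr.astype(arr.dtype.newbyteorder('='))


# 'little' on x86, 'big' on z/OS.
print("native byte order:", sys.byteorder)

# Simulate an array written on a machine with the opposite byte order.
swapped_dtype = np.dtype(np.float64).newbyteorder('S')
foreign = np.arange(4, dtype=np.float64).astype(swapped_dtype)

fixed = to_native_byte_order(foreign)
print(foreign.dtype.isnative, fixed.dtype.isnative)
```

A conversion like this only helps for arrays that reach the wrapper. As the description notes, the GBR load fails on a different code path before any such conversion can run.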

Steps/Code to Reproduce

Example of GBR dumping.

`from sklearn import datasets
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.externals import joblib
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Train a gradient boosting regressor on the iris data.
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2)

gbr = GradientBoostingRegressor(max_depth=3)
model = Pipeline([('Gbr', gbr)])
model.fit(X_train, y_train)

# Dump the fitted pipeline on the x86 machine.
joblib.dump(model, 'GBTmodelx86.pkl')`

Example of DT model dumping.

`from sklearn import datasets
from sklearn.externals import joblib
from sklearn.tree import DecisionTreeClassifier

# Train a decision tree classifier on the iris data.
dataset = datasets.load_iris()

model = DecisionTreeClassifier()
model.fit(dataset.data, dataset.target)

# Dump the fitted model on the x86 machine.
joblib.dump(model, 'DTmodelX86.pkl')`

Loading both models

`from sklearn.externals import joblib

# Run on z/OS, once for each dumped file.
model = joblib.load('GBTmodelx86.pkl')  # or 'DTmodelX86.pkl'`

Expected Results

No error in either.

Actual Results

numpy_pickle.py", line 118, in read_array
    array = pickle.load(unpickler.file_handle)
File "sklearn/tree/_tree.pyx", line 650, in sklearn.tree._tree.Tree.__setstate__ (sklearn/tree/_tree.c:8403)
ValueError: Did not recognise loaded array layout

Versions

z-OS-2.2-3906-64bit
Python 3.6.1 (heads/v3.6.1-anaconda:aa4d638, May 18 2018, 12:10:40) [C]
NumPy 1.12.1
SciPy 0.19.0
Scikit-Learn 0.18.1

Additional notes

I understand that different platforms are not supported, and I am not certain this is a scikit-learn issue. However, pickle's docs mention that it is platform independent, and the problem does not seem to occur if we don't run the fit method on the gradient boosting model.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

3 reactions
jnothman commented, Apr 10, 2019

We don’t really support unpickling on different platforms, but if anyone diagnoses this further and finds a solution, we would consider a pull request to fix it.

If you are storing the model for prediction, I recommend ONNX.

1 reaction
NicolasHug commented, Apr 11, 2019

Nothing to add to the issue unfortunately, but please be advised that importing from sklearn.externals isn’t recommended. We’ll stop vendoring joblib soon.
