Support drop option of OneHotEncoder

Hi,

I tried to use a OneHotEncoder with the drop option set to 'first', but when I do so, running the converted model with onnxruntime gives me the following error during prediction:

onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running OneHotEncoder node. Name:'OneHotEncoder7' Status Message: Unknown Category and zeros = 0.

See https://github.com/onnx/sklearn-onnx/issues/321#issuecomment-584052819 for an example of code used to generate and convert the model.
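
The status message suggests that the node met a value missing from its category list while unknown values were not allowed to fall back to an all-zeros row. A rough plain-Python sketch of that kind of lookup follows. It is not onnxruntime's code: `encode_value`, its arguments, and the category names are hypothetical, and it assumes the converted node only lists the categories that were not dropped.

```python
def encode_value(value, categories, zeros):
    """Hypothetical one-hot lookup for a single value.

    zeros=True  -> an unknown value encodes to all zeros
    zeros=False -> an unknown value is an error
    """
    if value in categories:
        return [1.0 if value == c else 0.0 for c in categories]
    if zeros:
        return [0.0] * len(categories)
    raise ValueError(f"Unknown Category and zeros = 0: {value!r}")


# Assumption: training saw categories "a" and "b", and the dropped first
# category "a" is absent from the list the node was given.
node_categories = ["b"]

print(encode_value("b", node_categories, zeros=False))
try:
    encode_value("a", node_categories, zeros=False)
except ValueError as err:
    print("error:", err)
```

Under that assumption, any input row containing the dropped category fails, even though the value was present in the training data.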

Issue Analytics

State:
Created 4 years ago
Comments:9

Top GitHub Comments

prabhat00155 commented, Feb 17, 2020

I see what the problem is now. If we set drop='first', skl2onnx removes the first category from each feature, so running the converted model on an input that contains that value raises the error. scikit-learn, by contrast, keeps the category as a known value and simply omits its column from the output. This needs to be fixed; thanks for reporting. Here is a simpler example which shows this:

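import numpy as np
from onnx import save_model
from onnxruntime import InferenceSession
from sklearn.preprocessing import OneHotEncoder
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import StringTensorType

X = np.array([['male', 'first'], ['female', 'first'], ['male', 'second']])
model = OneHotEncoder(drop='first').fit(X)
print(model.transform(X).toarray())
onnx_model = convert_sklearn(model, 'ohe', [('input', StringTensorType([None, 2]))])
save_model(onnx_model, 'ohe.onnx')
sess = InferenceSession('ohe.onnx')
res = sess.run(None, input_feed={'input': X})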

[[1. 0.]
 [0. 0.]
 [1. 1.]]
---------------------------------------------------------------------------
Fail                                      Traceback (most recent call last)
<ipython-input-57-66d5e90d517b> in <module>
      5 save_model(onnx_model, 'ohe.onnx')
      6 sess = InferenceSession('ohe.onnx')
----> 7 res = sess.run(None, input_feed={'input': X})
      8 res

~/Documents/MachineLearning/onnx_projects/skl_env/lib/python3.6/site-packages/onnxruntime/capi/session.py in run(self, output_names, input_feed, run_options)
    140             output_names = [output.name for output in self._outputs_meta]
    141         try:
--> 142             return self._sess.run(output_names, input_feed, run_options)
    143         except C.EPFail as err:
    144             if self._enable_fallback:

Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running OneHotEncoder node. Name:'OneHotEncoder1' Status Message: Unknown Category and zeros = 0.

If you print model.categories_, you still see all the categories in the training data: [array(['female', 'male'], dtype='<U6'), array(['first', 'second'], dtype='<U6')]
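
The scikit-learn behaviour described in this comment (every category stays known, and the first one is only hidden from the output) can be sketched in plain Python. This is an illustration of the semantics, not scikit-learn's implementation; `fit_categories` and `transform_drop_first` are hypothetical helper names.

```python
def fit_categories(rows):
    """Collect the sorted unique values of each column (akin to categories_)."""
    n_features = len(rows[0])
    return [sorted({row[j] for row in rows}) for j in range(n_features)]


def transform_drop_first(rows, categories):
    """One-hot encode rows, omitting the column of each feature's first category.

    The first category is still looked up like any other value; it simply
    encodes to all zeros for that feature.
    """
    encoded = []
    for row in rows:
        out = []
        for value, cats in zip(row, categories):
            idx = cats.index(value)  # raises ValueError only for unseen values
            # Build the full one-hot vector, then hide the first column.
            full = [1.0 if k == idx else 0.0 for k in range(len(cats))]
            out.extend(full[1:])
        encoded.append(out)
    return encoded


X = [["male", "first"], ["female", "first"], ["male", "second"]]
categories = fit_categories(X)
encoded = transform_drop_first(X, categories)
print(categories)
print(encoded)
```

On the three rows from the example, this produces the same matrix that model.transform(X).toarray() prints above, with the row for 'female', 'first' encoded as all zeros rather than rejected.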

victornoel commented, Feb 13, 2020

@prabhat00155 tomorrow I will try to make a repro using one of the sklearn-provided datasets 😃
