
Support drop option of OneHotEncoder


Hi,

I tried to use a OneHotEncoder with the drop option set to 'first', but when I do so, running the converted model with onnxruntime raises the following error at prediction time:

onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running OneHotEncoder node. Name:'OneHotEncoder7' Status Message: Unknown Category and zeros = 0.

See https://github.com/onnx/sklearn-onnx/issues/321#issuecomment-584052819 for an example of the code used to generate and convert the model.
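The status message comes from the ONNX-ML OneHotEncoder operator. Its zeros attribute decides what happens when an input value is missing from the operator's category list: with zeros=1 (the default) the value is encoded as an all-zero row, and with zeros=0 the operator fails. The sketch below is a minimal pure-Python emulation of that documented behavior for string categories. The function name one_hot_onnx_ml is hypothetical and is not an onnxruntime API; it only illustrates why a value absent from the category list triggers this error when zeros is 0.

```python
def one_hot_onnx_ml(values, cats_strings, zeros=1):
    """Emulate ai.onnx.ml.OneHotEncoder on a 1-D list of strings.

    Hypothetical helper for illustration only: each value becomes a row of
    len(cats_strings) floats with 1.0 at the position of its category.
    """
    index = {cat: i for i, cat in enumerate(cats_strings)}
    rows = []
    for value in values:
        row = [0.0] * len(cats_strings)
        if value in index:
            row[index[value]] = 1.0
        elif not zeros:
            # Mirrors the onnxruntime failure reported in this issue.
            raise ValueError("Unknown Category and zeros = 0.")
        # With zeros=1 an unknown value keeps the all-zero row.
        rows.append(row)
    return rows


print(one_hot_onnx_ml(["male", "female"], ["female", "male"], zeros=0))
# [[0.0, 1.0], [1.0, 0.0]]
print(one_hot_onnx_ml(["female"], ["male"], zeros=1))
# [[0.0]]
try:
    one_hot_onnx_ml(["female"], ["male"], zeros=0)
except ValueError as err:
    print(err)  # Unknown Category and zeros = 0.
```

If the category list given to the operator lacks a value that appears at prediction time and zeros is 0, the call fails with the same message as in the report.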

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 9

Top GitHub Comments

1 reaction
prabhat00155 commented, Feb 17, 2020

I see what the problem is now. If we set drop='first', skl2onnx removes the first category of each feature, so transforming a row that contains that value makes the converted model raise an error. scikit-learn, by contrast, keeps that category and simply hides its column from the output. This needs to be fixed; thanks for reporting. Here is a simpler example that shows this:

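```python
import numpy as np
from onnx import save_model
from onnxruntime import InferenceSession
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import StringTensorType
from sklearn.preprocessing import OneHotEncoder

X = np.array([['male', 'first'], ['female', 'first'], ['male', 'second']])
model = OneHotEncoder(drop='first').fit(X)  # hides the 'female' and 'first' columns
print(model.transform(X).toarray())
onnx_model = convert_sklearn(model, 'ohe', [('input', StringTensorType([None, 2]))])
save_model(onnx_model, 'ohe.onnx')
sess = InferenceSession('ohe.onnx')
res = sess.run(None, input_feed={'input': X})  # fails with "Unknown Category"
```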

```
[[1. 0.]
 [0. 0.]
 [1. 1.]]
---------------------------------------------------------------------------
Fail                                      Traceback (most recent call last)
<ipython-input-57-66d5e90d517b> in <module>
      5 save_model(onnx_model, 'ohe.onnx')
      6 sess = InferenceSession('ohe.onnx')
----> 7 res = sess.run(None, input_feed={'input': X})
      8 res

~/Documents/MachineLearning/onnx_projects/skl_env/lib/python3.6/site-packages/onnxruntime/capi/session.py in run(self, output_names, input_feed, run_options)
    140             output_names = [output.name for output in self._outputs_meta]
    141         try:
--> 142             return self._sess.run(output_names, input_feed, run_options)
    143         except C.EPFail as err:
    144             if self._enable_fallback:

Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running OneHotEncoder node. Name:'OneHotEncoder1' Status Message: Unknown Category and zeros = 0.
```

If you print model.categories_, you still see all the categories from the training data: [array(['female', 'male'], dtype='<U6'), array(['first', 'second'], dtype='<U6')]
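The comment above implies that drop='first' does not shrink the category list; it only removes each feature's first column from the output, so the dropped category encodes as all zeros. The sketch below reproduces that behavior in plain Python on the X from the example and contrasts it with encoding against a category list that already lacks the first category. The helper names encode_keep_all_then_drop and encode_with_pruned_categories are hypothetical, the categories are copied from the model.categories_ output above, and this illustrates one way to match scikit-learn's semantics rather than the fix skl2onnx actually adopted.

```python
X = [["male", "first"], ["female", "first"], ["male", "second"]]
# Copied from model.categories_ in the comment above (sorted per feature).
CATEGORIES = [["female", "male"], ["first", "second"]]


def encode_keep_all_then_drop(rows, categories):
    """Match scikit-learn's drop='first': encode against the full category
    list, then delete each feature's first column."""
    out = []
    for row in rows:
        encoded = []
        for value, cats in zip(row, categories):
            if value not in cats:
                raise ValueError(f"Found unknown category {value!r}")
            one_hot = [1.0 if value == cat else 0.0 for cat in cats]
            encoded.extend(one_hot[1:])  # hide the dropped (first) column
        out.append(encoded)
    return out


def encode_with_pruned_categories(rows, categories):
    """The behavior described as buggy: remove the first category before
    encoding, so that value becomes unknown (as with zeros=0)."""
    out = []
    for row in rows:
        encoded = []
        for value, cats in zip(row, categories):
            pruned = cats[1:]
            if value not in pruned:
                raise ValueError("Unknown Category and zeros = 0.")
            encoded.extend(1.0 if value == cat else 0.0 for cat in pruned)
        out.append(encoded)
    return out


print(encode_keep_all_then_drop(X, CATEGORIES))
# [[1.0, 0.0], [0.0, 0.0], [1.0, 1.0]]
try:
    encode_with_pruned_categories(X, CATEGORIES)
except ValueError as err:
    print(err)  # Unknown Category and zeros = 0.
```

Keeping the full list and removing columns afterwards preserves the difference between the dropped category (all zeros) and a genuinely unseen value (an error, as with scikit-learn's default handle_unknown='error'). Setting zeros=1 on the pruned list would also make the example run, but it would silently map unseen values to zeros as well.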

1 reaction
victornoel commented, Feb 13, 2020

@prabhat00155 tomorrow I will try to make a repro using one of the sklearn-provided datasets 😃


Top Results From Across the Web

sklearn.preprocessing.OneHotEncoder
Changed in version 0.23: The option drop='if_binary' was added in 0.23. Changed in version 1.1: Support for dropping infrequent categories.
Python Scikit learn OneHotEncoder to encode select values only
ohe = OneHotEncoder(drop='first'). If you have 2 columns you want to encode and have specific values you want to not encode in each...
One-Hot Encoding in Scikit-Learn with OneHotEncoder - Datagy
In this tutorial, you'll learn how to use the OneHotEncoder class in Scikit-Learn to one hot encode your categorical data in sklearn.
OneHotEncoder — 1.0.2 - Feature-engine
The OneHotEncoder() replaces categorical variables by a set of binary variables, one per unique category. The encoder has the option to create k...
Dropping one of the columns when using one-hot encoding
you end up with correlated features, so you should drop one of them as a "reference" Dummy variables or indicator variables (these are...
