Support drop option of OneHotEncoder
Hi,
I tried to use a OneHotEncoder with the drop option set to first, but when I do so, running the generated model with onnxruntime gives me the following error during prediction:
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running OneHotEncoder node. Name:'OneHotEncoder7' Status Message: Unknown Category and zeros = 0.
See https://github.com/onnx/sklearn-onnx/issues/321#issuecomment-584052819 for an example of code used to generate and convert the model.
Issue Analytics
- Created 4 years ago
- Comments:9
I see what the problem is now. If we set drop='first', skl2onnx removes the first category from each feature, so when you call transform with that feature value, the converted model raises the error. scikit-learn, in contrast, keeps that category value and simply hides it from the output. This needs to be fixed, thanks for reporting. Here is a simpler example which shows this:
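The code from the original comment is not included in this copy. As a stand-in (not the commenter's code), here is a minimal sketch of the scikit-learn side of the behaviour described above. The toy data and the variable names X, model and encoded are assumptions chosen for illustration; only OneHotEncoder, drop='first', categories_ and drop_idx_ come from scikit-learn itself. It does not run the ONNX conversion, it only shows that scikit-learn still accepts the dropped category at transform time.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: two categorical features with two values each.
X = np.array([
    ["female", "first"],
    ["male", "second"],
    ["female", "second"],
])

# drop='first' removes the first category of each feature from the OUTPUT.
model = OneHotEncoder(drop="first")
model.fit(X)

# The fitted encoder still remembers every category seen during training.
print(model.categories_)

# drop_idx_ records which category index is hidden for each feature.
print(model.drop_idx_)

# Transforming a row made only of dropped categories is valid in scikit-learn:
# each feature contributes one remaining column, and all of them are zero.
encoded = model.transform(np.array([["female", "first"]])).toarray()
print(encoded)
```

A converted model that forgets the dropped categories entirely would instead treat "female" and "first" as unknown values, which matches the "Unknown Category" error reported above.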
If you print model.categories_, you still see all the categories in the training data: [array(['female', 'male'], dtype='<U6'), array(['first', 'second'], dtype='<U6')]
@prabhat00155 tomorrow I will try to make a repro using one of the sklearn-provided datasets 😃