Add handle_missing and handle_unknown options to OrdinalEncoder
category_encoders.ordinal.OrdinalEncoder in scikit-learn-contrib/category_encoders has two useful options:
- handle_unknown: options are ‘error’, ‘return_nan’ and ‘value’; defaults to ‘value’, which encodes an unknown category as -1.
- handle_missing: options are ‘error’, ‘return_nan’ and ‘value’; defaults to ‘value’, which treats NaN as a category at fit time, or encodes it as -2 at transform time if NaN was not a category during fit.
These two options are very useful for handling real-world data.
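To make the described defaults concrete, here is a minimal pure-Python sketch of the semantics for a single column. The class name `SketchOrdinalEncoder` and its helpers are hypothetical and written only for illustration; this is not the category_encoders implementation, and the choice to number categories from 1 in order of first appearance is an assumption of the sketch.

```python
import math


def _is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


class SketchOrdinalEncoder:
    """Hypothetical single-column encoder illustrating the two options."""

    def __init__(self, handle_unknown="value", handle_missing="value"):
        self.handle_unknown = handle_unknown
        self.handle_missing = handle_missing

    def fit(self, values):
        self.mapping_ = {}
        self.missing_code_ = None
        next_code = 1  # assumption: codes start at 1, in order of appearance
        for v in values:
            if _is_missing(v):
                if self.handle_missing == "error":
                    raise ValueError("missing value seen during fit")
                if self.handle_missing == "value" and self.missing_code_ is None:
                    # NaN becomes a category of its own at fit time
                    self.missing_code_ = next_code
                    next_code += 1
            elif v not in self.mapping_:
                self.mapping_[v] = next_code
                next_code += 1
        return self

    def _encode_missing(self):
        if self.handle_missing == "error":
            raise ValueError("missing value seen during transform")
        if self.handle_missing == "return_nan":
            return math.nan
        # 'value': reuse the fitted code, or -2 if NaN was not seen during fit
        return self.missing_code_ if self.missing_code_ is not None else -2

    def _encode_unknown(self, value):
        if self.handle_unknown == "error":
            raise ValueError(f"unknown category: {value!r}")
        if self.handle_unknown == "return_nan":
            return math.nan
        return -1  # 'value': unknown categories are encoded as -1

    def transform(self, values):
        out = []
        for v in values:
            if _is_missing(v):
                out.append(self._encode_missing())
            elif v in self.mapping_:
                out.append(self.mapping_[v])
            else:
                out.append(self._encode_unknown(v))
        return out


# NaN present at fit time: it gets its own category code.
codes = SketchOrdinalEncoder().fit(["a", "b", float("nan"), "a"]).transform(
    ["b", "c", float("nan")]
)
# NaN absent at fit time: it falls back to -2 at transform time.
codes_no_nan = SketchOrdinalEncoder().fit(["a", "b"]).transform([float("nan"), "z"])
```

With the default ‘value’ setting for both options, `codes` is `[2, -1, 3]` and `codes_no_nan` is `[-2, -1]` under the sketch's numbering.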
Describe the workflow you want to enable
- Handle new categories at predict time in OrdinalEncoder (OneHotEncoder already has this option).
- Handle NaNs at fit and predict time in OrdinalEncoder
Describe your proposed solution
Port the logic for handle_unknown and handle_missing from category_encoders.ordinal.OrdinalEncoder.
Describe alternatives you’ve considered, if relevant
Just using scikit-learn-contrib/category_encoders instead
Additional context
Every encoder in scikit-learn-contrib/category_encoders has the options handle_unknown and handle_missing, giving users the flexibility to decide how to handle unknown or new values. This consistency in the API makes it easy to switch between encoders and try them out in your workflow.
Issue Analytics
- State:
- Created 3 years ago
- Reactions:11
- Comments:19 (19 by maintainers)
Top GitHub Comments
With 1.1, OrdinalEncoder now handles unknown values and missing values.
No need to, IMO; we can just wait for #18968 to be solved for now.