Add handle_missing and handle_unknown options to OrdinalEncoder

category_encoders.ordinal.OrdinalEncoder in scikit-learn-contrib/category_encoders has 2 really useful options:

  1. handle_unknown: options are ‘error’, ‘return_nan’, and ‘value’; defaults to ‘value’, which encodes unknown categories as -1.
  2. handle_missing: options are ‘error’, ‘return_nan’, and ‘value’; defaults to ‘value’, which treats NaN as a category at fit time, or encodes it as -2 at transform time if NaN was not a category during fit.

These 2 options are really, really useful for handling real-world data.

Describe the workflow you want to enable

  1. Handle new categories at predict time in OrdinalEncoder (OneHotEncoder already has this option).
  2. Handle NaNs at fit and predict time in OrdinalEncoder.

Describe your proposed solution

Port the logic for handle_unknown and handle_missing from category_encoders.ordinal.OrdinalEncoder

Describe alternatives you’ve considered, if relevant

Just using scikit-learn-contrib/category_encoders instead

Additional context

Every encoder in scikit-learn-contrib/category_encoders has the handle_unknown and handle_missing options, giving users the flexibility to decide how to handle unknown or new values. This consistency in the API makes it really easy to switch between different encoders and try them out in your workflow.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 11
  • Comments: 19 (19 by maintainers)

Top GitHub Comments

1 reaction
thomasjpfan commented, Jul 28, 2022

With 1.1, OrdinalEncoder now handles unknown values and missing values.

1 reaction
NicolasHug commented, Dec 7, 2020

no need to IMO, we can just wait for #18968 to be solved for now.
