KBinsDiscretizer: allow nans

Missing values, represented as NaN, could be treated as a separate category in discretization. This seems much more sensible to me than imputing the missing data then discretizing.

In accordance with recent changes to other preprocessing, NaNs would simply be ignored in calculating fit statistics, and would be passed on to the encoder in transform. I can’t recall if we’re handling this sensibly in OneHotEncoder yet…
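
A minimal sketch of the behaviour proposed above, using NumPy only so that it does not depend on how any particular scikit-learn release treats NaN. The helper names `fit_bin_edges` and `transform_keep_nan` are hypothetical and not part of scikit-learn; the quantile strategy is just one possible choice.

```python
import numpy as np


def fit_bin_edges(x, n_bins=3):
    """Compute quantile bin edges from the observed (non-NaN) values only."""
    quantiles = np.linspace(0, 1, n_bins + 1)
    # np.nanquantile ignores NaNs, so they do not affect the fit statistics.
    return np.nanquantile(x, quantiles)


def transform_keep_nan(x, edges):
    """Return ordinal bin codes; NaN inputs are passed through as NaN."""
    x = np.asarray(x, dtype=float)
    codes = np.full(x.shape, np.nan)
    observed = ~np.isnan(x)
    # Compare against interior edges only, so the maximum lands in the last bin.
    codes[observed] = np.searchsorted(edges[1:-1], x[observed], side="right")
    return codes


x = np.array([1.0, 2.0, np.nan, 4.0, 5.0, 6.0, np.nan, 8.0])
edges = fit_bin_edges(x, n_bins=3)
codes = transform_keep_nan(x, edges)
print(edges)
print(codes)
```

The NaNs survive the transform untouched, so whatever encoder runs next can decide how to represent them (for example as an extra category).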

Issue Analytics

  • State: open
  • Created: 6 years ago
  • Reactions: 4
  • Comments: 15 (10 by maintainers)

Top GitHub Comments

1 reaction
Framartin commented, Dec 29, 2018

Thanks a lot for pointing out #12045 (I hadn’t noticed it). It will be useful for handling missing values when encode='ordinal'.

I will take a look at #11996 to see if I can take over the work on OneHotEncoder (to handle missing values when encode in ['onehot', 'onehot-dense']).

0 reactions
PabloRMira commented, May 10, 2020

Hi all,

I would like to help with this. I think a good strategy would be to set the NaN category to -1 in the ordinal encoding, which then propagates naturally to the one-hot encoding.

What do you think about this?
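
A rough sketch of the -1 idea as a user-side workaround, assuming a single feature column. Here `KBinsDiscretizer` is fitted and applied only to the observed values, the missing rows receive the sentinel code -1, and `OneHotEncoder` then picks up -1 as one more category. The sentinel value and the manual masking are illustrative assumptions, not behaviour built into scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer, OneHotEncoder

x = np.array([1.0, 2.0, np.nan, 4.0, 5.0, 6.0, np.nan, 8.0])
observed = ~np.isnan(x)

# Fit on observed values only, so NaNs do not influence the bin edges.
est = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")
est.fit(x[observed].reshape(-1, 1))

# Start with the sentinel everywhere, then fill in codes for observed rows.
codes = np.full(x.shape, -1.0)
codes[observed] = est.transform(x[observed].reshape(-1, 1)).ravel()

# With categories inferred from the data, -1 becomes the first one-hot column.
onehot = OneHotEncoder().fit_transform(codes.reshape(-1, 1)).toarray()
print(codes)
print(onehot)
```

One design caveat: a downstream model that reads the ordinal codes as ordered numbers will see -1 as "below the lowest bin", which may or may not be desirable; the one-hot form avoids that implied ordering.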

Top Results From Across the Web

Handling nan values with KBinsDiscretizer - python
I'm trying to fit KBinsDiscretizer on it. I want to bin the non null values normally and mark null values as separate bin....
sklearn.preprocessing.KBinsDiscretizer
Parameters: n_binsint or array-like of shape (n_features,), default=5. The number of bins to produce. Raises ValueError if n_bins < 2 .
sklearn.preprocessing.KBinsDiscretizer.fit_transform
KBinsDiscretizer.fit_transform taken from open source projects. ... one where the nans are replaced by max(feature) + 1 # A split where nans go...
How best to deal with missing event data - Cross Validated
I thought I might use KBinsDiscretizer to bin the data into one-hot ... My next idea is to replace NaNs with the max...
Preprocessing with sklearn: a complete and ...
To give our code some meaning, we'll create a very small data set with ... axis: 0 for rows, 1 for columns; tresh:...
