KBinsDiscretizer: allow nans

Missing values, represented as NaN, could be treated as a separate category in discretization. This seems much more sensible to me than imputing the missing data and then discretizing.

In accordance with recent changes to other preprocessing, NaNs would simply be ignored in calculating fit statistics, and would be passed on to the encoder in transform. I can’t recall if we’re handling this sensibly in OneHotEncoder yet…
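To make the proposal concrete, here is a minimal NumPy sketch of the behaviour described above for a single feature: bin edges are computed from the non-missing values only, and NaNs are passed through transform unchanged. This is not the KBinsDiscretizer implementation; the helper names `fit_quantile_edges` and `transform_keep_nan` are hypothetical, and a quantile strategy is assumed purely for illustration.

```python
import numpy as np

def fit_quantile_edges(x, n_bins=3):
    """Compute quantile bin edges for one feature, ignoring NaNs (hypothetical helper)."""
    quantiles = np.linspace(0, 1, n_bins + 1)
    # np.nanquantile skips NaN entries when computing the quantiles.
    return np.nanquantile(x, quantiles)

def transform_keep_nan(x, edges):
    """Map values to bin indices 0..n_bins-1; NaN inputs stay NaN (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    mask = ~np.isnan(x)
    # Use interior edges only, so the minimum and maximum land in the first and last bin.
    out[mask] = np.digitize(x[mask], edges[1:-1])
    return out

x = np.array([1.0, 2.0, np.nan, 4.0, 5.0, 6.0, np.nan])
edges = fit_quantile_edges(x, n_bins=2)
binned = transform_keep_nan(x, edges)
print(edges)
print(binned)
```

The NaNs that survive transform would then be the encoder's responsibility, which is why the question about OneHotEncoder matters.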

Issue Analytics

State:
Created 6 years ago
Reactions: 4
Comments: 15 (10 by maintainers)

Top GitHub Comments

1 reaction

Framartin commented, Dec 29, 2018

Thanks a lot for pointing out #12045 (I hadn’t noticed it). It will be useful for handling missing values when encode='ordinal'.

I will take a look at #11996 to see if I can take over the work on OneHotEncoder (to handle missing values when encode in ['onehot', 'onehot-dense']).

0 reactions

PabloRMira commented, May 10, 2020

Hi all,

I would like to help with this. I think a good strategy would be to set the NaN category to -1 in the ordinal encoding, which would then propagate naturally to the one-hot encoding.

What do you think about this?
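For illustration, a rough sketch of that idea in plain NumPy, not scikit-learn code: missing values get the ordinal code -1, and the one-hot step shifts every code by one so that the NaN category occupies its own column. The helper names and the choice of a dedicated NaN column (rather than, say, an all-zero row) are assumptions made for this example.

```python
import numpy as np

NAN_CODE = -1  # ordinal code reserved for missing values, as suggested above

def ordinal_with_nan_code(x, edges):
    """Bin one feature into 0..n_bins-1 and map NaN to NAN_CODE (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    codes = np.full(x.shape, NAN_CODE, dtype=int)
    mask = ~np.isnan(x)
    # Interior edges only, so values at the outer edges fall in the first/last bin.
    codes[mask] = np.digitize(x[mask], edges[1:-1])
    return codes

def one_hot_with_nan_column(codes, n_bins):
    """One-hot encode codes; column 0 is the NaN category, columns 1..n_bins are the bins."""
    onehot = np.zeros((codes.size, n_bins + 1), dtype=int)
    # Shifting by one maps -1 -> column 0, bin 0 -> column 1, and so on.
    onehot[np.arange(codes.size), codes + 1] = 1
    return onehot

edges = np.array([0.0, 2.0, 4.0])  # two bins: [0, 2) and [2, 4]
x = np.array([0.5, np.nan, 2.5, 3.5])
codes = ordinal_with_nan_code(x, edges)
onehot = one_hot_with_nan_column(codes, n_bins=2)
print(codes)
print(onehot)
```

One trade-off with this layout is that downstream models see -1 as an ordinary integer in the ordinal output, so it only reads as "missing" by convention.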