KBinsDiscretizer: allow nans
Missing values, represented as NaN, could be treated as a separate category in discretization. This seems much more sensible to me than imputing the missing data then discretizing.
In accordance with recent changes to other preprocessing, NaNs would simply be ignored in calculating `fit` statistics, and would be passed on to the encoder in `transform`. I can’t recall if we’re handling this sensibly in OneHotEncoder yet…
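To make the proposal concrete, here is a minimal NumPy-only sketch of the behaviour described above: bin edges are computed from the non-NaN values only, and NaNs are passed through untouched at transform time. The helper names `fit_bin_edges` and `transform_keep_nan` are hypothetical and are not part of scikit-learn; quantile binning is assumed purely for illustration.

```python
import numpy as np


def fit_bin_edges(x, n_bins=3):
    """Quantile bin edges computed from the non-NaN values only (hypothetical helper)."""
    return np.nanquantile(x, np.linspace(0, 1, n_bins + 1))


def transform_keep_nan(x, edges):
    """Ordinal bin index per value; NaN inputs stay NaN (hypothetical helper)."""
    out = np.full(x.shape, np.nan)
    mask = ~np.isnan(x)
    # Use interior edges only, so indices fall in [0, n_bins - 1].
    out[mask] = np.digitize(x[mask], edges[1:-1])
    return out


x = np.array([0.0, 1.0, np.nan, 2.0, 3.0, np.nan, 4.0, 5.0])
edges = fit_bin_edges(x, n_bins=3)
binned = transform_keep_nan(x, edges)
print(edges)
print(binned)
```

The NaN entries are then left for the downstream encoder to handle, which is the open question for OneHotEncoder raised above.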
Issue Analytics
- State:
- Created 6 years ago
- Reactions:4
- Comments:15 (10 by maintainers)
Thanks a lot for pointing out #12045 (I hadn’t noticed it). It will be useful for handling missing values when `encode='ordinal'`. I will take a look at #11996 to see if I can take over the work on `OneHotEncoder` (to handle missing values when `encode in ['onehot', 'onehot-dense']`).
Hi all,
I would like to help on this. I think a good strategy would be to set the NaN category to -1 in the ordinal encoding, which then propagates naturally to the one-hot encoding.
What do you think about this?
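For discussion, a rough sketch of what that could look like, written in plain NumPy rather than against scikit-learn's encoders. The functions `ordinal_with_nan` and `one_hot_with_nan` are hypothetical, the bin edges are made up for the example, and placing the NaN category in the first one-hot column is just one possible convention.

```python
import numpy as np


def ordinal_with_nan(x, edges):
    """Ordinal bin codes, with -1 reserved for NaN inputs (hypothetical helper)."""
    codes = np.full(x.shape, -1, dtype=int)
    mask = ~np.isnan(x)
    # Interior edges only, so valid codes fall in [0, n_bins - 1].
    codes[mask] = np.digitize(x[mask], edges[1:-1])
    return codes


def one_hot_with_nan(codes, n_bins):
    """One-hot encode codes in [-1, n_bins - 1]; column 0 is the NaN category."""
    onehot = np.zeros((codes.size, n_bins + 1), dtype=int)
    # Shift by one so that code -1 lands in column 0.
    onehot[np.arange(codes.size), codes + 1] = 1
    return onehot


x = np.array([0.5, np.nan, 2.5, 4.5])
edges = np.array([0.0, 2.0, 4.0, 6.0])  # assumed edges for three bins
codes = ordinal_with_nan(x, edges)
onehot = one_hot_with_nan(codes, n_bins=3)
print(codes)
print(onehot)
```

With this convention the ordinal output stays integer-typed, and the one-hot output simply gains one extra column per feature that contains missing values.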