
KBinsDiscretizer doesn't handle NAs


If I pass input that contains NaNs to KBinsDiscretizer, it throws an error instead of ignoring the NAs:

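import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Use a float array: with the integer array from np.arange(10), the assignment
# X[2] = np.nan itself fails with "ValueError: cannot convert float NaN to integer"
# before KBinsDiscretizer is ever reached.
X = np.arange(10, dtype=float).reshape((-1, 1))
X[2] = np.nan
kb = KBinsDiscretizer(encode="ordinal", strategy="quantile")
kb.fit(X)  # raises ValueError because X contains NaN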

The bin edges could still be computed by ignoring the NAs during fitting, and transform could then output the NAs as np.nan.
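A minimal numpy-only sketch of that idea, assuming a single 1-D feature and the quantile strategy: the bin edges come from np.nanpercentile, which skips NaNs, and NaNs are passed through unchanged at transform time. The helper names nan_quantile_bins and nan_ordinal_transform are hypothetical; this is not scikit-learn's implementation, which also handles several features, other strategies and numerical tolerances.

```python
import numpy as np


def nan_quantile_bins(x, n_bins=5):
    """Hypothetical helper: quantile bin edges from the non-NaN values of 1-D x."""
    quantiles = np.linspace(0, 100, n_bins + 1)
    edges = np.nanpercentile(x, quantiles)  # NaNs are ignored here
    return np.unique(edges)  # drop duplicate edges (zero-width bins)


def nan_ordinal_transform(x, edges):
    """Hypothetical helper: ordinal bin index per value; NaN stays NaN."""
    out = np.full(x.shape, np.nan)
    observed = ~np.isnan(x)
    # Only the interior edges are used, so values outside the fitted range
    # land in the first or last bin instead of an out-of-range index.
    out[observed] = np.digitize(x[observed], edges[1:-1])
    return out


X = np.arange(10, dtype=float)
X[2] = np.nan
edges = nan_quantile_bins(X, n_bins=5)
Xt = nan_ordinal_transform(X, edges)
print(edges)
print(Xt)
```

With the issue's data this yields five bins computed from the nine observed values, while row 2 stays NaN in the output.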

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

2 reactions
david-cortes commented, Apr 19, 2021

Perhaps there could be an option controlling whether NAs are treated as a separate bin or ignored (i.e. output as NaN for ordinal encoding and as all-zeros for one-hot encoding).
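The two behaviours suggested in this comment could look roughly like the sketch below. It takes ordinal bin indices that may contain NaN (for instance from a NaN-aware transform) and either ignores the NaNs, giving NaN in the ordinal output and an all-zeros row in the one-hot output, or assigns them an extra bin with index n_bins. The function encode_bins and its nan_as_bin flag are hypothetical and do not exist in scikit-learn.

```python
import numpy as np


def encode_bins(bin_idx, n_bins, nan_as_bin=False):
    """Hypothetical: ordinal and one-hot encodings of bin indices with NaN.

    nan_as_bin=False: NaN stays NaN (ordinal) and gets an all-zeros row (one-hot).
    nan_as_bin=True:  NaN becomes an extra bin with index n_bins.
    """
    ordinal = np.array(bin_idx, dtype=float)  # copy, so the input is untouched
    missing = np.isnan(ordinal)
    n_cols = n_bins + 1 if nan_as_bin else n_bins
    if nan_as_bin:
        ordinal[missing] = n_bins
    one_hot = np.zeros((ordinal.shape[0], n_cols))
    rows = np.flatnonzero(~np.isnan(ordinal))  # rows that receive a 1
    one_hot[rows, ordinal[rows].astype(int)] = 1.0
    return ordinal, one_hot


idx = np.array([0.0, np.nan, 2.0])
ord_ignore, oh_ignore = encode_bins(idx, n_bins=3)
ord_sep, oh_sep = encode_bins(idx, n_bins=3, nan_as_bin=True)
print(ord_ignore, oh_ignore, sep="\n")
print(ord_sep, oh_sep, sep="\n")
```

Treating NaN as a separate bin keeps every row one-hot encoded at the cost of an extra column; ignoring it keeps the column count equal to n_bins.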

0 reactions
glemaitre commented, Apr 19, 2021

I think that our OneHotEncoder and OrdinalEncoder let NaN pass through nowadays (I am at least sure about the former).
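A quick check of the OneHotEncoder part of this remark, assuming scikit-learn 0.24 or later (the release in which OneHotEncoder started treating missing values as their own category): np.nan should show up as an extra, last category instead of raising an error. OrdinalEncoder is not exercised here, since the comment itself is less certain about it.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

X = np.array([[1.0], [np.nan], [2.0], [1.0]])
enc = OneHotEncoder()
# fit_transform returns a sparse matrix by default; densify it for inspection.
X_oh = enc.fit_transform(X).toarray()
print(enc.categories_[0])  # learned categories; NaN is expected to be last
print(X_oh)
```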


Guillaume Lemaitre, Scikit-learn @ Inria Foundation, https://glemaitre.github.io/


