KBinsDiscretizer doesn't handle NAs
If I pass input containing NaNs to KBinsDiscretizer,
it throws an error instead of ignoring the NAs:
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer
X = np.arange(10).reshape((-1,1))
X[2] = np.nan
kb = KBinsDiscretizer(encode="ordinal", strategy="quantile")
kb.fit(X)
ValueError: cannot convert float NaN to integer
The bins could still be calculated if one ignores the NAs and outputs them as np.nan
in the transformation.
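A minimal NumPy-only sketch of that idea follows. It is an illustration under stated assumptions, not how KBinsDiscretizer is implemented: the helper names `fit_quantile_edges` and `transform_ordinal` are hypothetical, and the quantile edges come from `np.nanquantile`, which skips NaN entries. Note that `np.arange(10)` yields an integer array, so in the snippet above the assignment `X[2] = np.nan` already raises the quoted ValueError; the sketch uses a float array so that the NaN can be stored.

```python
import numpy as np


def fit_quantile_edges(X, n_bins=5):
    """Per-column quantile bin edges, computed while skipping NaNs.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    X = np.asarray(X, dtype=float)
    probs = np.linspace(0.0, 1.0, n_bins + 1)
    # nanquantile ignores NaN entries; transpose to (n_features, n_bins + 1)
    return np.nanquantile(X, probs, axis=0).T


def transform_ordinal(X, edges):
    """Ordinal bin codes as floats; NaN inputs stay NaN in the output."""
    X = np.asarray(X, dtype=float)
    out = np.full(X.shape, np.nan)
    for j, col_edges in enumerate(edges):
        present = ~np.isnan(X[:, j])
        # Compare against interior edges only, so codes fall in 0 .. n_bins - 1
        out[present, j] = np.searchsorted(
            col_edges[1:-1], X[present, j], side="right"
        )
    return out


X = np.arange(10, dtype=float).reshape(-1, 1)  # float dtype so NaN can be stored
X[2] = np.nan

edges = fit_quantile_edges(X, n_bins=5)
codes = transform_ordinal(X, edges)
print(codes.ravel())
```

The output has to be a float array, because NaN cannot be represented in an integer array.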
Issue Analytics
- State:
- Created 2 years ago
- Comments:6 (6 by maintainers)
Top GitHub Comments
Perhaps there could be an option about whether to treat NAs as a separate bin or to ignore them (i.e. outputting NaN for ordinal, and all-zeros for one-hot).
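The two behaviours suggested in this comment could look roughly like the sketch below. Everything here is hypothetical: the function `encode_codes` and its `handle_nan` parameter do not exist in scikit-learn and only illustrate the proposal. The input is assumed to be a 1-D float array of bin codes in which NaN marks a missing value.

```python
import numpy as np


def encode_codes(codes, n_bins, encode="ordinal", handle_nan="ignore"):
    """Encode 1-D float bin codes in which NaN marks a missing input.

    Hypothetical illustration of the proposed option, not a scikit-learn API.
    handle_nan="ignore": NaN for ordinal output, an all-zeros row for one-hot.
    handle_nan="separate_bin": missing values get the extra bin index n_bins.
    """
    if encode not in ("ordinal", "onehot"):
        raise ValueError(f"unknown encode: {encode!r}")
    if handle_nan not in ("ignore", "separate_bin"):
        raise ValueError(f"unknown handle_nan: {handle_nan!r}")

    codes = np.asarray(codes, dtype=float)
    missing = np.isnan(codes)

    if handle_nan == "separate_bin":
        filled = np.where(missing, n_bins, codes)  # extra bin for missing values
        width = n_bins + 1
    else:
        filled = codes  # NaNs are kept as they are
        width = n_bins

    if encode == "ordinal":
        return filled

    onehot = np.zeros((codes.size, width))
    rows = np.flatnonzero(~np.isnan(filled))  # rows that have a bin to mark
    onehot[rows, filled[rows].astype(int)] = 1.0
    return onehot


codes = np.array([0.0, np.nan, 2.0])
print(encode_codes(codes, n_bins=3, encode="ordinal", handle_nan="ignore"))
print(encode_codes(codes, n_bins=3, encode="onehot", handle_nan="ignore"))
print(encode_codes(codes, n_bins=3, encode="onehot", handle_nan="separate_bin"))
```

With "ignore", the one-hot width stays at `n_bins`, so downstream column counts do not depend on whether missing values were seen; with "separate_bin", one extra column is always added.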
I think our OneHotEncoder and OrdinalEncoder let NaN pass through nowadays (I am sure at least about the former).
On Mon, 19 Apr 2021 at 19:08, david-cortes @.***> wrote:
– Guillaume Lemaitre Scikit-learn @ Inria Foundation https://glemaitre.github.io/