
IterativeImputer behaviour on missing nan's in fit data

See original GitHub issue

Why is this behaviour forced:

Features with missing values during transform which did not have any missing values during fit will be imputed with the initial imputation method only.

https://scikit-learn.org/dev/modules/generated/sklearn.impute.IterativeImputer.html#sklearn.impute.IterativeImputer

This means that, by default, it will return the mean of that feature. I would prefer to fit just one iteration of the chosen estimator and use that fitted estimator to impute the missing values.

Actual behaviour: Example - the second feature has no np.nan values during fit --> mean imputation at transform

import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 -- required before importing IterativeImputer
from sklearn.impute import IterativeImputer

imp = IterativeImputer(max_iter=10, verbose=0)
# The second feature has no missing values during fit.
imp.fit([[1, 2], [3, 6], [4, 8], [10, 20], [np.nan, 22], [7, 14]])

X_test = [[np.nan, 4], [6, np.nan], [np.nan, 6], [4, np.nan], [33, np.nan]]
print(np.round(imp.transform(X_test)))
Output:
[[ 2.  4.]
 [ 6. 12.]
 [ 3.  6.]
 [ 4. 12.]
 [33. 12.]]
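
For reference, the repeated 12 in the second column is simply the mean of that feature in the fit data, which is what the initial imputation (initial_strategy='mean' by default) falls back to. A quick check:

import numpy as np

# Mean of the second feature in the fit data: (2 + 6 + 8 + 20 + 22 + 14) / 6
X_fit = np.array([[1, 2], [3, 6], [4, 8], [10, 20], [np.nan, 22], [7, 14]])
print(np.nanmean(X_fit[:, 1]))  # 12.0 -- matches the imputed values above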

Adjusted example - the second feature has an np.nan value during fit --> iterative imputation with the estimator

import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 -- required before importing IterativeImputer
from sklearn.impute import IterativeImputer

imp = IterativeImputer(max_iter=10, verbose=0)
# The second feature now contains a missing value during fit.
imp.fit([[1, 2], [3, 6], [4, 8], [10, 20], [np.nan, 22], [7, np.nan]])

X_test = [[np.nan, 4], [6, np.nan], [np.nan, 6], [4, np.nan], [33, np.nan]]
print(np.round(imp.transform(X_test)))
Output:
[[ 2.  4.]
 [ 6. 12.]
 [ 3.  6.]
 [ 4.  8.]
 [33. 66.]]
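
The 66 in the last row comes from the iterative step: because the second feature now has a missing value during fit, the default estimator (BayesianRidge) is fitted for it and learns the roughly 2x relationship between the two columns. A rough illustration of that relationship using only the complete fit rows (a sketch of the idea, not the imputer's exact internal procedure, which also uses the initially imputed values):

import numpy as np
from sklearn.linear_model import BayesianRidge

# Complete rows of the fit data: the second feature is about twice the first.
X = np.array([[1], [3], [4], [10]])
y = np.array([2, 6, 8, 20])

est = BayesianRidge().fit(X, y)
print(est.predict(np.array([[33]])))  # close to 66, matching the transform output above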

Maybe the behaviour in sklearn/impute.py lines 679 to 683 should be made optional with a parameter like force-iterimpute.
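
Until something like that exists, a workaround in the spirit of the adjusted example above is to make sure every feature has at least one np.nan in the fit data, so that an estimator is fitted for it. A minimal sketch (the helper name is made up here; note that the value you blank out is itself imputed during fit, so a little training information is sacrificed):

import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer


def fit_imputer_on_all_features(X, **kwargs):
    """Hypothetical helper: blank out one value in every fully observed column
    so that IterativeImputer fits an estimator for each feature."""
    X = np.asarray(X, dtype=float).copy()
    for j in range(X.shape[1]):
        if not np.isnan(X[:, j]).any():
            X[0, j] = np.nan  # sacrifice one observed value in this column
    return IterativeImputer(**kwargs).fit(X)


imp = fit_imputer_on_all_features(
    [[1, 2], [3, 6], [4, 8], [10, 20], [np.nan, 22], [7, 14]], max_iter=10
)
print(np.round(imp.transform([[np.nan, 4], [33, np.nan]])))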

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 1
  • Comments: 21 (17 by maintainers)

Top GitHub Comments

1 reaction
jnothman commented, Sep 9, 2019

Thanks Sergey!

1 reaction
sergeyf commented, Aug 25, 2019

Maybe I can take a crack at this.

To review: the change would be to (optionally, and by default) fit regressors even on those features that have no missing values at train time.

At transform, we can then use those fitted regressors to impute these features if they are missing for any sample.

We will need a new test and to update the docstring. Maybe the test can come directly from https://github.com/scikit-learn/scikit-learn/issues/14383?

Am I missing anything?
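
For what it's worth, a rough sketch of what such a test could look like if taken directly from the examples in this issue; as written it asserts the currently documented mean fallback, so it would be inverted (or parametrised on the new option) once the change lands:

import numpy as np
from numpy.testing import assert_allclose
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer


def test_feature_without_missing_values_at_fit_uses_initial_imputation():
    # The second feature has no missing values during fit ...
    X_fit = [[1, 2], [3, 6], [4, 8], [10, 20], [np.nan, 22], [7, 14]]
    imp = IterativeImputer(max_iter=10).fit(X_fit)

    # ... so its missing entries at transform fall back to the fit-time mean (12).
    X_test = [[6, np.nan], [33, np.nan]]
    assert_allclose(imp.transform(X_test)[:, 1], [12, 12])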
