Missing features removal with SimpleImputer

Code sample

In the sample code below, the second column is np.nan in every row passed to fit, so transform drops it from the output:

>>> from sklearn.impute import SimpleImputer
>>> import numpy as np
>>> imp = SimpleImputer()
>>> imp.fit([[0, np.nan], [1, np.nan]])
>>> imp.transform([[0, np.nan], [1, 1]])
array([[0.],
       [1.]])
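
To check which input columns a fitted imputer is going to drop, one option is to inspect its `statistics_` attribute, which holds one fill value per input column and is NaN for a column with no observed values at fit time. The following is a minimal sketch under that assumption; the variable names `X_train`, `dropped` and `kept` are illustrative only.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Same training data as in the sample above: the second column is all NaN.
X_train = np.array([[0.0, np.nan], [1.0, np.nan]])
imp = SimpleImputer().fit(X_train)

# A NaN statistic marks a column that had nothing to average at fit time;
# transform discards exactly those columns.
empty_mask = np.isnan(imp.statistics_)
dropped = np.flatnonzero(empty_mask)
kept = np.flatnonzero(~empty_mask)

print("dropped columns:", dropped)
print("kept columns:", kept)
```

Checking this mask before calling transform makes the shape change explicit instead of leaving it to be discovered downstream.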

Problem description

Currently, sklearn.impute.SimpleImputer silently removes any feature whose value is np.nan in every training sample.

This can break later pipeline steps because the dataset’s shape has changed, e.g.

dataset[:, columns_to_impute_with_median] = imp.fit_transform(dataset[:, columns_to_impute_with_median])
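
As a possible workaround for the assignment above, the result can be written back only to the columns the imputer actually kept. This is a sketch, not a fix in the library: the small `dataset` array and the `cols` index array are made-up stand-ins for the real data and for `columns_to_impute_with_median`.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data (assumed for illustration): column 1 has no observed values.
dataset = np.array([
    [0.0,    np.nan, 5.0],
    [np.nan, np.nan, 6.0],
    [2.0,    np.nan, 7.0],
])
cols = np.array([0, 1])  # hypothetical stand-in for columns_to_impute_with_median

imp = SimpleImputer(strategy="median")
imputed = imp.fit_transform(dataset[:, cols])
print(imputed.shape)  # one column fewer than dataset[:, cols]

# Keep only the requested columns whose statistic is not NaN,
# i.e. the ones that survived the transform.
kept = cols[~np.isnan(imp.statistics_)]
dataset[:, kept] = imputed
print(dataset)
```

The all-NaN column is left untouched here, so it still has to be handled separately (dropped explicitly or filled with a constant).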

Possible solutions

For the problematic features, either keep their values if valid or impute the fill_value during transform. I suggest adding a new parameter to trigger this behaviour, with a warning that lists the affected features.
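
The behaviour proposed above can be approximated in user code today. The helper below, `transform_keep_shape`, is hypothetical and not part of scikit-learn: it keeps the columns that were empty at fit time, leaves their valid values alone, and fills only their missing entries with an assumed `fill_value` of 0.

```python
import numpy as np
from sklearn.impute import SimpleImputer

def transform_keep_shape(imp, X, fill_value=0.0):
    """Hypothetical helper: transform X with a fitted SimpleImputer while
    preserving the columns it would otherwise drop."""
    X = np.asarray(X, dtype=float)
    # Columns with a NaN statistic had no observed values during fit.
    empty = np.isnan(imp.statistics_)
    out = X.copy()
    # Regular imputation for the columns the imputer keeps.
    out[:, ~empty] = imp.transform(X)
    # For the empty columns: keep valid values, fill only the missing ones.
    out[:, empty] = np.where(np.isnan(X[:, empty]), fill_value, X[:, empty])
    return out

imp = SimpleImputer().fit([[0, np.nan], [1, np.nan]])
result = transform_keep_shape(imp, [[0, np.nan], [1, 1]])
print(result)
```

The output keeps both columns, so the shape matches the input. Newer scikit-learn releases (1.2 onward) add a `keep_empty_features` parameter to `SimpleImputer` that covers this case without a helper.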

I’m willing to implement this feature and would welcome any advice.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 6
  • Comments: 20 (16 by maintainers)

Top GitHub Comments

4 reactions
jnothman commented, Apr 23, 2020

I don’t see how this leads to a user-friendly API, and I don’t see why we can’t have an option to uphold both properties A and B by imputing empty columns with a constant value.

3 reactions
jnothman commented, Feb 12, 2020

I think if we solved the feature names issue this would be a much smaller problem…

Top Results From Across the Web

Does SimpleImputer remove features? - Stack Overflow
SimpleImputer.html: 'Columns which only contained missing values at fit are discarded upon transform if strategy is not “constant”'.
sklearn.impute.SimpleImputer
Multivariate imputer that estimates missing features using nearest samples. Notes. Columns which only contained missing values at fit are discarded upon ...
Imputing Missing Values using the SimpleImputer Class in ...
In this article, I will show you how to use the SimpleImputer class in sklearn to quickly and easily replace missing values in...
ML | Handle Missing Data with Simple Imputer - GeeksforGeeks
SimpleImputer is a scikit-learn class which is helpful in handling the missing data in the predictive model dataset. It replaces the NaN values ......
Retrieve dropped column names from `sklearn.impute ...
SimpleImputer drops columns consisting entirely of missing values. ... get_feature_names method to find out when/if a column was removed.
