
SelectFromModel: max_features can't be greater than number of features

See original GitHub issue

Describe the bug

When I define a SelectFromModel instance like this:

SelectFromModel(RandomForestClassifier(), max_features=100)

and the number of total features is less than 100, then a ValueError is raised:

ValueError: 'max_features' should be 0 and 10 features.Got 100 instead.

I consider this a bug because, in my pipeline for example, this feature selector is preceded by other feature selectors such as VarianceThreshold, so it is never known in advance how many features will be left by the time this step is reached (see the sketch below). If the value is larger than the number of available features, SelectFromModel should simply keep all of them rather than raise an error.
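A minimal sketch of that scenario (the pipeline and step names here are illustrative, not taken from the report): the number of columns reaching SelectFromModel depends on what VarianceThreshold removes, so it cannot be known when the pipeline is built.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel, VarianceThreshold
from sklearn.pipeline import Pipeline

X, y = make_classification(n_features=10, n_informative=8, random_state=0)

pipe = Pipeline([
    # How many columns survive this step depends entirely on the data.
    ("variance", VarianceThreshold(threshold=0.0)),
    # If max_features exceeds the number of surviving columns,
    # fit() raises the ValueError quoted in this report.
    ("select", SelectFromModel(RandomForestClassifier(), max_features=100)),
])
pipe.fit(X, y)  # ValueError on the affected versions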

Steps/Code to Reproduce

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_features=10, n_informative=8)
sfm = SelectFromModel(RandomForestClassifier(), max_features=100)
sfm.fit(X, y)  # raises the ValueError below, since X has only 10 features

Expected Results

If max_features is larger than the number of available features, SelectFromModel should simply keep all of them instead of raising an error.

Actual Results

ValueError: 'max_features' should be 0 and 10 features.Got 100 instead.

Versions

scikit-learn 0.23.2, 0.24.1

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 9 (8 by maintainers)

Top GitHub Comments

glemaitre commented, Dec 17, 2021 (2 reactions)

What about accepting a callable that would take X and return an integer?

SelectFromModel(RandomForestClassifier(), max_features=lambda X: min(X.shape[1], 100))

I still think that adding support for a float could also be nice, because this is a common API in other estimators and one would expect to have it here as well.
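For reference, a minimal sketch of that callable form (scikit-learn 1.1 and later accept a callable for max_features that takes X and returns an integer; the cap of 100 is just an illustrative value):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_features=10, n_informative=8, random_state=0)

# Keep at most 100 features, but never more than X actually has.
sfm = SelectFromModel(
    RandomForestClassifier(),
    max_features=lambda X: min(X.shape[1], 100),
)
print(sfm.fit_transform(X, y).shape)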

Micky774 commented, Feb 1, 2022 (0 reactions)

Gave an initial implementation in PR #22356 if anyone is interested in taking a look!

