feature_importances_ should be a method in the ideal design
This issue is not meant to be very practical, just a place to share my thoughts.
I believe feature_importances_ should have been designed as get_feature_importances() (which is, perhaps, funny because I think the get_feature_names design is pretty broken too), for the following reasons:
- calculating feature importances can be costly, and should not (and, in some cases, is not) be calculated at fit time unnecessarily
- there are often multiple ways to calculate feature importances (as simple as the choice of norm for coef_), and (as long as they depend on the same sufficient statistics) the user may fairly not decide which is appropriate until after fit. Thus get_feature_importances could have parameters to choose its method. Meta-estimators such as SelectFromModel and RFE currently have parameters for how they should interpret coef_ as feature importances, but really these are parameters that should be passed to the linear model’s get_feature_importances; the model itself should know how to summarise its coef_, and doing so gets more complicated once we have multi-output coef_ (see the sketch after this list)
- it is semantically different from other attributes, not being a sufficient statistic upon which basis the estimator makes predictions
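To make the second point concrete, here is a minimal sketch of what such a method could look like on a linear model. The subclass, the get_feature_importances method, and its norm parameter are all hypothetical illustrations, not part of scikit-learn's API.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression


class LogisticRegressionWithImportances(LogisticRegression):
    """Hypothetical: a linear model that summarises its own coef_."""

    def get_feature_importances(self, norm="l2"):
        # The model decides how to collapse a (possibly multi-output)
        # coef_ into a single non-negative score per feature.
        coef = np.atleast_2d(self.coef_)
        if norm == "l1":
            return np.abs(coef).sum(axis=0)
        if norm == "l2":
            return np.sqrt((coef ** 2).sum(axis=0))
        raise ValueError(f"Unknown norm: {norm!r}")


# Usage: the choice of norm is deferred until after fit.
X, y = load_iris(return_X_y=True)
model = LogisticRegressionWithImportances(max_iter=1000).fit(X, y)
print(model.get_feature_importances(norm="l1"))
```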
I don’t think there is currently sufficient motivation to change, but I could be persuaded.
Ping @kmike?
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
#12326 is another example of needing to configure a norm for coef_, where that configuration needs to be passed through a meta-estimator (as in RFE and SelectFromModel also).

More Generic SelectFromModel API Proposal
We can extend the SelectFromModel API to have a feature_importance parameter that can accept a callable (a rough sketch follows). The default value for feature_importance will be 'auto' to keep the current behavior.
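The code block from the original comment was not preserved, so the following is only a reconstruction of the idea under stated assumptions: the callable receives the fitted estimator and returns one importance per feature, and because feature_importance is not an existing SelectFromModel argument, the selection step is mocked with a small helper function.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression


def l2_importance(estimator):
    # The callable a user would pass as feature_importance=...
    return np.sqrt((np.atleast_2d(estimator.coef_) ** 2).sum(axis=0))


def select_from_model(estimator, X, y, feature_importance, threshold):
    # Roughly what SelectFromModel would do with such a callable.
    estimator.fit(X, y)
    importances = feature_importance(estimator)
    return X[:, importances >= threshold]


X, y = load_iris(return_X_y=True)
X_new = select_from_model(
    LogisticRegression(max_iter=1000), X, y, l2_importance, threshold=1.0
)
print(X_new.shape)
```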
Thoughts on permutation importance
Permutation Idea 1
Now for permutation importance, it would be extremely nice to have feature_importance='permutation' and have it magically work. The permutation importance needs the data, which means it cannot support prefit=True, and the importances must be calculated during fit. Furthermore, permutation importance accepts a scoring parameter. This means the SelectFromModel API may look like the sketch below.
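The original code block is missing here as well, so this is a minimal sketch of the underlying computation only, assuming feature_importance='permutation' means computing permutation importances during fit. The wrapper function is hypothetical, but sklearn.inspection.permutation_importance is a real helper.

```python
from sklearn.datasets import load_iris
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression


def fit_with_permutation_importances(estimator, X, y, scoring=None):
    # The data is needed here, which is why prefit=True cannot work.
    estimator.fit(X, y)
    result = permutation_importance(
        estimator, X, y, scoring=scoring, n_repeats=5, random_state=0
    )
    return estimator, result.importances_mean


X, y = load_iris(return_X_y=True)
est, importances = fit_with_permutation_importances(
    LogisticRegression(max_iter=1000), X, y, scoring="accuracy"
)
print(importances)
```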
Permutation Idea 2
We can go the other way and have users write a custom function for permutation importance:
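A minimal sketch of such a callable, assuming the three-argument signature described below; wiring it into SelectFromModel via a feature_importance parameter remains hypothetical.

```python
from sklearn.inspection import permutation_importance


def perm_importance(fitted_estimator, X, y):
    # Called with the fitted estimator and the fit data, so it cannot
    # work with prefit=True.
    result = permutation_importance(
        fitted_estimator, X, y, scoring="accuracy", n_repeats=5, random_state=0
    )
    return result.importances_mean
```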
The fitted estimator, X, and Y will get passed to perm_importance during fit. prefit=True will not be supported.