
feature_importances_ should be a method in the ideal design

See original GitHub issue

This issue is not meant to be very practical, just a place to share my thoughts.

I believe feature_importances_ should have been designed as get_feature_importances() (which is, perhaps, funny because I think the get_feature_names design is pretty broken too), for the following reasons:

  • calculating feature importances can be costly, and should not be (and in some cases is not) calculated unnecessarily at fit time
  • there are often multiple ways to calculate feature importances (as simple as the choice of norm for coef_), and (as long as they depend on the same sufficient statistics) the user may fairly not decide which is appropriate until after fit. Thus get_feature_importances could take parameters to choose its method (see the sketch after this list). Meta-estimators such as SelectFromModel and RFE currently have parameters for how they should interpret coef_ as feature importances, but really these are parameters that should be passed to the linear model’s get_feature_importances; the model itself should know how to summarise its coef_, and doing so gets more complicated once we have multi-output coef_.
  • it is semantically different from other fitted attributes, not being a sufficient statistic on the basis of which the estimator makes predictions
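
To make the second point concrete, here is a minimal sketch of what such a method could look like on a linear model; the class, the get_feature_importances method and its norm parameter are hypothetical illustrations for this issue, not an existing scikit-learn API:

import numpy as np

class ToyLinearModel:
    """Toy estimator with a hypothetical get_feature_importances() method."""

    def fit(self, X, y):
        # Ordinary least squares via the pseudo-inverse; coef_ is stored as
        # (n_outputs, n_features) to mimic the multi-output case.
        y = np.asarray(y).reshape(len(X), -1)
        self.coef_ = (np.linalg.pinv(X) @ y).T
        return self

    def get_feature_importances(self, norm="l2"):
        # The model itself decides how to summarise coef_ into one importance
        # per feature, even when coef_ is multi-output.
        if norm == "l1":
            return np.abs(self.coef_).sum(axis=0)
        if norm == "l2":
            return np.sqrt((self.coef_ ** 2).sum(axis=0))
        raise ValueError(f"unknown norm: {norm!r}")

A meta-estimator such as SelectFromModel or RFE could then call estimator.get_feature_importances(norm=...) rather than carrying its own rules for interpreting coef_.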

I don’t think there is currently sufficient motivation to change, but I could be persuaded.

Ping @kmike?

Issue Analytics

  • State: open
  • Created: 6 years ago
  • Reactions: 4
  • Comments: 43 (41 by maintainers)

Top GitHub Comments

2 reactions
jnothman commented, Oct 10, 2018

#12326 is another example of needing to configure a norm for the coef_, where that configuration needs to be passed through a metaestimator (as in RFE and SelectFromModel also).
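
For a concrete picture of the situation the comment describes, here is a small example (assuming a scikit-learn release of that era, in which SelectFromModel exposes a norm_order parameter): the norm used to collapse a two-dimensional coef_ into per-feature importances is configured on the meta-estimator rather than on the model.

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           n_classes=3, random_state=0)

# Multi-class LogisticRegression has coef_ of shape (n_classes, n_features);
# the meta-estimator, not the model, chooses how to reduce it to one score
# per feature via its own norm_order parameter.
sfm = SelectFromModel(LogisticRegression(max_iter=1000), norm_order=1).fit(X, y)
print(sfm.get_support())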

1 reaction
thomasjpfan commented, Jun 20, 2019

More Generic SelectFromModel API Proposal

We can extend the SelectFromModel API to have an importance parameter that can accept a callable:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectFromModel

def pca_importances(estimator, **kwargs):
    return np.abs(estimator.components_.ravel())  # absolute loadings of the single component

sfm = SelectFromModel(PCA(n_components=1), importance=pca_importances)

The default value for importance will be 'auto' to keep the current behavior.
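
For reference, the current 'auto' behaviour that this default would preserve amounts to roughly the following (a paraphrase of SelectFromModel's internal logic, not its exact code): prefer feature_importances_ when the estimator provides it, otherwise reduce coef_ with a norm.

import numpy as np

def auto_importances(estimator, norm_order=1):
    # Prefer an explicit feature_importances_ attribute (e.g. tree ensembles).
    if hasattr(estimator, "feature_importances_"):
        return estimator.feature_importances_
    # Otherwise fall back to coef_, collapsing a 2-D coef_ with a norm.
    coef = estimator.coef_
    if coef.ndim == 1:
        return np.abs(coef)
    return np.linalg.norm(coef, ord=norm_order, axis=0)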

Thoughts on permutation importance

Permutation Idea 1

Now for permutation importance, it would be extremely nice to have importance='permutation' and have it magically work. Permutation importance needs the data, which means it cannot support prefit=True and the importances must be calculated during fit. Furthermore, permutation importance accepts a scoring parameter. This means the SelectFromModel API may look like this:

from sklearn.neural_network import MLPClassifier

sfm = SelectFromModel(MLPClassifier(),
                      importance='permutation',
                      scoring='roc_auc',  # only used when importance='permutation'
                      n_jobs=4,           # only used when importance='permutation'
                      n_repeats=10,
)

Permutation Idea 2

We can go the other way and have users write a custom function for permutation importance:

from sklearn.inspection import permutation_importance
from sklearn.neural_network import MLPClassifier

def perm_importance(estimator, X, y):
    return permutation_importance(estimator, X, y, scoring='roc_auc',
                                  n_jobs=4, n_repeats=10).importances_mean

sfm = SelectFromModel(MLPClassifier(),
                      importance=perm_importance,
                      pass_data=True)

The fitted estimator, X, and y will get passed to perm_importance during fit. prefit=True will not be supported.
