Add permutation based feature importance?
I think adding permutation-based feature importances would be cool: https://link.springer.com/article/10.1186%2F1471-2105-8-25
There is a Python package that does it for our random forests with bagging: https://github.com/parrt/random-forest-importances
But I’d rather see a generic permutation-based importance score with cross-validation or hold-out. I think this would be a great analysis tool.
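To make the hold-out variant concrete, here is a minimal, model-agnostic sketch in plain Python. The function name `permutation_importance_holdout`, its parameters, and the toy model are all made up for illustration; none of this is an existing API. It scores the fitted model on held-out data, shuffles one column at a time, and reports the mean drop in score.

```python
import random


def permutation_importance_holdout(predict, score, X_val, y_val, n_repeats=5, seed=0):
    """Mean drop in held-out score when a single feature column is shuffled.

    predict: callable mapping a list of rows to a list of predictions
    score:   callable (y_true, y_pred) -> float, where higher is better
    (Hypothetical helper for illustration only.)
    """
    rng = random.Random(seed)
    baseline = score(y_val, predict(X_val))
    importances = []
    for j in range(len(X_val[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only; every other column stays untouched.
            column = [row[j] for row in X_val]
            rng.shuffle(column)
            X_perm = []
            for row, value in zip(X_val, column):
                new_row = list(row)
                new_row[j] = value
                X_perm.append(new_row)
            drops.append(baseline - score(y_val, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy "fitted model" that only ever looks at feature 0.
def toy_predict(rows):
    return [int(row[0] > 0.5) for row in rows]


X_val = [[i % 2, (i * 7) % 5] for i in range(20)]
y_val = [i % 2 for i in range(20)]
importances = permutation_importance_holdout(toy_predict, accuracy, X_val, y_val)
```

Because the toy model ignores feature 1, shuffling it cannot change the score, so its importance is exactly zero, while feature 0 gets a positive value. The model is never refit, so the cost is one scoring pass per feature per repeat.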
One easy way to implement it would be to provide a function for plotting etc. But ideally we’d be able to use it in feature selection, I think, so we’d need to create a meta-estimator PermutationImportanceCV that only provides feature_importances_, I guess, so it can be wrapped with SelectFromModel?
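A rough sketch of what such a meta-estimator could look like. `PermutationImportanceCV` is only the name proposed above, not an existing scikit-learn class, and the constructor parameters (`cv`, `n_repeats`, `random_state`) are my assumptions. It fits a clone of the wrapped estimator on each training fold, measures the drop in `score` on the matching test fold when each column is permuted, and averages the drops into `feature_importances_`.

```python
import numpy as np
from sklearn.base import BaseEstimator, clone, is_classifier
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import check_cv
from sklearn.utils import check_random_state


class PermutationImportanceCV(BaseEstimator):
    """Hypothetical meta-estimator exposing CV-averaged permutation importances."""

    def __init__(self, estimator, cv=5, n_repeats=5, random_state=None):
        self.estimator = estimator
        self.cv = cv
        self.n_repeats = n_repeats
        self.random_state = random_state

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        rng = check_random_state(self.random_state)
        cv = check_cv(self.cv, y, classifier=is_classifier(self.estimator))
        fold_importances = []
        for train, test in cv.split(X, y):
            model = clone(self.estimator).fit(X[train], y[train])
            baseline = model.score(X[test], y[test])
            drops = np.zeros(X.shape[1])
            for j in range(X.shape[1]):
                for _ in range(self.n_repeats):
                    # Permute column j of the held-out fold only.
                    X_perm = X[test].copy()
                    X_perm[:, j] = rng.permutation(X_perm[:, j])
                    drops[j] += baseline - model.score(X_perm, y[test])
            fold_importances.append(drops / self.n_repeats)
        # Average the per-fold score drops into one importance per feature.
        self.feature_importances_ = np.mean(fold_importances, axis=0)
        return self


# Usage: wrap it with SelectFromModel, as suggested above.
X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
selector = SelectFromModel(
    PermutationImportanceCV(LogisticRegression(max_iter=1000), cv=3, random_state=0)
)
X_selected = selector.fit_transform(X, y)
```

Since the wrapper exposes no `coef_`, `SelectFromModel` falls back to `feature_importances_` and, with its default threshold, keeps the features whose importance is at least the mean. This version relies on the wrapped estimator's own `score`; a `scoring` parameter would be a natural addition.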
Issue Analytics
- State:
- Created 5 years ago
- Reactions:15
- Comments:28 (17 by maintainers)
Also see this blog post: http://parrt.cs.usfca.edu/doc/rf-importance/index.html
I’m not suggesting we change our RF feature importances, but having a more expensive but possibly higher quality alternative would be great.
I am waiting for this method to be included. We need reliable results more than anything, even if the method is not fast; we would rather never be wrong. I implemented it myself for my model (for a competition).