
Improvements for FeatureImportances visualizer


There are a couple of enhancements for the yellowbrick.features.FeatureImportances visualizer that should be made to really make it stand out. They are as follows:

Note to contributors: items in the below checklist don’t need to be completed in a single PR; if you see one that catches your eye, feel free to pick it off the list!

  • color negative coefs
  • top n features to filter number displayed (both pos and neg)
  • implement standard deviation for ensemble models

Color negative coefs

The first item is relatively straightforward: currently the bar chart is a single color, but it might be nice to show negative coef_ values in a different color, e.g. blue for positive and green for negative, as below:

[Figure: example bar chart with positive coefficients in blue and negative coefficients in green]

To do this, you’ll have to create a color array to pass as the color argument, e.g.

colors = np.array(['b' if v > 0 else 'g' for v in self.feature_importances_]) 
self.ax.barh(pos, self.feature_importances_, color=colors, align='center')

We should also create arguments to provide a way to specify the colors.
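A minimal sketch of that idea, assuming a small helper that builds the color array from the sign of each value; importance_colors and its positive/negative parameters are hypothetical names for illustration, not the final yellowbrick API:

```python
import numpy as np

def importance_colors(importances, positive="b", negative="g"):
    """Return one matplotlib color per bar, chosen by the sign of each
    coefficient. positive/negative are hypothetical arguments that would
    let users specify the colors, as suggested in the issue."""
    importances = np.asarray(importances)
    # np.where broadcasts the two color strings across the sign mask
    return np.where(importances >= 0, positive, negative)

# Usage: coefficients with mixed signs get per-bar colors, which could
# then be passed as ax.barh(pos, importances, color=colors)
colors = importance_colors([0.4, -0.2, 0.1, -0.5])
```

The helper treats zero as positive (`>= 0`) so every bar always gets a color; the snippet in the issue uses a strict `> 0`, which is an equally valid choice.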

Top N Features

For the second item, I’m picturing something similar to most informative features with scikit-learn (though not exactly this code). Here, an argument topn which defaults to None specifies a filter to only plot the N best features.

This should also be relatively straightforward, but it gets complicated in the case of negative values. We have two options: we can rank all values, including negative ones, and plot the N best values whether positive or negative, or we can plot the N best positive and N best negative coefficients.
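The first option (ranking all coefficients together) can be sketched with a small helper; top_n_indices and its by_magnitude flag are hypothetical names used here only for illustration:

```python
import numpy as np

def top_n_indices(importances, topn=None, by_magnitude=True):
    """Indices of the topn 'best' features, best-first. With
    by_magnitude=True, negative coefficients compete by absolute value;
    with False, only the largest signed values win. topn=None keeps
    every feature, matching the proposed default."""
    importances = np.asarray(importances)
    if topn is None:
        return np.arange(len(importances))
    key = np.abs(importances) if by_magnitude else importances
    # argsort is ascending, so take the last topn and reverse
    return np.argsort(key)[-topn:][::-1]

# Usage: with magnitude ranking, the strong negative coef -0.9 survives
idx = top_n_indices([0.4, -0.9, 0.1], topn=2)
```

The second option (N best positive plus N best negative) would simply call this twice on the masked positive and negative subsets.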

Standard Deviation for Ensembles

Ensemble models like Random Forest and Gradient Boosting have an underlying estimators_ attribute that describes each feature’s importance in a different way. The global feature importances are the mean, but it would be nice to add an xerr bar with the standard deviation as in plot forest importances.

This could also be useful for CV models that also have an underlying estimators_ attribute.

The idea with this one is to compute the standard deviation for each feature using estimators_ and np.std, storing the values in a confidences_ attribute during fit. Note that they will also have to be sorted using the sort_index; the confidences are drawn during ax.barh with xerr=self.confidences_.
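The computation described above can be sketched outside the visualizer; the confidences_ naming comes from the issue text, while the rest is plain scikit-learn and NumPy (a sketch, not the eventual yellowbrick implementation):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=42)
model = RandomForestClassifier(n_estimators=25, random_state=42).fit(X, y)

# One row of importances per underlying estimator
per_tree = np.array([t.feature_importances_ for t in model.estimators_])

importances = per_tree.mean(axis=0)   # what the visualizer already plots
confidences = per_tree.std(axis=0)    # candidate values for confidences_

# Both arrays must share the same sort order before drawing, e.g.:
sort_idx = np.argsort(importances)
# ax.barh(pos, importances[sort_idx], xerr=confidences[sort_idx])
```

The same pattern would apply to any model exposing estimators_, including the CV estimators mentioned above.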

Right now it looks like the example above is no longer working exactly as expected, so some deeper review is necessary.

See also #194 where a discussion about tree-specific feature importances is ongoing.

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Reactions: 1
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

JonoCX commented on Jun 20, 2020 (1 reaction)

Hi, I know that this is an old issue, but I can’t find an implementation of the top-n feature discussed here. Would a top-n parameter (or even a separate visualizer) be a useful feature? I use the library regularly, and it’s a feature that I recently needed for a paper.

mgarod commented on Oct 5, 2020 (0 reactions)

Hello yellowbrick! In the interest of legitimate Hacktoberfest PRs, I took a stab at the top_n feature for this.

Please see PR #1102 for my very quick implementation of this feature. I’m open to suggestions on how to fully flesh this out. I did not try out many combinations of parameters, so there are scenarios where top_n may or may not even apply which I have not even considered.
