
Feature weighting for decision trees and random forests

See original GitHub issue

Basic idea: Allow decision trees and random forests (and possibly other ensemble algorithms) to have feature weightings that make the selection of specific features as split points more or less likely.

https://www.researchgate.net/publication/220338672_Random_feature_weights_for_decision_tree_ensemble_construction

I was looking to see if anything similar existed and came across the above paper. If I’m understanding the method correctly, the idea is to use random feature weights to create highly diverse ensembles that outperform other common methods, particularly where noise is present.
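
If I'm reading the paper right, the mechanism is small enough to sketch. Here's a toy illustration (not the paper's code; the gain values and names are made up) of how a random weight vector per tree pushes different trees toward different split features:

import numpy as np

rng = np.random.default_rng(0)
n_trees, n_features = 3, 5

# One random weight vector per tree, drawn from [0, 1] as in the RFW paper.
tree_weights = rng.uniform(0.0, 1.0, size=(n_trees, n_features))

# Stand-in for the per-feature impurity improvements computed at some node.
gains = np.array([0.10, 0.40, 0.35, 0.05, 0.20])

for t, w in enumerate(tree_weights):
    best = int(np.argmax(gains * w))  # weighted gain picks the split feature
    print(f"tree {t} splits on feature {best}")

Because each tree scales the gains differently, close contests between strong features resolve differently across trees, which is where the extra diversity comes from.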

A general feature_weights parameter could allow tuning based on domain knowledge or other methods that might vastly improve the ability of a tree or forest to generalize. Random forests could accept either a single set of weights shared by all trees or an array of weights, one per tree estimator.
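
No such parameter exists in scikit-learn today, but a coarse approximation can already be hand-rolled by sampling each tree's candidate columns with weighted probabilities (an illustrative sketch, not the proposed API; all names and numbers here are made up):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
rng = np.random.default_rng(0)

# Heavier weights make a feature more likely to be offered to a tree.
weights = np.array([4.0, 1.0, 1.0, 1.0, 4.0, 1.0, 1.0, 1.0])
p = weights / weights.sum()

trees, cols = [], []
for _ in range(10):
    c = rng.choice(X.shape[1], size=4, replace=False, p=p)  # weighted column draw
    trees.append(DecisionTreeClassifier(random_state=0).fit(X[:, c], y))
    cols.append(c)

# Majority vote over the hand-rolled ensemble.
votes = np.stack([t.predict(X[:, c]) for t, c in zip(trees, cols)])
print("train accuracy:", ((votes.mean(axis=0) > 0.5).astype(int) == y).mean())

A built-in feature_weights would instead act at every split, which is finer-grained than this per-tree column sampling.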

In full disclosure, I’m really just getting my feet wet in terms of decision trees, random forests, etc. Feel free to let me know if this is wrong or does not make sense.

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

1 reaction
xuyxu commented, May 22, 2020

@NicolasHug @jhogg11 Here is one example in Anomaly Detection: Robust Random Cut Forest Based Anomaly Detection On Streams (https://proceedings.mlr.press/v48/guha16.pdf):

Unlike Isolation Forest, which randomly selects one attribute at each internal node, the robust cut tree in this paper selects an attribute with probability proportional to the range of the input on that attribute (which corresponds to feature_weights here).
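
A minimal sketch of that selection rule (toy data; the names and values are illustrative):

import numpy as np

rng = np.random.default_rng(0)
# Toy node data where the features have very different value ranges.
X = rng.normal(size=(100, 4)) * np.array([1.0, 10.0, 0.1, 3.0])

# Pick the split attribute with probability proportional to its range,
# as the robust random cut tree does.
ranges = X.max(axis=0) - X.min(axis=0)
p = ranges / ranges.sum()
feature = rng.choice(X.shape[1], p=p)
print("split on feature", feature, "probabilities:", p.round(3))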

0 reactions
jhogg11 commented, May 30, 2020

Could this be implemented as a Splitter extension on top of the current code base? I'm not sure adding it to the primary implementation is easy to justify, since every piece of added complexity carries a substantial maintenance cost.

Would this be as simple as adding a feature_weights parameter and applying the weights to current_proxy_improvement in the splitter's node_split method, e.g., something like this?

current_proxy_improvement = self.criterion.proxy_impurity_improvement()
current_proxy_improvement *= self.feature_weights[current.feature]  # new line

if current_proxy_improvement > best_proxy_improvement:
    ....
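
For intuition, here's the same idea as a standalone pure-Python mirror (illustrative only; the real change would live in the Cython splitter):

import numpy as np

def best_weighted_feature(gains, feature_weights):
    # Scale each candidate feature's proxy improvement by its weight
    # before comparing, as in the snippet above.
    best_f, best_improvement = -1, -np.inf
    for f, gain in enumerate(gains):
        improvement = gain * feature_weights[f]
        if improvement > best_improvement:
            best_f, best_improvement = f, improvement
    return best_f

# Feature 1 has the largest raw gain, but its low weight hands the split
# to feature 2 (0.3*1.0 = 0.3, 0.5*0.2 = 0.1, 0.2*2.0 = 0.4).
print(best_weighted_feature([0.3, 0.5, 0.2], [1.0, 0.2, 2.0]))  # -> 2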

