
Feature weighting for decision trees and random forests

See original GitHub issue

Basic idea: Allow decision trees and random forests (and possibly other ensemble algorithms) to have feature weightings that make the selection of specific features as split points more or less likely.

https://www.researchgate.net/publication/220338672_Random_feature_weights_for_decision_tree_ensemble_construction

I was looking to see if anything similar existed and came across the above paper. If I’m understanding the method correctly, the idea is to use random feature weights to create highly diverse ensembles that outperform other common methods, particularly where noise is present.
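
If I'm reading the paper right, the mechanism is small enough to sketch. Here's a toy illustration (not the paper's code; the gain values and names are made up) of how a random weight vector per tree pushes different trees toward different split features:

import numpy as np

rng = np.random.default_rng(0)
n_trees, n_features = 3, 5

# One random weight vector per tree, drawn from [0, 1] as in the RFW paper.
tree_weights = rng.uniform(0.0, 1.0, size=(n_trees, n_features))

# Stand-in for the per-feature impurity improvements computed at some node.
gains = np.array([0.10, 0.40, 0.35, 0.05, 0.20])

for t, w in enumerate(tree_weights):
    best = int(np.argmax(gains * w))  # weighted gain picks the split feature
    print(f"tree {t} splits on feature {best}")

Because each tree scales the gains differently, close contests between strong features resolve differently across trees, which is where the extra diversity comes from.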

A general feature_weights parameter could allow tuning based on domain knowledge or other methods that might vastly improve the ability of a tree or forest to generalize. Random forests could accept either a single set of weights shared by all trees or an array of weights, one per tree estimator.
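
No such parameter exists in scikit-learn today, but a coarse approximation can already be hand-rolled by sampling each tree's candidate columns with weighted probabilities (an illustrative sketch, not the proposed API; all names and numbers here are made up):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
rng = np.random.default_rng(0)

# Heavier weights make a feature more likely to be offered to a tree.
weights = np.array([4.0, 1.0, 1.0, 1.0, 4.0, 1.0, 1.0, 1.0])
p = weights / weights.sum()

trees, cols = [], []
for _ in range(10):
    c = rng.choice(X.shape[1], size=4, replace=False, p=p)  # weighted column draw
    trees.append(DecisionTreeClassifier(random_state=0).fit(X[:, c], y))
    cols.append(c)

# Majority vote over the hand-rolled ensemble.
votes = np.stack([t.predict(X[:, c]) for t, c in zip(trees, cols)])
print("train accuracy:", ((votes.mean(axis=0) > 0.5).astype(int) == y).mean())

A built-in feature_weights would instead act at every split, which is finer-grained than this per-tree column sampling.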

In full disclosure, I’m really just getting my feet wet in terms of decision trees, random forests, etc. Feel free to let me know if this is wrong or does not make sense.

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

1 reaction
xuyxu commented, May 22, 2020

@NicolasHug @jhogg11 Here is one example in Anomaly Detection: Robust Random Cut Forest Based Anomaly Detection On Streams (https://proceedings.mlr.press/v48/guha16.pdf):

Unlike Isolation Forest, which randomly selects one attribute at each internal node, the robust cut tree in this paper selects an attribute with probability proportional to the range of the input on that attribute (which corresponds to feature_weights here).
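
A minimal sketch of that selection rule (toy data; the names and values are illustrative):

import numpy as np

rng = np.random.default_rng(0)
# Toy node data where the features have very different value ranges.
X = rng.normal(size=(100, 4)) * np.array([1.0, 10.0, 0.1, 3.0])

# Pick the split attribute with probability proportional to its range,
# as the robust random cut tree does.
ranges = X.max(axis=0) - X.min(axis=0)
p = ranges / ranges.sum()
feature = rng.choice(X.shape[1], p=p)
print("split on feature", feature, "probabilities:", p.round(3))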

0 reactions
jhogg11 commented, May 30, 2020

Could this be implemented as a Splitter extension on top of the current code base? I'm not sure adding it to the primary implementation is easy to justify, since every piece of added complexity carries a substantial maintenance cost.

Would this be as simple as adding a feature_weights parameter and applying the weights to current_proxy_improvement in the splitter's node_split method, e.g., something like this?

current_proxy_improvement = self.criterion.proxy_impurity_improvement()
current_proxy_improvement *= self.feature_weights[current.feature]  # new line

if current_proxy_improvement > best_proxy_improvement:
    ....
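
For intuition, here's the same idea as a standalone pure-Python mirror (illustrative only; the real change would live in the Cython splitter):

import numpy as np

def best_weighted_feature(gains, feature_weights):
    # Scale each candidate feature's proxy improvement by its weight
    # before comparing, as in the snippet above.
    best_f, best_improvement = -1, -np.inf
    for f, gain in enumerate(gains):
        improvement = gain * feature_weights[f]
        if improvement > best_improvement:
            best_f, best_improvement = f, improvement
    return best_f

# Feature 1 has the largest raw gain, but its low weight hands the split
# to feature 2 (0.3*1.0 = 0.3, 0.5*0.2 = 0.1, 0.2*2.0 = 0.4).
print(best_weighted_feature([0.3, 0.5, 0.2], [1.0, 0.2, 2.0]))  # -> 2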

