Feature weighting for decision trees and random forests
Basic idea: Allow decision trees and random forests (and possibly other ensemble algorithms) to have feature weightings that make the selection of specific features as split points more or less likely.
I was looking to see if anything similar existed and came across the above paper. If I’m understanding the method correctly, the idea is to use random feature weights to create highly diverse ensembles that outperform other common methods, particularly where noise is present.
A general `feature_weights` parameter could allow for tuning based on domain knowledge or other methods that might vastly improve the ability of a tree or forest to generalize. Random forests could accept either a single set of weights shared by all trees or an array of weights, one for each tree estimator.
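As a rough illustration of how such weights could bias split-point selection, here is a minimal sketch, assuming the weights are applied when drawing the candidate features for each split (the function name and weighting scheme are my own, not existing scikit-learn behaviour); uniform weights recover ordinary random-forest feature subsampling:

```python
import numpy as np

def sample_candidate_features(n_features, max_features, feature_weights, rng):
    # Draw the features considered at one split. feature_weights makes
    # individual features more or less likely to be candidates; uniform
    # weights reproduce standard random-forest feature subsampling.
    p = np.asarray(feature_weights, dtype=float)
    p = p / p.sum()
    return rng.choice(n_features, size=max_features, replace=False, p=p)

rng = np.random.default_rng(42)
weights = [5.0, 1.0, 1.0, 0.2, 1.0]  # hypothetical domain knowledge:
                                     # favour feature 0, distrust feature 3
print(sample_candidate_features(n_features=5, max_features=2,
                                feature_weights=weights, rng=rng))
```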
In full disclosure, I’m really just getting my feet wet with decision trees, random forests, etc. Feel free to let me know if this is wrong or doesn’t make sense.
Top GitHub Comments
@NicolasHug @jhogg11 Here is one example in anomaly detection: Robust Random Cut Forest Based Anomaly Detection On Streams (https://proceedings.mlr.press/v48/guha16.pdf):
Unlike Isolation Forest, which randomly selects one attribute at each internal node, the robust cut tree in this paper selects an attribute with probability proportional to the input range on that attribute (this corresponds to `feature_weights` here).
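A minimal sketch of that selection rule, assuming the node's samples sit in a NumPy array (`choose_split_attribute` is a made-up name for illustration, not RRCF library code):

```python
import numpy as np

def choose_split_attribute(X_node, rng):
    # Robust-random-cut-style choice: pick an attribute with probability
    # proportional to its value range (max - min) over this node's samples.
    ranges = X_node.max(axis=0) - X_node.min(axis=0)
    return rng.choice(X_node.shape[1], p=ranges / ranges.sum())

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X[:, 2] *= 10  # feature 2 spans a much wider range ...
print(choose_split_attribute(X, rng))  # ... so it is chosen most often
```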
Would this be as simple as adding an additional `feature_weights` parameter and applying the weights to `current_proxy_improvement` in the splitter's `node_split` method, e.g., something like this?
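A pure-Python sketch of that idea, for illustration only: the real splitter is Cython (`sklearn/tree/_splitter.pyx`), and here variance reduction stands in for the actual proxy improvement, with each candidate's score scaled by its feature's weight before the best-split comparison (`weighted_best_split` is a hypothetical name):

```python
import numpy as np

def weighted_best_split(X, y, feature_weights):
    # Exhaustive regression-split search in which each candidate feature's
    # improvement is multiplied by its weight, so up-weighted features are
    # more likely to win the comparison against the running best.
    best_feature, best_threshold, best_score = None, None, -np.inf
    parent_impurity = y.var() * len(y)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f])[:-1]:  # thresholds between observed values
            left = X[:, f] <= t
            right = ~left
            improvement = parent_impurity - (y[left].var() * left.sum() +
                                             y[right].var() * right.sum())
            score = feature_weights[f] * improvement  # weights applied here
            if score > best_score:
                best_feature, best_threshold, best_score = f, t, score
    return best_feature, best_threshold

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X[:, 0] + 0.1 * rng.normal(size=200)
# Uniform weights find the informative feature 0; zeroing its weight
# forces the split onto another feature.
print(weighted_best_split(X, y, np.array([1.0, 1.0, 1.0])))
print(weighted_best_split(X, y, np.array([0.0, 1.0, 1.0])))
```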