Implement Extended Isolation Forest

Describe the workflow you want to enable

In the context of anomaly detection, the Isolation Forest algorithm has a bias that makes some data points’ anomaly scores lower than they should be. This problem arises in regions of the space that are axis-aligned with a cluster. Imagine a point very far from a cluster: the basic Isolation Forest algorithm may assign it a lower anomaly score only because the point is axis-aligned with the cluster. This leads to false negatives in my application.

To overcome this bias, Hariri et al. proposed the Extended Isolation Forest algorithm. While the standard Isolation Forest algorithm randomly chooses a feature and a threshold value to split the points, the extended version uses a random hyperplane to do so. Because these random hyperplanes need not be axis-aligned, they remove the bias of the standard algorithm. In the end, the standard algorithm becomes a special case of the extended one, restricted to axis-aligned hyperplanes.
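To make the difference between the two split types concrete, here is a minimal pure-Python sketch of a single split. It is not scikit-learn code and not the authors' implementation: the function name `random_hyperplane_split`, the `axis_aligned` flag, and the choice of a Gaussian normal vector with an intercept drawn uniformly from the data's bounding box are assumptions made for illustration.

```python
import random


def random_hyperplane_split(points, rng, axis_aligned=False):
    """Split points into two groups with one random hyperplane.

    Hypothetical helper for illustration only.
    points: list of equal-length tuples of floats.
    axis_aligned=True mimics the standard Isolation Forest split
    (one random feature, one random threshold).
    """
    dim = len(points[0])
    if axis_aligned:
        # Standard case: the normal is a unit vector along one feature.
        feature = rng.randrange(dim)
        normal = [1.0 if j == feature else 0.0 for j in range(dim)]
    else:
        # Extended case (assumed): each component drawn from N(0, 1),
        # so the hyperplane can take any orientation.
        normal = [rng.gauss(0.0, 1.0) for _ in range(dim)]

    # Intercept point drawn uniformly inside the bounding box of the data.
    lows = [min(p[j] for p in points) for j in range(dim)]
    highs = [max(p[j] for p in points) for j in range(dim)]
    intercept = [rng.uniform(lo, hi) for lo, hi in zip(lows, highs)]

    def side(p):
        # Signed projection of (p - intercept) onto the normal vector.
        return sum(n * (x - c) for n, x, c in zip(normal, p, intercept))

    left = [p for p in points if side(p) <= 0.0]
    right = [p for p in points if side(p) > 0.0]
    return normal, intercept, left, right


rng = random.Random(0)
data = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0), (4.0, 4.0)]
normal, intercept, left, right = random_hyperplane_split(data, rng)
print("normal:", normal)
print("left:", left, "right:", right)
```

With `axis_aligned=True`, the projection reduces to comparing one feature against one threshold, which is the sense in which the standard split is a special case of the hyperplane split.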

Please have a look at the original paper; it explains the problem very well.

Describe your proposed solution

I had a look at the Isolation Forest code. In my humble opinion, the simplest solution might be to add an argument to the IsolationForest class constructor to choose how the samples should be split in two, maybe something like an extended argument.

This would basically modify the splitter argument passed to the underlying base_estimator (an ExtraTreeRegressor instance). We could add a “random_hyperplane” splitting mode, which requires implementing a new Splitter class.

Overall, this does not require many changes apart from adding the new splitter class. I can work on an implementation if we agree this is a useful addition.

Describe alternatives you’ve considered, if relevant

Maybe the Extended Isolation Forest algorithm should be a distinct class, but I doubt it is worth it.

Issue Analytics

  • State: open
  • Created 4 years ago
  • Reactions: 8
  • Comments: 13 (6 by maintainers)

Top GitHub Comments

2 reactions
zachmayer commented, Feb 6, 2022

@thomasjpfan just to be clear, you’d be open to a random hyperplane splitter, if it could be used for isolation forests, extra trees, and random forests?

Only splitting along an axis is a major limitation of tree-based algorithms, especially for anomaly detection. It’d be really nice to have this functionality in sklearn.

2 reactions
funnell commented, Jun 29, 2020

Considering the problematic bias, maybe it would be better to have the “extended” algorithm replace the original, unless you can conceive a situation where you’d still want the original. The extended algorithm looks consistently better in the paper.

Top Results From Across the Web

Extended Isolation Forest — H2O 3.38.0.3 documentation
The Extended Isolation Forest algorithm generalizes its predecessor algorithm, Isolation Forest. The original Isolation Forest algorithm brings a brand new form ...

sahandha/eif: Extended Isolation Forest for Anomaly Detection
Here, we present an extension to the model-free anomaly detection algorithm, Isolation Forest Liu2008. This extension, named Extended Isolation Forest (EIF), ...

Is there any implementation of Extended Isolation Forest ...
There is a package on Github called "Extended Isolation Forest for Anomaly Detection", I used it a couple months ago and it seemed...

Outlier Detection with Extended Isolation Forest
Isolation Forest algorithm utilizes the fact that anomalous observations are few and significantly different from 'normal' observations. The ...

Fraud Analytics using Extended Isolation Forest Algorithm
Isolation Forest is unsupervised learning technique used for anomaly detection. It uses the fact that only few observations are outliers and ...
