Implement Extended Isolation Forest
Describe the workflow you want to enable
In the context of anomaly detection, the Isolation Forest algorithm has a bias that makes some data points' anomaly scores lower than they should be. The problem arises in regions of the space that are axis-aligned with a cluster: a point can be very far from a cluster, yet the basic Isolation Forest algorithm may assign it a low anomaly score simply because the point is axis-aligned with the cluster. This leads to false negatives in my application.
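To see the bias in practice, here is a small sketch using today's axis-aligned `IsolationForest` (my own toy setup, not from the issue): both probe points sit at the same distance from an isotropic cluster, but one of them lies in the axis-aligned "ghost" band.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))  # a single isotropic cluster around the origin

clf = IsolationForest(random_state=0).fit(X)

d = 6.0
probes = np.array([
    [d, 0.0],                          # axis-aligned with the cluster ("ghost" band)
    [d / np.sqrt(2), d / np.sqrt(2)],  # same Euclidean distance, off-axis
])

# Higher score_samples means "looks more normal"; per Hariri et al., the
# standard forest tends to rate the axis-aligned probe as less anomalous
# than the off-axis one, even though both are equally far from the data.
print(clf.score_samples(probes))
```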
To overcome this bias, Hariri et al. proposed the Extended Isolation Forest algorithm. While the standard Isolation Forest randomly chooses a single feature and a threshold value to split the points, the extended version splits with a random hyperplane. Because these hyperplanes need not be axis-aligned, they remove the bias of the standard algorithm. The standard algorithm then becomes a special case of the extended one, restricted to axis-aligned hyperplanes.
Please have a look at the original paper; it explains the problem very well.
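To make the difference concrete, here is a minimal NumPy sketch of the two split rules as I read them from the paper (illustrative only, not the reference implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))  # samples reaching a given tree node

# Standard Isolation Forest: pick a random feature and a random threshold
# between its observed min and max, then split along that axis.
feature = rng.integers(X.shape[1])
threshold = rng.uniform(X[:, feature].min(), X[:, feature].max())
left_standard = X[:, feature] <= threshold

# Extended Isolation Forest: draw a random normal vector (slope) and a random
# intercept point inside the node's bounding box, then split with the
# resulting, generally oblique, hyperplane.
normal = rng.normal(size=X.shape[1])
intercept = rng.uniform(X.min(axis=0), X.max(axis=0))
left_extended = (X - intercept) @ normal <= 0
```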
Describe your proposed solution
I had a look at the Isolation Forest code and, in my humble opinion, the simplest solution might be to add an argument to the `IsolationForest` class constructor to choose how the samples should be split in two, maybe something like an `extended` argument.
This would basically modify the `splitter` argument passed to the underlying `base_estimator` (an `ExtraTreeRegressor` instance). We could add a "random_hyperplane" splitting mode, which requires implementing a new `Splitter` class.
Overall, this does not amount to a lot of changes beyond adding the new splitter class. I can work on an implementation if we agree this is a useful addition.
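For illustration, here is roughly the wiring this implies; note that the "random_hyperplane" splitter name (like the `extended` argument) is hypothetical and does not exist in scikit-learn today:

```python
from sklearn.tree import ExtraTreeRegressor

# Roughly what IsolationForest wires up internally today:
# axis-aligned random splits on a single randomly chosen feature.
axis_aligned_tree = ExtraTreeRegressor(max_features=1, splitter="random")

# What IsolationForest(extended=True) could wire up instead, delegating the
# oblique split to a new Splitter class registered under this name
# (hypothetical: fitting this tree would fail on current scikit-learn).
oblique_tree = ExtraTreeRegressor(splitter="random_hyperplane")
```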
Describe alternatives you’ve considered, if relevant
Maybe the Extended Isolation Forest algorithm should be a distinct class, but I doubt it is worth it.
Issue Analytics
- State:
- Created 4 years ago
- Reactions: 8
- Comments: 13 (6 by maintainers)
Top GitHub Comments
@thomasjpfan just to be clear, you’d be open to a random hyperplane splitter, if it could be used for isolation forests, extra trees, and random forests?
Only splitting along an axis is a major limitation of tree-based algorithms, especially for anomaly detection. It’d be really nice to have this functionality in sklearn.
Considering the problematic bias, maybe it would be better to have the “extended” algorithm replace the original, unless you can conceive a situation where you’d still want the original. The extended algorithm looks consistently better in the paper.