Outlier detection
Is your feature request related to a problem? Please describe.
I would like to detect outliers in time series data.
Describe the solution you’d like
I have seen the Time Series Annotation enhancement proposal; however, we actually could just have classes which inherit from _SeriesToSeriesTransformer and fill outliers with np.nan values. Then the outlier correction could happen by means of the Imputer I recently added.
I would like to add different outlier detection classes (maybe with a common parent class), which could be placed in sktime.transformations.series.outlier.
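As a rough illustration of the idea above (mark outliers as np.nan, then let an imputation step repair them), here is a minimal sketch in plain pandas. It is not the sktime implementation: hampel_to_nan and its parameters window_length and n_sigma are hypothetical names, linear interpolation merely stands in for the Imputer, and the input is assumed to contain no missing values.

```python
import numpy as np
import pandas as pd


def _mad(window: np.ndarray) -> float:
    """Median absolute deviation of one rolling window."""
    return float(np.median(np.abs(window - np.median(window))))


def hampel_to_nan(y: pd.Series, window_length: int = 5, n_sigma: float = 3.0) -> pd.Series:
    """Return a copy of y where Hampel-flagged points are replaced by np.nan.

    Hypothetical helper for illustration only; assumes y has no NaN values.
    """
    rolling = y.rolling(window_length, center=True, min_periods=1)
    median = rolling.median()
    mad = rolling.apply(_mad, raw=True)
    # 1.4826 scales the MAD so it estimates the standard deviation
    # for normally distributed data.
    threshold = n_sigma * 1.4826 * mad
    is_outlier = (y - median).abs() > threshold
    # Series.mask sets entries to NaN where the condition is True.
    return y.mask(is_outlier)


y = pd.Series([1.0, 1.1, 0.9, 1.0, 10.0, 1.05, 0.95, 1.0, 1.1, 0.9])
y_nan = hampel_to_nan(y)                      # detection step: outliers become NaN
y_clean = y_nan.interpolate(method="linear")  # stand-in for the imputation step
```

Keeping detection and correction as two separate steps means the same detector can be combined with any imputation strategy.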
Describe alternatives you’ve considered
Implementing the annotation module from scratch, because for probabilistic outlier detection we would need to return an additional column (so a pd.DataFrame) with the probabilities for each time point. This could, however, be solved with the above solution by having a threshold argument which takes a probability value to decide whether a point is an outlier or not.
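The threshold idea can be sketched as follows. This assumes some detector has already produced a per-time-point outlier probability; the function name mask_by_probability, the argument outlier_proba, and the default threshold value are hypothetical and chosen only for illustration.

```python
import pandas as pd


def mask_by_probability(
    y: pd.Series, outlier_proba: pd.Series, threshold: float = 0.9
) -> pd.Series:
    """Replace points whose outlier probability reaches the threshold with NaN.

    Hypothetical helper: outlier_proba must share its index with y.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be a probability in [0, 1]")
    # The probabilities are collapsed into a boolean decision, so the
    # output is a single series again instead of a two-column DataFrame.
    return y.mask(outlier_proba >= threshold)


y = pd.Series([1.0, 2.0, 50.0, 3.0])
proba = pd.Series([0.10, 0.20, 0.97, 0.50])
y_masked = mask_by_probability(y, proba, threshold=0.9)
```

With this design the probabilities themselves are not returned, which is the trade-off against a full annotation module.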
I would propose to start simple: in a first step I would implement a transformer (e.g. a HampelFilter). I actually planned to wrap adtk, or at least a part of it; however, it seems the MPL-2.0 License is not compatible.

Hm, the “obvious” way would be a graphical composition formalism like in ADTK or mlr or MLJ, but that’s an entire module and complex. Would be a great thing to have though (@aiwalter, any ambitions?).

A simpler (but slightly more clunky and less principled) way would be RemoverTrafo(forecaster, remove=myoutlierdetector)? Where RemoverTrafo.transform(y) computes myoutlierdetector.transform(y) to get the timestamps that are to be removed; the output of RemoverTrafo.transform(y) is y with rows corresponding to those timestamps removed.
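A minimal sketch of the transform step described in this comment might look like the following. It covers only the row removal: the forecaster argument is left out because the comment does not say how it would be used, and ThresholdDetector is a hypothetical stand-in for myoutlierdetector whose transform is assumed to return the timestamps to drop.

```python
import pandas as pd


class ThresholdDetector:
    """Hypothetical detector: flags timestamps where |y| exceeds a fixed limit."""

    def __init__(self, limit: float):
        self.limit = limit

    def transform(self, y: pd.Series) -> pd.Index:
        # Return the index labels (timestamps) of the flagged points.
        return y.index[y.abs() > self.limit]


class RemoverTrafo:
    """Sketch of the proposed remover: drops the rows flagged by `remove`."""

    def __init__(self, remove):
        self.remove = remove

    def transform(self, y: pd.Series) -> pd.Series:
        timestamps = self.remove.transform(y)
        # Drop the flagged rows; all other rows are returned unchanged.
        return y.drop(index=timestamps)


index = pd.date_range("2021-01-01", periods=5, freq="D")
y = pd.Series([1.0, 2.0, 100.0, 3.0, 4.0], index=index)
y_removed = RemoverTrafo(remove=ThresholdDetector(limit=10.0)).transform(y)
```

Note that removing rows leaves an irregular index, unlike the np.nan approach, which keeps the original time points.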