[BUG] sktime time series classification panel data have duplicate instance indices
Describe the bug
If you load the arrow_head data with the default split settings, the resulting DataFrame has repeated index values. This happens because the train and test data are concatenated, and each split keeps its own 0-based index.
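To illustrate the cause without downloading the dataset, the sketch below builds two small nested frames that stand in for the train and test splits. The frame sizes, the `dim_0` column and the `make_split` helper are made up for illustration, and the sketch assumes the loader's concatenation behaves like a plain `pd.concat`. Plain concatenation keeps each split's own index, so labels repeat; `ignore_index=True` renumbers the rows.

```python
import numpy as np
import pandas as pd


def make_split(n_instances, series_length=5, seed=0):
    """Hypothetical helper: toy nested panel with one pd.Series per cell."""
    rng = np.random.default_rng(seed)
    cells = np.empty(n_instances, dtype=object)  # 1-D object array of Series
    for i in range(n_instances):
        cells[i] = pd.Series(rng.standard_normal(series_length))
    return pd.DataFrame({"dim_0": cells})


X_train = make_split(3, seed=0)  # index 0, 1, 2
X_test = make_split(4, seed=1)   # index 0, 1, 2, 3

# Plain concatenation keeps each split's own index, so labels repeat.
X_dup = pd.concat([X_train, X_test])
print(X_dup.index.tolist())   # [0, 1, 2, 0, 1, 2, 3]
print(X_dup.index.is_unique)  # False

# ignore_index=True discards the old labels and assigns 0..n-1.
X_ok = pd.concat([X_train, X_test], ignore_index=True)
print(X_ok.index.tolist())    # [0, 1, 2, 3, 4, 5, 6]
print(X_ok.index.is_unique)   # True
```

This reproduces the same pattern as the arrow_head output, where the train labels 0 to 35 are followed by the test labels 0 to 174.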
To Reproduce
from sktime.datasets import load_arrow_head
X, y = load_arrow_head(return_X_y=True)
X.index.values
Output:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 0, 1, 2,
3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41,
42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54,
55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93,
94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106,
107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132,
133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145,
146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158,
159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171,
172, 173, 174])
Expected behavior
No repetition of indices.
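The expected behavior can be checked directly with `Index.is_unique`. Until the loader is fixed, one possible user-side workaround is to relabel rows positionally with `reset_index(drop=True)`. The `relabel_instances` helper below is hypothetical (not part of sktime), and it assumes X and y are in the same row order, which a row-wise concatenation of the splits would produce.

```python
import numpy as np
import pandas as pd


def relabel_instances(X, y):
    """Hypothetical helper: give X a fresh 0..n-1 index and return y as a
    NumPy array aligned by position (assumes X and y share row order)."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of instances")
    X_fixed = X.reset_index(drop=True)  # drop the old, repeated labels
    y_fixed = np.asarray(y)             # positional labels, no index to clash
    return X_fixed, y_fixed


# Toy stand-in for a loader result with a repeated index (0, 1, 0, 1, 2).
X = pd.DataFrame({"dim_0": [10.0, 11.0, 20.0, 21.0, 22.0]}, index=[0, 1, 0, 1, 2])
y = pd.Series(["a", "b", "a", "b", "c"], index=[0, 1, 0, 1, 2])

print(X.index.is_unique)   # False
X2, y2 = relabel_instances(X, y)
print(X2.index.is_unique)  # True
print(X2.index.tolist())   # [0, 1, 2, 3, 4]
print(y2.tolist())         # ['a', 'b', 'a', 'b', 'c']
```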
Versions
System:
python: 3.7.9 (default, Aug 31 2020, 07:22:35) [Clang 10.0.0 ]
executable: /usr/local/Caskroom/miniconda/base/envs/sktime/bin/python
machine: Darwin-20.5.0-x86_64-i386-64bit
Python dependencies:
pip: 20.3.3
setuptools: 51.0.0.post20201207
sklearn: 0.23.0
sktime: 0.7.0
statsmodels: 0.12.1
numpy: 1.19.4
scipy: 1.5.4
Cython: 0.29.17
pandas: 1.1.5
matplotlib: 3.3.3
joblib: 1.0.0
numba: 0.52.0
pmdarima: 1.8.0
tsfresh: 0.17.0

Fix: https://github.com/alan-turing-institute/sktime/pull/3029
This is having too many ripple effects; I think we just need to enforce unique instance indices.
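One way to enforce unique instance indices is a validation step that rejects such input early. The function below is a hypothetical sketch, not sktime's actual check, and it assumes the nested DataFrame format in which each row is one instance.

```python
import pandas as pd


def check_unique_instance_index(X):
    """Hypothetical validator (not an sktime function): raise if a nested
    panel DataFrame reuses an instance label, listing the offending labels."""
    if not isinstance(X, pd.DataFrame):
        raise TypeError(f"expected a pandas DataFrame, got {type(X).__name__}")
    if not X.index.is_unique:
        dupes = X.index[X.index.duplicated()].unique().tolist()
        raise ValueError(f"instance index must be unique; repeated labels: {dupes}")
    return X


ok = pd.DataFrame({"dim_0": [1.0, 2.0]}, index=[0, 1])
checked = check_unique_instance_index(ok)  # passes and returns ok unchanged

bad = pd.DataFrame({"dim_0": [1.0, 2.0, 3.0]}, index=[0, 1, 0])
message = None
try:
    check_unique_instance_index(bad)
except ValueError as err:
    message = str(err)
print(message)  # instance index must be unique; repeated labels: [0]
```

Failing loudly in a check like this surfaces the problem where the data enters, instead of letting repeated labels cause misaligned selections further downstream.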
Review, anyone? 😁