TimeSeriesPredictor: predictor.fit() KeyError: 0
I have checked that this bug exists on the latest stable version of AutoGluon
and/or I have checked that this bug exists on the latest mainline of AutoGluon via source installation
Description: When I follow the Time Series Forecasting quick start example with my own data, the example fails at predictor.fit(...). Although I am using a different dataset, its shape is very similar to that of the example dataset: a timestamp column, a target column, and an identifier column. With the quick start sample data everything works fine, yet there are almost no differences in the shape of the data I substituted.
Expected Behavior: I would expect the predictor to begin training; instead, it fails with the following output:
INFO:autogluon.timeseries.learner:Learner random seed set to 0
INFO:autogluon.timeseries.predictor:presets is set to low_quality
INFO:autogluon.timeseries.predictor:================ TimeSeriesPredictor ================
INFO:autogluon.timeseries.predictor:TimeSeriesPredictor.fit() called
INFO:autogluon.timeseries.predictor:Setting presets to: low_quality
INFO:autogluon.timeseries.predictor:Fitting with arguments:
INFO:autogluon.timeseries.predictor:{'evaluation_metric': 'MAPE',
'hyperparameter_tune_kwargs': None,
'hyperparameters': 'toy',
'prediction_length': 60,
'target_column': 'close',
'time_limit': None}
INFO:autogluon.timeseries.predictor:Provided training data set with 12060 rows, 5 items. Average time series length is 2412.0.
INFO:autogluon.timeseries.predictor:Training artifacts will be saved to: /content/autogluon-models
INFO:autogluon.timeseries.predictor:=====================================================
WARNING:autogluon.timeseries.predictor:Validation data is None, will hold the last prediction_length 60 time steps out to use as validation set.
INFO:autogluon.timeseries.learner:AutoGluon will save models to autogluon-models/
INFO:autogluon.timeseries.trainer:
Starting training. Start time is 2022-07-31 19:46:01
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3360 try:
-> 3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
[... 17 frames omitted ...]
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 0
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
-> 3363 raise KeyError(key) from err
3364
3365 if is_scalar(key) and isna(key) and not self.hasnans:
KeyError: 0
Reproduce:
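import yfinance as yf
from datetime import date, timedelta

# Download one year of daily AAPL prices; yfinance returns Date as the index
df = yf.download("AAPL", start=(date.today() - timedelta(weeks=52)).strftime("%Y-%m-%d"))
df = df.reset_index()  # move Date from the index into a regular column
df["ticker"] = "AAPL"
df = df[["Date", "Close", "ticker"]]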
Replacing the sample data from the quick start with the DataFrame above and following the remaining quick start steps reproduces this error.
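One way to check whether timestamps like these have a regular frequency is pandas.infer_freq, which returns None when no single frequency explains every gap between consecutive timestamps. A minimal sketch, using a hypothetical trading calendar (weekdays minus the July 4 market holiday) instead of downloaded data:

```python
import pandas as pd

# Hypothetical calendar: business days from June 27 to July 8, 2022.
dates = pd.bdate_range("2022-06-27", "2022-07-08")
# Drop the July 4 holiday, as a stock exchange calendar would.
trading_days = dates[dates != pd.Timestamp("2022-07-04")]

# infer_freq returns a frequency string only if one regular
# frequency fits every gap; otherwise it returns None.
regular_freq = pd.infer_freq(dates)
irregular_freq = pd.infer_freq(trading_days)

print(regular_freq)    # prints B (business days)
print(irregular_freq)  # prints None: the holiday gap breaks the pattern
```

The same call can be applied to the real data, for example pd.infer_freq(df["Date"]).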
INSTALLED VERSIONS
------------------
date : 2022-07-31
time : 20:07:22.078231
python : 3.7.13.final.0
OS : Linux
OS-release : 5.4.188+
Version : #1 SMP Sun Apr 24 10:03:06 PDT 2022
machine : x86_64
processor : x86_64
num_cores : 2
cpu_ram_mb : 12986
cuda version : 11.460.32.03
num_gpus : 1
gpu_ram_mb : [15106]
avail_disk_size_mb : 37670
autogluon.common : 0.5.2
autogluon.core : 0.5.2
autogluon.features : 0.5.2
autogluon.multimodal : 0.5.2
autogluon.tabular : 0.5.2
autogluon.text : 0.5.2
autogluon.timeseries : 0.5.2
autogluon.vision : 0.5.2
autogluon_contrib_nlp : 0.0.1
boto3 : 1.24.42
catboost : 1.0.6
dask : 2021.11.2
distributed : 2021.11.2
fairscale : 0.4.6
fastai : 2.7.7
gluoncv : 0.11.0
gluonts : 0.9.6
hyperopt : 0.2.7
lightgbm : 3.3.2
matplotlib : 3.2.2
networkx : 2.6.3
nlpaug : 1.1.10
nltk : 3.7
nptyping : 1.4.4
numpy : 1.21.6
omegaconf : 2.1.2
pandas : 1.3.5
PIL : 9.0.1
protobuf : None
psutil : 5.8.0
pytorch-metric-learning: None
pytorch_lightning : 1.6.5
ray : 1.13.0
requests : 2.28.1
scipy : 1.7.3
sentencepiece : None
skimage : 0.19.3
sklearn : 1.0.2
smart_open : 5.2.1
timm : 0.5.4
torch : 1.12.0+cu113
torchmetrics : 0.7.3
torchtext : 0.13.0
torchvision : 0.13.0+cu113
tqdm : 4.64.0
transformers : 4.20.1
xgboost : 1.4.2
/usr/local/lib/python3.7/dist-packages/gluoncv/__init__.py:40: UserWarning: Both `mxnet==1.9.1` and `torch==1.12.0+cu113` are installed. You might encounter increased GPU memory footprint if both framework are used at the same time.
warnings.warn(f'Both `mxnet=={mx.__version__}` and `torch=={torch.__version__}` are installed. '
Additional Context: Again, as mentioned, I am doing everything per the quick start documentation. All cells run fine up until predictor.fit(train_data=train_data, presets="low_quality").
Thanks for raising this issue, @Alexwenner0. A fix is already being worked on in #1993. This error occurs because many financial time series have irregular timestamps (for example, no rows on weekends or market holidays), so the frequency of the time series cannot be inferred. In the meantime, you could replace the timestamps with a dummy datetime index:
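A minimal sketch of the dummy-index idea (not the snippet from the original reply), assuming the column names from the reproduction (Date, ticker) and an arbitrary daily dummy frequency starting at 2000-01-01. The helper with_dummy_timestamps is hypothetical; it only uses pandas and keeps the real dates in a separate column:

```python
import pandas as pd


def with_dummy_timestamps(df, id_col, time_col, start="2000-01-01"):
    """Hypothetical helper: replace each item's timestamps with a gap-free daily range.

    Rows keep their chronological order within each item, and the real
    dates are copied to "original_<time_col>" so that forecasts can be
    mapped back to trading days afterwards.
    """
    out = df.sort_values([id_col, time_col]).copy()
    out["original_" + time_col] = out[time_col]
    # 0, 1, 2, ... = position of each row within its item
    step = out.groupby(id_col).cumcount()
    out[time_col] = pd.Timestamp(start) + pd.to_timedelta(step, unit="D")
    return out


# Hypothetical values; July 2-4 are missing, as on a trading calendar.
prices = pd.DataFrame({
    "Date": pd.to_datetime(["2022-07-01", "2022-07-05", "2022-07-06"]),
    "Close": [1.0, 2.0, 3.0],
    "ticker": ["AAPL"] * 3,
})
fixed = with_dummy_timestamps(prices, id_col="ticker", time_col="Date")
print(pd.infer_freq(prices["Date"]))  # prints None
print(pd.infer_freq(fixed["Date"]))   # prints D
```

With this approach the model only sees evenly spaced steps, so a prediction_length of 60 means 60 trading days rather than 60 calendar days.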
👍 I think that depends entirely on the use case. In finance, for example, the irregular timestamps correspond to the days when the market is open, so they may well make sense. Otherwise, it is up to you, depending on your use case, to forward-fill the data (probably not backfill, as that could leak information from the future).
ignore_time_index=False by default in AG-TS. Hope this helps.
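For the forward-fill option mentioned in the reply above, a minimal pandas sketch with hypothetical prices: reindexing onto every calendar day with asfreq("D") leaves NaN on non-trading days, ffill carries the last known close forward, and bfill copies the next close backwards, which is the look-ahead leak to avoid.

```python
import pandas as pd

# Hypothetical closing prices on trading days only (no rows for July 2-4).
closes = pd.Series(
    [10.0, 11.0, 12.0],
    index=pd.to_datetime(["2022-07-01", "2022-07-05", "2022-07-06"]),
)

# Reindex onto every calendar day; non-trading days become NaN.
daily = closes.asfreq("D")

# Forward-fill: each gap takes the last *known* value (no look-ahead).
ffilled = daily.ffill()

# Backfill: each gap takes the *next* value, i.e. information from the future.
bfilled = daily.bfill()

print(ffilled["2022-07-02":"2022-07-04"].tolist())  # [10.0, 10.0, 10.0]
print(bfilled["2022-07-02":"2022-07-04"].tolist())  # [11.0, 11.0, 11.0]
```

After forward-filling, the index has a regular daily frequency, at the cost of inserting flat, synthetic prices on days when the market was closed.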