Question: How to make sure testing data is not used for prediction when using future covariates?
Hi, I want to make sure I understand everything correctly and that I’m not accidentally using my testing data to make predictions. I read the documentation but did not find an example that leaves no doubt.
I have the following weekly series:
entire_series: 2018-01-01 - 2020-12-31
training_series: 2018-01-01 - 2019-12-31
testing_series: 2020-01-01 - 2020-12-31
I split those series into a target series (timestamp and target value) and a covariate series (the covariates). The covariates are known in advance.
rnn_model = RNNModel(input_chunk_length=52, training_length=80, n_rnn_layers=2)
rnn_model.fit(series=target_series_train, future_covariates=covariates_series_train, epochs=100)
When I make predictions, I want to make sure I am not using the target of my test set (i.e. I make all the predictions for 2020 on the first day of the year), while still using all the covariates of the test set for 2020.
predictions = rnn_model.predict(n=52, future_covariates=covariates_series)
Any confirmation or clarification is highly appreciated.
Issue Analytics
- Created 2 years ago
- Comments: 5 (2 by maintainers)
Hi @pabuta88 and thanks for writing.
The predictions will start one time step after the end of your target_series_train. You can check the start time and end time of a TimeSeries with series.start_time() and series.end_time(). So if your target_series_train ends in the last week of 2019, the prediction will start in the first week of 2020.
According to your input_chunk_length, the lookback window at prediction is 52 time steps and the forecast horizon is n=52. So your future covariates at prediction time must include these 52+52 time steps.
Btw: you can use the entire covariate_series (without splitting into train and test series) for both training and prediction. The slicing of relevant covariates is done internally by the models.
I think there’s a misunderstanding here. The training_length here refers to a number of time steps of a time series, not to a train/test split.
@buddih09 With RNNs, input_chunk_length is the number of time steps fed into the network before it emits forecasts of the target. Then training_length (which must be larger than input_chunk_length) determines the total number of steps the RNN module is trained on. This is a hyper-parameter that would need to be tuned in many cases.
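To make the 52+52 arithmetic concrete, here is a minimal sketch using only the Python standard library (no Darts calls). It assumes the weekly training target ends on 2019-12-30, which is an illustrative date consistent with a weekly series starting on 2018-01-01, and it follows the description above of which time steps the future covariates should cover. The helper names forecast_start and required_covariate_span are hypothetical, and the exact bounds a given Darts version enforces may differ slightly.

```python
from datetime import date, timedelta

WEEK = timedelta(weeks=1)


def forecast_start(train_end, step=WEEK):
    """First predicted timestamp: one step after the training target ends."""
    return train_end + step


def required_covariate_span(train_end, input_chunk_length, n, step=WEEK):
    """Return (first, last) timestamps the future covariates should cover.

    Hypothetical helper: lookback window of `input_chunk_length` steps ending
    at `train_end`, followed by a forecast horizon of `n` steps.
    """
    first = train_end - (input_chunk_length - 1) * step
    last = train_end + n * step
    return first, last


# Assumed last weekly timestamp of target_series_train (illustrative only).
train_end = date(2019, 12, 30)

first, last = required_covariate_span(train_end, input_chunk_length=52, n=52)
n_steps = (last - first) // WEEK + 1  # number of weekly steps, both ends included

print(forecast_start(train_end))  # first forecasted week
print(first, last, n_steps)       # covariate span and its length in steps
```

Under these assumptions the target values from 2020 never enter the prediction: only the covariates extend past the end of the training target, and the forecast begins in the first week after it.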