
Question: How to make sure testing data is not used for prediction when using future covariates?


Hi, I want to make sure I understand everything correctly and that I’m not accidentally using my testing data to make predictions. I read the documentation but did not find an example that leaves no doubt.

I have the following weekly series:

  • entire_series: 2018-01-01 to 2020-12-31
  • training_series: 2018-01-01 to 2019-12-31
  • testing_series: 2020-01-01 to 2020-12-31

I split those series into target series (timestamp and target value) and covariate series (the covariates). The covariates are known in advance.
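To illustrate the chronological split described above, the following stdlib-only sketch builds a weekly date index and cuts it at 2019-12-31. It assumes the weekly points fall on Mondays starting 2018-01-01. The question does not say how the weeks are anchored, so this is an assumption, and it is why the first test week comes out as 2020-01-06 rather than 2020-01-01. The helper name weekly_index is hypothetical.

```python
from datetime import date, timedelta


def weekly_index(start: date, end: date) -> list[date]:
    """Return weekly timestamps from start up to and including end (hypothetical helper)."""
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current += timedelta(weeks=1)
    return dates


# Assumption: weekly points fall on Mondays, beginning 2018-01-01.
entire = weekly_index(date(2018, 1, 1), date(2020, 12, 31))
cutoff = date(2019, 12, 31)

# Chronological split: everything up to the cutoff is training data, the rest is test data.
train = [d for d in entire if d <= cutoff]
test = [d for d in entire if d > cutoff]

print(len(train), train[-1])         # 105 2019-12-30
print(len(test), test[0], test[-1])  # 52 2020-01-06 2020-12-28
```

Under this assumption, the 52 test weeks line up with the horizon n=52 used in the question's predict call.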

from darts.models import RNNModel

rnn_model = RNNModel(input_chunk_length=52, training_length=80, n_rnn_layers=2)

rnn_model.fit(series=target_series_train, future_covariates=covariates_series_train, epochs=100)

When I make predictions, I want to make sure that I do not use the target values of my test set (i.e., all predictions for 2020 are made as of the first day of the year), but that I do use all the covariates of the test set for 2020.

predictions = rnn_model.predict(n=52, future_covariates=covariates_series)

Any confirmation or clarification is highly appreciated.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

2 reactions
dennisbader commented, Nov 25, 2021

Hi @pabuta88 and thanks for writing.

The predictions will start one time step after the end of your target_series_train.

You can check the start and end time of a TimeSeries with series.start_time() and series.end_time().

So if your target_series_train ends in the last week of 2019, the prediction will start in the first week of 2020.
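The arithmetic behind this can be sketched in plain Python: the first forecast timestamp is one frequency step after the last training timestamp. The sketch assumes a regular weekly frequency and a hypothetical last training date of 2019-12-30 (a Monday); with real data you would read that value from target_series_train.end_time(). The helper forecast_dates is hypothetical.

```python
from datetime import date, timedelta


def forecast_dates(train_end: date, n: int, step: timedelta) -> list[date]:
    """Timestamps of an n-step forecast that starts one step after train_end (hypothetical helper)."""
    return [train_end + step * (i + 1) for i in range(n)]


# Assumed last weekly timestamp of target_series_train (a Monday).
train_end = date(2019, 12, 30)
dates = forecast_dates(train_end, n=52, step=timedelta(weeks=1))

print(dates[0], dates[-1])  # 2020-01-06 2020-12-28
```

Because every forecast timestamp is strictly after train_end, no 2020 target value is part of the history the model is given.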

Given your input_chunk_length, the lookback window at prediction time is 52 time steps, and the forecast horizon is n=52.

  • Data used from target train series: the last 52 time steps
  • Data used from covariates: the values at the same 52 past time steps as the target lookback window, plus the next n=52 time steps.

So your future covariates at prediction time must cover these 52 + 52 time steps. By the way, you can use the entire covariates_series (without splitting it into train and test series) for both training and prediction; the models slice out the relevant covariates internally.
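Following the description above, the covariate span needed at prediction time can be computed from the last training timestamp, input_chunk_length, and n. This is a stdlib illustration of that description, not Darts's internal slicing code; the function names are hypothetical, and weekly steps with a last training date of 2019-12-30 are assumed.

```python
from datetime import date, timedelta


def required_covariate_span(train_end: date, input_chunk_length: int, n: int,
                            step: timedelta) -> tuple[date, date]:
    """First and last covariate timestamps needed: lookback window plus horizon (hypothetical)."""
    # Lookback: the input_chunk_length steps ending at train_end (inclusive).
    first = train_end - step * (input_chunk_length - 1)
    # Horizon: the n steps after train_end.
    last = train_end + step * n
    return first, last


def covers(cov_start: date, cov_end: date, span: tuple[date, date]) -> bool:
    """True if a covariate series running from cov_start to cov_end contains the span."""
    return cov_start <= span[0] and cov_end >= span[1]


span = required_covariate_span(date(2019, 12, 30), input_chunk_length=52, n=52,
                               step=timedelta(weeks=1))
print(span)  # (datetime.date(2019, 1, 7), datetime.date(2020, 12, 28))

# Full covariate series (2018-01-01 to 2020-12-28): covers the span.
print(covers(date(2018, 1, 1), date(2020, 12, 28), span))  # True
# Covariates cut at the end of 2019: the 2020 part of the span is missing.
print(covers(date(2018, 1, 1), date(2019, 12, 30), span))  # False
```

The last check mirrors the failure mode of passing only the training covariates to predict: the horizon part of the span would be missing.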

0 reactions
hrzn commented, Sep 11, 2022

@buddih09 Usually, when talking about RNNs (or neural networks in general), a longer training length will give better results, but it can also lead to overfitting, leaving the model useless. The same can be said about underfitting if the training length isn't long enough for the model to capture a meaningful relationship. In my personal projects, my training length varies depending on the use case, but generally I like to stick to around 60% training data and 40% test data; this lets models learn a relationship while still leaving a large portion to test on and evaluate the results. I know this doesn't exactly answer your question, but I hope it helps.

I think there's a misunderstanding here. The training_length here refers to a number of time steps of a time series, not to a train/test split. @buddih09 With RNNs, input_chunk_length is the number of time steps fed into the network before it emits forecasts of the target. Then training_length (which must be larger than input_chunk_length) determines the total number of steps the RNN module is trained on. This is a hyperparameter that often needs to be tuned.
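As a simplified, hypothetical illustration of this distinction (not Darts's actual dataset code), the sketch below slices a training series of 105 weekly points into input windows of training_length steps, each paired with a target window shifted one step ahead, as is common when training RNNs step by step. Under that assumption, training_length sets the length of each training sequence and has nothing to do with a train/test split. The function training_windows is hypothetical.

```python
def training_windows(series_length: int, training_length: int,
                     input_chunk_length: int) -> list[tuple[int, int]]:
    """Index ranges [start, end) of input windows of training_length steps (hypothetical).

    Each input window is paired with a target window shifted one step ahead,
    so one sample needs training_length + 1 consecutive points.
    """
    if training_length <= input_chunk_length:
        raise ValueError("training_length must be larger than input_chunk_length")
    n_windows = series_length - training_length
    return [(start, start + training_length) for start in range(max(n_windows, 0))]


# 105 weekly training points, as in the question's setup (assumed Monday-anchored weeks).
windows = training_windows(series_length=105, training_length=80, input_chunk_length=52)
print(len(windows), windows[0], windows[-1])  # 25 (0, 80) (24, 104)
```

In this sketch, raising training_length makes each sequence longer but leaves fewer distinct windows in a fixed-length training series.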


Top Results From Across the Web

Mixed past and future covariates · Issue #387 · unit8co/darts
Many problems have a mix of covariate time series which are known and unknown for the future. ... I am not sure how...

4. Regression and Prediction - Practical Statistics for Data ...
Normally, you would use a majority of the data to fit the model, and use a smaller portion to test the model. This...

Get a Grip! When to Add Covariates in a Linear Regression
To decide whether or not a covariate should be added to a regression in a prediction context, simply separate your data into a...

Making Predictions with Regression Analysis
Learn how to use regression analysis to make predictions and determine whether they are both unbiased and precise.

Time Series Forecasting Using Past and Future External ...
We'll be using synthetic time series data (created with Darts as well) to demonstrate how past and future covariates can be used. What...
