
AssertionError: filters should not remove entries all entries - check encoder/decoder lengths and lags

See original GitHub issue
  • PyTorch-Forecasting version:
  • PyTorch version: 1.8.1+rocm4.0.1
  • Python version: 3.8.8
  • Operating System: Ubuntu 20.10 (Groovy Gorilla)

Expected behavior

I have data in a pandas dataframe that is characterized by the following:

                 Date          Load  Solar         Wind  time_idx  month   grid
0 2019-05-01 00:00:00  21283.786430    0.0  2414.134105         0      5  ERCOT
1 2019-05-01 01:00:00  20502.851939    0.0  3097.408232         0      5  ERCOT
2 2019-05-01 02:00:00  19936.040922    0.0  2774.369595         0      5  ERCOT
3 2019-05-01 03:00:00  19905.404774    0.0  2889.462763         0      5  ERCOT
4 2019-05-01 04:00:00  20343.393833    0.0  2719.568595         0      5  ERCOT

               Load         Solar          Wind      time_idx
count  10991.000000  10991.000000  10991.000000  10991.000000
mean   25019.883604   3556.769244   1968.367736      6.990993
std     4809.103704   4194.475591   1298.381739      4.327012
min    15591.377319      0.000000      0.000000      0.000000
25%    21556.054368      0.000000    784.979067      3.000000
50%    24070.778150    513.663393   1883.507906      7.000000
75%    27143.606424   7780.020473   3005.799267     11.000000
max    44148.227377  11854.229384   5243.790771     14.000000

I then took the TimeSeriesDataSet from the Stallion data set used in the PyTorch Forecasting Temporal Fusion Transformer tutorial and modified it to fit my data as follows:

from pytorch_forecasting import TimeSeriesDataSet
from pytorch_forecasting.data import GroupNormalizer

max_prediction_length = 6
max_encoder_length = 24
training_cutoff = data["time_idx"].max() - max_prediction_length

training = TimeSeriesDataSet(
    data[lambda x: x.time_idx <= training_cutoff],
    time_idx="time_idx",
    target="Load",
    group_ids=["grid"],
    min_encoder_length=max_encoder_length // 2,  # keep encoder length long (as it is in the validation set)
    max_encoder_length=max_encoder_length,
    min_prediction_length=1,
    max_prediction_length=max_prediction_length,
    static_categoricals=["grid"],
    # static_reals=["avg_population_2017", "avg_yearly_household_income_2017"],
    time_varying_known_categoricals=["month"],
    # variable_groups={"special_days": special_days},  # group of categorical variables can be treated as one variable
    time_varying_known_reals=["time_idx", "Solar", "Wind"],
    time_varying_unknown_categoricals=[],
    time_varying_unknown_reals=[],
    target_normalizer=GroupNormalizer(
        groups=["grid"], transformation="softplus"
    ),  # use softplus and normalize by group
    add_relative_time_idx=True,
    add_target_scales=True,
    add_encoder_length=True,
    allow_missings=True,  # permit gaps in time_idx
)

Actual behavior

Here is the error that I keep getting. I don’t have any lags in my time series data set, and I don’t see how the encoder lengths could be causing issues with their size.

Traceback (most recent call last):
  File "grid_Transformer.py", line 65, in <module>
    training = TimeSeriesDataSet(
  File "/home/prism/anaconda3/envs/transformer/lib/python3.8/site-packages/pytorch_forecasting/data/timeseries.py", line 435, in __init__
    self.index = self._construct_index(data, predict_mode=predict_mode)
  File "/home/prism/anaconda3/envs/transformer/lib/python3.8/site-packages/pytorch_forecasting/data/timeseries.py", line 1233, in _construct_index
    assert (
AssertionError: filters should not remove entries all entries - check encoder/decoder lengths and lags
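
For context, the assertion fires in _construct_index when filtering out windows shorter than min_encoder_length + min_prediction_length leaves no samples at all. A rough way to reproduce that filter by hand with plain pandas (a sketch only, assuming data, training_cutoff, and the length variables above are in scope, with "grid" as the sole group column) is:

# Sketch: approximate the minimum-length filter applied in _construct_index.
# Each sample needs at least min_encoder_length + min_prediction_length steps,
# i.e. 24 // 2 + 1 = 13 consecutive time_idx values here.
min_length = max_encoder_length // 2 + 1

filtered = data[data["time_idx"] <= training_cutoff]
spans = filtered.groupby("grid")["time_idx"].agg(["min", "max"])
spans["span"] = spans["max"] - spans["min"] + 1

# Groups whose entire span is shorter than min_length can never yield a sample;
# if this prints every group, the dataset index ends up empty.
print(spans[spans["span"] < min_length])

Given the summary statistics above (time_idx only runs from 0 to 14), the cutoff of 14 - 6 = 8 leaves at most 9 distinct steps per series, which is below the 13 required.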

I have attached the full code and the data file below. Thanks for any help or insights!

code_data.zip

  • One note on the full code: when I include more data points I don’t get the error above, but then it fills my RAM and my computer kills the process before the transformer can begin running.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

1 reaction
jdb78 commented, Apr 29, 2021

I would assume not all time series have a minimum length of 13 (min_prediction_length + min_encoder_length)
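
That diagnosis is consistent with the summary statistics in the report: the dataframe has 10,991 hourly rows, yet time_idx only runs from 0 to 14, so many rows share the same step. A hypothetical rebuild of time_idx from the Date column (not from the thread, and assuming the intent is one step per hour) would look like:

import pandas as pd

# Hypothetical fix: derive time_idx as whole hours since the first timestamp
# so that it increases by one per hourly row within each group.
data["Date"] = pd.to_datetime(data["Date"])
data["time_idx"] = ((data["Date"] - data["Date"].min()) // pd.Timedelta(hours=1)).astype(int)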

0 reactions
georgeblck commented, Oct 26, 2022

I have the same problem. A solution would be appreciated

Read more comments on GitHub >

Top Results From Across the Web

  • AssertionError: filters should not remove entries all entries
    AssertionError: filters should not remove entries all entries - check encoder/decoder lengths and lags.
  • Pytorch-forecasting:: Univariate AssertionError: filters should ...
    After some experiment, it seems that the training_df length (196) should be larger than or equal to (context_length + prediction_length).
  • Source code for pytorch_forecasting.data.timeseries
    Lags must be at not larger than the shortest time series as all time series will ... "filters should not remove entries all...
  • Encoder-Decoder Model for Multistep Time Series Forecasting ...
    Encoder-decoder models have provided state of the art results in sequence ... sales of all items, and the mean forecast to remove the...
  • Changelog — Python 3.11.1 documentation
    gh-99886: Fix a crash when an object which does not have a dictionary frees its instance values. gh-99891: Fix a bug in the...
