AssertionError: Time difference between steps has been idenfied as larger than 1 - set allow_missings=True
- PyTorch-Forecasting version: 0.5.2
- PyTorch version: 1.6.0
- PyTorch lightning version: 1.0.4
- Python version: 3.7
- Operating System: Ubuntu 18.04.2
Expected behavior
Some dates have no data in my dataset, so I found the unique dates and sorted them. Then I used each date's position in the sorted list as its time index:
```python
# map each unique date to its position in the sorted list of dates
sorted_dates = sorted(df.index.unique())
DATE_TO_INDEX = {date: i for i, date in enumerate(sorted_dates)}
df['time_idx'] = df.index.map(DATE_TO_INDEX)
```
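A position over all unique dates only guarantees consecutive values for the union of dates. If one instrument has no row on a date that another instrument does have, that instrument's own sequence of `time_idx` values skips a number. The stdlib sketch below illustrates this with hypothetical `(instrument, date)` rows and compares it with ranking dates separately within each instrument; `consecutive_steps` and `dates_by_instrument` are illustrative helper names, not part of any library.

```python
from datetime import date

# Hypothetical rows: (instrument, date). Instrument "B" has no row on 2020-01-03.
rows = [
    ("A", date(2020, 1, 2)), ("A", date(2020, 1, 3)), ("A", date(2020, 1, 6)),
    ("B", date(2020, 1, 2)), ("B", date(2020, 1, 6)),
]


def consecutive_steps(values):
    """Differences between consecutive values after sorting."""
    s = sorted(values)
    return [b - a for a, b in zip(s, s[1:])]


def dates_by_instrument(rows):
    """Group dates by instrument, keeping the order instruments first appear."""
    groups = {}
    for inst, d in rows:
        groups.setdefault(inst, []).append(d)
    return groups


# 1) Global index: position of each date among all unique dates
sorted_dates = sorted({d for _, d in rows})
date_to_index = {d: i for i, d in enumerate(sorted_dates)}
global_steps = {
    inst: consecutive_steps(date_to_index[d] for d in ds)
    for inst, ds in dates_by_instrument(rows).items()
}

# 2) Per-instrument index: position of each date among that instrument's own dates
local_steps = {}
for inst, ds in dates_by_instrument(rows).items():
    local_index = {d: i for i, d in enumerate(sorted(set(ds)))}
    local_steps[inst] = consecutive_steps(local_index[d] for d in ds)

print(global_steps)  # {'A': [1, 1], 'B': [2]}
print(local_steps)   # {'A': [1, 1], 'B': [1]}
```

The per-instrument version removes the gaps, but the same `time_idx` then no longer refers to the same calendar date across instruments. In pandas, a comparable per-group counter can be obtained with `groupby('Instrument').cumcount()` on a frame sorted by date.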
Then I executed the following code:
```python
import holidays
import numpy as np
from pytorch_forecasting import TimeSeriesDataSet

# categories have to be strings
df['month'] = df.index.month.astype(str).astype('category')
df['weekday'] = df.weekday.astype(str).astype('category')
df['sentiment_binary'] = df.sentiment_binary.astype(str).astype('category')
df['Instrument'] = df.Instrument.astype(str).astype('category')
df['log_close'] = np.log(df.Close + 1e-8)

# add US holiday names ('-' for non-holidays)
us_holidays = holidays.UnitedStates()
df['holiday'] = df.index.map(lambda date: us_holidays[date] if date in us_holidays else '-').astype('category')

# average close per instrument for each time index
df['avg_close_by_instrument'] = df.groupby(['time_idx', 'Instrument'], observed=True).Close.transform('mean')

df.reset_index(inplace=True)
df.rename(columns={'index': 'published'}, inplace=True)

# fraction of rows held out as the test set
train_percentage = 0.1
train_size = int((1 - train_percentage) * len(df))
train = df.iloc[0:train_size]
test = df.iloc[train_size:]

max_prediction_length = test['time_idx'].min()
max_encoder_length = 24
training_cutoff = df["time_idx"].max() - max_prediction_length

training = TimeSeriesDataSet(
    df[lambda x: x.time_idx <= training_cutoff],
    time_idx='time_idx',
    target='Close',
    group_ids=['Instrument'],
)
```
I got this error:

```
------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-216-a26698f913ea> in <module>
7 time_idx='time_idx',
8 target='Close',
----> 9 group_ids=['Instrument'],
10 )
~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_forecasting/data/timeseries.py in __init__(self, data, time_idx, target, group_ids, weight, max_encoder_length, min_encoder_length, min_prediction_idx, min_prediction_length, max_prediction_length, static_categoricals, static_reals, time_varying_known_categoricals, time_varying_known_reals, time_varying_unknown_categoricals, time_varying_unknown_reals, variable_groups, dropout_categoricals, constant_fill_strategy, allow_missings, add_relative_time_idx, add_target_scales, add_encoder_length, target_normalizer, categorical_encoders, scalers, randomize_length, predict_mode)
282
283 # create index
--> 284 self.index = self._construct_index(data, predict_mode=predict_mode)
285
286 # convert to torch tensor for high performance data loading later
~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_forecasting/data/timeseries.py in _construct_index(self, data, predict_mode)
718 assert (
719 self.allow_missings
--> 720 ), "Time difference between steps has been idenfied as larger than 1 - set allow_missings=True"
721
722 df_index["index_end"], missing_sequences = _find_end_indices(
AssertionError: Time difference between steps has been idenfied as larger than 1 - set allow_missings=True
```
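Judging by its message, the assertion is raised when, inside at least one group, consecutive `time_idx` values are more than 1 apart. Before building the `TimeSeriesDataSet`, one way to find the offending groups is to compute the largest step per group. A minimal pandas sketch; the toy frame is hypothetical and stands in for the real `df`, and this is an approximation of the condition, not the library's own check:

```python
import pandas as pd

# Hypothetical toy data standing in for the real df
df = pd.DataFrame({
    "Instrument": ["A", "A", "A", "B", "B"],
    "time_idx": [0, 1, 2, 0, 2],
    "Close": [10.0, 10.5, 10.2, 20.0, 19.8],
})

# Largest gap between consecutive time_idx values within each instrument
max_step = (
    df.sort_values(["Instrument", "time_idx"])
    .groupby("Instrument")["time_idx"]
    .apply(lambda s: s.diff().max())
)

# Instruments whose time_idx skips at least one step
gappy = max_step[max_step > 1].index.tolist()
print(gappy)  # ['B']
```

If any group shows up, the options are to make the indices contiguous within each group or, as the message suggests, to pass `allow_missings=True` (the parameter appears in the `__init__` signature shown in the traceback).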
I have also tried deriving `time_idx` from the number of days between each date and the minimum date:

```python
def map_date_index(date, min_date):
    # number of calendar days since the earliest date
    return (date - min_date).days

min_date = df.index.min()
df['time_idx'] = df.index.map(lambda date: map_date_index(date, min_date))
```

but I encountered the same error. What could be the reason for this?
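Counting calendar days reintroduces gaps whenever the data skips days. Assuming rows exist only on weekdays (typical for market prices, but an assumption about this dataset), every weekend becomes a step of 3. A stdlib sketch of the same arithmetic as `map_date_index` on hypothetical dates:

```python
from datetime import date, timedelta

# Hypothetical trading days: Thursday 2020-01-02 through Wednesday 2020-01-08,
# keeping weekdays only (no rows on Saturday or Sunday)
start = date(2020, 1, 2)
trading_days = [start + timedelta(days=k) for k in range(7)]
trading_days = [d for d in trading_days if d.weekday() < 5]

# time_idx as calendar days since the first date, as map_date_index computes it
min_date = min(trading_days)
time_idx = [(d - min_date).days for d in trading_days]
steps = [b - a for a, b in zip(time_idx, time_idx[1:])]

print(time_idx)  # [0, 1, 4, 5, 6]
print(steps)     # [1, 3, 1, 1]
```

So a days-based index would be expected to trip the same assertion unless missing steps are allowed or filled.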
Issue Analytics
- Created 3 years ago
- Comments: 8 (4 by maintainers)
@khuyentran1401 sorry to bother you, but could you please share your results if possible? Maybe they have been published somewhere, such as Medium or your own blog. Or do you have access to the results in this report (https://datapane.com/u/sygnals/reports/E7PqEqk/copper-x-g-b-classifier-model1-2020-09-24/)?
If my data doesn't have a feature that could be used for `group_ids`, is there any way to work around this?
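Since `group_ids` identifies which rows belong to the same series, one workaround when the data is a single series is to add a constant column and use it as the only group id. A minimal pandas sketch; `series_id` is a hypothetical column name and the frame is toy data:

```python
import pandas as pd

# Hypothetical single-series frame with no natural grouping column
df = pd.DataFrame({
    "time_idx": [0, 1, 2, 3],
    "Close": [10.0, 10.4, 10.1, 10.7],
})

# Constant column: every row belongs to the same (single) group.
# A string value is used so it can be treated as a categorical label.
df["series_id"] = "0"

print(df["series_id"].nunique())  # 1
```

The column would then be passed as `group_ids=['series_id']` when building the `TimeSeriesDataSet`. This only makes sense if all rows really form one series; data with several instruments still needs a per-instrument group.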