AssertionError: Time difference between steps has been idenfied as larger than 1 - set allow_missings=True

  • PyTorch-Forecasting version: 0.5.2
  • PyTorch version: 1.6.0
  • PyTorch lightning version: 1.0.4
  • Python version: 3.7
  • Operating System: Ubuntu 18.04.2

Expected behavior

My data looks something like this. As you can see, there is no data for some dates. So what I did was to find unique dates and sort them. Then I just find the time index based on the date’s position in the sorted date list.

sorted_dates = sorted(df.index.unique())

# Map each date to its position in the sorted list of unique dates
DATE_TO_INDEX = {date: i for i, date in enumerate(sorted_dates)}

df['time_idx'] = df.index.map(DATE_TO_INDEX)

And this is what I got afterward. When I executed the following code:

# categories have to be strings
df['month'] = df.index.month.astype(str).astype('category')
df['weekday'] = df.weekday.astype(str).astype('category')
df['sentiment_binary'] = df.sentiment_binary.astype(str).astype('category')
df['Instrument'] = df.Instrument.astype(str).astype('category')
df['log_close'] = np.log(df.Close + 1e-8)

# Add holidays
us_holidays = holidays.UnitedStates()
df['holiday'] = df.index.map(lambda date: us_holidays[date] if date in us_holidays else '-').astype('category')

#sentiment per instrument for each time index
df['avg_close_by_instrument'] = df.groupby(['time_idx', 'Instrument'], observed=True).Close.transform('mean')

df.reset_index(inplace=True)
df.rename(columns={'index': 'published'}, inplace=True)

train_percentage = 0.1
train_size = int((1 - train_percentage) * len(df))

train = df.iloc[0:train_size]
test = df.iloc[train_size:] 

max_prediction_length = test['time_idx'].min() 
max_encoder_length = 24
training_cutoff = df["time_idx"].max() - max_prediction_length

training = TimeSeriesDataSet(
    df[lambda x: x.time_idx <= training_cutoff],
    time_idx='time_idx', 
    target='Close',
    group_ids=['Instrument'],
   )

I got this error

------------------------------------------------------------------------
AssertionError                         Traceback (most recent call last)
<ipython-input-216-a26698f913ea> in <module>
      7     time_idx='time_idx',
      8     target='Close',
----> 9     group_ids=['Instrument'],
     10    )

~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_forecasting/data/timeseries.py in __init__(self, data, time_idx, target, group_ids, weight, max_encoder_length, min_encoder_length, min_prediction_idx, min_prediction_length, max_prediction_length, static_categoricals, static_reals, time_varying_known_categoricals, time_varying_known_reals, time_varying_unknown_categoricals, time_varying_unknown_reals, variable_groups, dropout_categoricals, constant_fill_strategy, allow_missings, add_relative_time_idx, add_target_scales, add_encoder_length, target_normalizer, categorical_encoders, scalers, randomize_length, predict_mode)
    282 
    283         # create index
--> 284         self.index = self._construct_index(data, predict_mode=predict_mode)
    285 
    286         # convert to torch tensor for high performance data loading later

~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_forecasting/data/timeseries.py in _construct_index(self, data, predict_mode)
    718             assert (
    719                 self.allow_missings
--> 720             ), "Time difference between steps has been idenfied as larger than 1 - set allow_missings=True"
    721 
    722         df_index["index_end"], missing_sequences = _find_end_indices(

AssertionError: Time difference between steps has been idenfied as larger than 1 - set allow_missings=True
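
The assertion is raised while the index is built, and the message points at gaps in time_idx. A likely reading, which the comments shown here do not confirm, is that the step size is measured within each group_ids series. A time_idx that is consecutive over the whole frame can then still skip values inside a single Instrument. The sketch below is not the library's code: it is a plain-Python stand-in with made-up rows and a hypothetical helper, max_step_per_group, that reports the largest step per group.

```python
from collections import defaultdict


def max_step_per_group(rows):
    """Return {group: largest difference between consecutive time_idx values}."""
    by_group = defaultdict(list)
    for group, time_idx in rows:
        by_group[group].append(time_idx)

    gaps = {}
    for group, idxs in by_group.items():
        idxs = sorted(idxs)
        steps = [later - earlier for earlier, later in zip(idxs, idxs[1:])]
        gaps[group] = max(steps, default=0)  # a single-row group has no steps
    return gaps


# Made-up data: time_idx 0..3 is consecutive over the whole table,
# but instrument "B" has no row at time_idx 1 or 2.
rows = [("A", 0), ("A", 1), ("A", 2), ("A", 3), ("B", 0), ("B", 3)]

gaps = max_step_per_group(rows)
offending = sorted(group for group, step in gaps.items() if step > 1)
print(gaps)
print(offending)
```

On a DataFrame, a roughly equivalent check would be df.sort_values('time_idx').groupby('Instrument')['time_idx'].diff().max(); a value above 1 means at least one instrument skips a step.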


I have also tried to extract time_idx based on the difference between the date and the min date

def map_date_index(date, min_date):
    return (date - min_date).days
min_date = df.index.min()

df['time_idx'] = df.index.map(lambda date: map_date_index(date, min_date))

but I encountered the same error. I wonder what could be the reason for this?
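
Both attempts derive time_idx from the calendar of the whole frame, so any date missing for one Instrument leaves a hole in that instrument's index, and the day-difference variant also jumps over weekends and holidays. The error message itself suggests setting allow_missings=True. Another option, sketched here as an assumption rather than a fix confirmed in this thread, is to number each instrument's own sorted dates consecutively. The helper name dense_time_idx and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import date


def dense_time_idx(rows):
    """Map (group, day) to the position of that day among the group's own dates."""
    dates_by_group = defaultdict(set)
    for group, day in rows:
        dates_by_group[group].add(day)

    index = {}
    for group, days in dates_by_group.items():
        for position, day in enumerate(sorted(days)):
            index[(group, day)] = position
    return index


# Made-up rows: (Instrument, date)
rows = [
    ("A", date(2020, 1, 2)),
    ("A", date(2020, 1, 3)),
    ("A", date(2020, 1, 6)),  # weekend skipped
    ("B", date(2020, 1, 2)),
    ("B", date(2020, 1, 6)),  # no row for B on Jan 3
]

time_idx = dense_time_idx(rows)
for key in sorted(time_idx):
    print(key, time_idx[key])
```

The trade-off is that every observed row now counts as exactly one step, so real calendar gaps are no longer visible to the model. If those gaps matter, keeping a calendar-based index and allowing missing steps is probably the better fit.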

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
eliassz commented, Jan 11, 2022

@khuyentran1401 sorry to bother you, but could you please share your results if possible? Maybe they have been published somewhere, such as Medium or your own blog. Or do you perhaps have access to the results at this source (https://datapane.com/u/sygnals/reports/E7PqEqk/copper-x-g-b-classifier-model1-2020-09-24/)?

1 reaction
khuyentran1401 commented, Nov 2, 2020

If my data doesn’t have a feature that could be used for group_ids, is there any way to work around this?
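
One workaround for a single series without a natural identifier is to add a constant column and pass it as group_ids. It is offered here as an assumption, not as an answer taken from the thread. The column name series and the value "all" are arbitrary, and the rows below are a made-up stand-in for the DataFrame.

```python
# Made-up single-series rows standing in for the DataFrame.
rows = [{"time_idx": i, "Close": 100.0 + i} for i in range(5)]

# Give every row the same hypothetical group label so the whole
# table is treated as one time series.
for row in rows:
    row["series"] = "all"

# This list is what would be passed as group_ids=... to TimeSeriesDataSet.
group_ids = ["series"]
print({row["series"] for row in rows})
```

On a DataFrame the same step is a single assignment, df['series'] = 'all', followed by group_ids=['series'].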
