AssertionError: Time difference between steps has been idenfied as larger than 1 - set allow_missings=True

PyTorch-Forecasting version: 0.5.2
PyTorch version: 1.6.0
PyTorch lightning version: 1.0.4
Python version: 3.7
Operating System: Ubuntu 18.04.2

Expected behavior

My data looks something like this. As you can see, there is no data for some dates. So I found the unique dates, sorted them, and set each row's time index to the position of its date in the sorted list.

sorted_dates = sorted(df.index.unique())

# map each date to its position in the sorted list
DATE_TO_INDEX = {date: i for i, date in enumerate(sorted_dates)}

df['time_idx'] = df.index.map(DATE_TO_INDEX)

And this is what I got afterward. When I executed this code

import holidays
import numpy as np
from pytorch_forecasting import TimeSeriesDataSet

# categories have to be strings
df['month'] = df.index.month.astype(str).astype('category')
df['weekday'] = df.weekday.astype(str).astype('category')
df['sentiment_binary'] = df.sentiment_binary.astype(str).astype('category')
df['Instrument'] = df.Instrument.astype(str).astype('category')
df['log_close'] = np.log(df.Close + 1e-8)

# Add holidays
us_holidays = holidays.UnitedStates()
df['holiday'] = df.index.map(lambda date: us_holidays[date] if date in us_holidays else '-').astype('category')

# average close per instrument for each time index
df['avg_close_by_instrument'] = df.groupby(['time_idx', 'Instrument'], observed=True).Close.transform('mean')

df.reset_index(inplace=True)
df.rename(columns={'index': 'published'}, inplace=True)

train_percentage = 0.1
train_size = int((1 - train_percentage) * len(df))

train = df.iloc[0:train_size]
test = df.iloc[train_size:] 

max_prediction_length = test['time_idx'].min() 
max_encoder_length = 24
training_cutoff = df["time_idx"].max() - max_prediction_length

training = TimeSeriesDataSet(
    df[lambda x: x.time_idx <= training_cutoff],
    time_idx='time_idx', 
    target='Close',
    group_ids=['Instrument'],
)

I got this error

------------------------------------------------------------------------
AssertionError                         Traceback (most recent call last)
<ipython-input-216-a26698f913ea> in <module>
      7     time_idx='time_idx',
      8     target='Close',
----> 9     group_ids=['Instrument'],
     10    )

~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_forecasting/data/timeseries.py in __init__(self, data, time_idx, target, group_ids, weight, max_encoder_length, min_encoder_length, min_prediction_idx, min_prediction_length, max_prediction_length, static_categoricals, static_reals, time_varying_known_categoricals, time_varying_known_reals, time_varying_unknown_categoricals, time_varying_unknown_reals, variable_groups, dropout_categoricals, constant_fill_strategy, allow_missings, add_relative_time_idx, add_target_scales, add_encoder_length, target_normalizer, categorical_encoders, scalers, randomize_length, predict_mode)
    282 
    283         # create index
--> 284         self.index = self._construct_index(data, predict_mode=predict_mode)
    285 
    286         # convert to torch tensor for high performance data loading later

~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_forecasting/data/timeseries.py in _construct_index(self, data, predict_mode)
    718             assert (
    719                 self.allow_missings
--> 720             ), "Time difference between steps has been idenfied as larger than 1 - set allow_missings=True"
    721 
    722         df_index["index_end"], missing_sequences = _find_end_indices(

AssertionError: Time difference between steps has been idenfied as larger than 1 - set allow_missings=True

I have also tried deriving time_idx from the number of days between each date and the earliest date

def map_date_index(date, min_date):
    return (date - min_date).days
min_date = df.index.min()

df['time_idx'] = df.index.map(lambda date: map_date_index(date, min_date))

but I encountered the same error. I wonder what could be the reason for this?
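
One possible explanation, offered as a sketch rather than a confirmed diagnosis: the traceback shows the assertion firing inside `_construct_index`, and if that check compares consecutive `time_idx` values within each `group_ids` series (an assumption about the library internals), then an index built from the global list of dates still has gaps for any instrument that lacks a date another instrument has. The toy data and the helper name `find_time_idx_gaps` below are hypothetical and only illustrate how such gaps could be located.

```python
import pandas as pd

# Toy data (hypothetical): instrument "B" has no row for 2020-01-02.
df = pd.DataFrame(
    {
        "Instrument": ["A", "A", "A", "B", "B"],
        "Close": [1.0, 1.1, 1.2, 5.0, 5.2],
    },
    index=pd.to_datetime(
        ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-01", "2020-01-03"]
    ),
)

# Global ranking of dates, as described in the question.
sorted_dates = sorted(df.index.unique())
date_to_index = {date: i for i, date in enumerate(sorted_dates)}
df["time_idx"] = df.index.map(date_to_index)


def find_time_idx_gaps(frame, group_col="Instrument", time_col="time_idx"):
    """Return rows whose time_idx jumps by more than 1 within their group."""
    ordered = frame.sort_values([group_col, time_col])
    step = ordered.groupby(group_col, observed=True)[time_col].diff()
    # Use a positional mask so the duplicated date index is not realigned.
    return ordered[(step > 1).to_numpy()]


gaps = find_time_idx_gaps(df)
print(gaps[["Instrument", "time_idx"]])

# Alternative index: count rows within each instrument (rows assumed to be
# in date order per instrument). This has no gaps by construction, but a
# given step no longer means the same calendar date across instruments.
df["time_idx_per_group"] = (
    df.groupby("Instrument", observed=True).cumcount().to_numpy()
)
```

If the helper returns any rows, the options are to rebuild `time_idx` so it is consecutive within each instrument, or to follow the error message and pass `allow_missings=True`, which appears in the `TimeSeriesDataSet.__init__` signature shown in the traceback.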

Top GitHub Comments

eliassz commented, Jan 11, 2022

@khuyentran1401 sorry to bother you, but could you please share your results if possible? Maybe they have been published somewhere, such as Medium or your own blog. Or maybe you still have access to the results at this source (https://datapane.com/u/sygnals/reports/E7PqEqk/copper-x-g-b-classifier-model1-2020-09-24/)?

khuyentran1401 commented, Nov 2, 2020

If my data doesn’t have a feature that could be used for group_ids, is there any way to work around this?
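
One workaround that may fit, assuming the whole dataset is a single time series: add a constant column and pass it as `group_ids`, so every row belongs to the same group. The column name `series_id` and the toy values below are hypothetical; this is a sketch, not something confirmed in the thread.

```python
import pandas as pd

# Toy single-series data (hypothetical values).
df = pd.DataFrame(
    {"time_idx": [0, 1, 2, 3], "Close": [10.0, 10.5, 10.2, 10.8]}
)

# Constant identifier: every row is assigned to the same series.
# "series_id" is an illustrative name, not one the library requires.
df["series_id"] = "0"

# The column could then be passed where the question used 'Instrument', e.g.
# TimeSeriesDataSet(df, time_idx="time_idx", target="Close",
#                   group_ids=["series_id"])
print(df["series_id"].nunique())
```

With a single group, `time_idx` has to be consecutive across the whole frame, so any missing dates would still need to be handled first.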