Natural gaps in timeseries observations
I have hourly energy observations taken on business days only, so 120 hours per week instead of 168. Saturdays and Sundays are always missing, and so are holidays. Seasonality is daily, weekly, and yearly.
I was following the samples and used TimeSeries.from_dataframe with default settings. A lot of NaNs got inserted into the DatetimeIndex, which matches the behaviour of pandas' asfreq('H'). So with the train/test split
train, val = series.split_before(pd.Timestamp('20200101'))
I get
len(data[:'20200101']), len(train), len(data['20200101':]), len(val)
(11856, 17328, 5904, 9480)
data['20200101':]['load']
dt_iso
2020-01-09 00:00:00 801.0410
2020-01-09 01:00:00 790.4990
2020-01-09 02:00:00 770.1160
2020-01-09 03:00:00 770.8910
2020-01-09 04:00:00 774.4680
...
2021-01-29 19:00:00 1,026.1950
2021-01-29 20:00:00 1,007.2650
2021-01-29 21:00:00 990.8280
2021-01-29 22:00:00 953.9190
2021-01-29 23:00:00 904.5980
Name: load, Length: 5904, dtype: float64
val['load']
load
date
2020-01-01 00:00:00 nan
2020-01-01 01:00:00 nan
2020-01-01 02:00:00 nan
2020-01-01 03:00:00 nan
2020-01-01 04:00:00 nan
... ...
2021-01-29 19:00:00 1,026.1950
2021-01-29 20:00:00 1,007.2650
2021-01-29 21:00:00 990.8280
2021-01-29 22:00:00 953.9190
2021-01-29 23:00:00 904.5980
[9480 rows x 1 columns]
Freq: H
So one can see that the data has been inflated with NaNs: val has 9480 rows where the original data has only 5904.
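The inflation can be reproduced with pandas alone. The following is a minimal sketch on synthetic data (two weeks of weekday-only hourly values, not the dataset from this issue); the variable names are illustrative. It uses the lowercase "h" alias because newer pandas versions deprecate "H".

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: hourly values on weekdays only,
# Monday 2020-01-06 through Friday 2020-01-17.
full_grid = pd.date_range("2020-01-06", "2020-01-19 23:00", freq="h")
business_hours = full_grid[full_grid.dayofweek < 5]
observed = pd.Series(
    np.arange(len(business_hours), dtype=float),
    index=business_hours,
    name="load",
)

# Forcing a regular hourly frequency re-creates the weekend hours as NaN rows.
regular = observed.asfreq("h")

n_observed = len(observed)
n_regular = len(regular)
n_nan = int(regular.isna().sum())
print(n_observed, n_regular, n_nan)
```

Here 240 observed rows become 288 rows, 48 of which are NaN (the one weekend that lies between the first and last observation). The same mechanism, applied to every weekend and holiday, would explain the row counts above.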
Obviously, darts.utils.statistics.plot_acf() and darts.utils.statistics.check_seasonality() do not work with NaNs.
plot_acf() gives me a straight line at zero, where there should be AR lags up to 192.
check_seasonality() reports
[2021-03-05 17:39:16,120] INFO | darts.utils.statistics | The ACF has no local maximum for m < max_lag = 24.
If I pass fill_missing_dates=False to TimeSeries.from_dataframe():
series = TimeSeries.from_dataframe(data, time_col='date', value_cols=['load'], fill_missing_dates=False)
I get Could not infer frequency. Are some dates missing? Try specifying 'fill_missing_dates=True'
Adding the freq='H' parameter to the call above does not help either. It is the same problem as with a pandas DatetimeIndex with freq='H'.
With statsmodels SARIMAX models I was able to get rid of the datetime frequency warning by converting the index to a PeriodIndex, which tolerates gaps:
data.index = pd.DatetimeIndex(data.index).to_period('H')
Please advise what my options are in Darts for working with time series data that has natural gaps.
Thanks a lot in advance for your attention.
P.S. Some pictures to illustrate time series and gaps
Issue Analytics
- Created 3 years ago
- Comments:12 (4 by maintainers)
Have you tried using a business day frequency ("B")?
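For context, "B" in pandas is a daily alias: it steps one business day at a time, so it can only describe the data after it has been aggregated to one row per day. A minimal sketch of that aggregation on synthetic data follows; the series and variable names are illustrative, and summing per day is an assumption about what a sensible daily value would be.

```python
import numpy as np
import pandas as pd

# Synthetic weekday-only hourly data (not the dataset from the issue).
full_grid = pd.date_range("2020-01-06", "2020-01-19 23:00", freq="h")
stamps = full_grid[full_grid.dayofweek < 5]
hourly = pd.Series(np.ones(len(stamps)), index=stamps, name="load")

# "B" yields one timestamp per business day, not one per hour.
business_days = pd.date_range("2020-01-06", "2020-01-17", freq="B")

# Collapse each calendar day to one total, using only days present in the data.
daily = hourly.groupby(hourly.index.normalize()).sum()

# With one row per weekday, a business-day frequency leaves no weekend gaps.
daily_b = daily.asfreq("B")
print(len(business_days), len(daily_b), int(daily_b.isna().sum()))
```

This trades away the hourly resolution, and a holiday that falls on a weekday would still show up as a NaN row under "B".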
Hi @hrzn @rmk17, has this issue been taken care of in the latest Darts, please? I am also finding it very difficult to read a dataframe that has natural gaps into a TimeSeries object. Imputation on weekends does not make business sense. Is there a way we can tell TimeSeries to ignore the gaps? Pandas is able to look past the gaps, so I am sure Darts can too? Using "B" for business days does not help either, BTW. Many thanks for looking.
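One possible workaround, sketched here with pandas only, is to drop the timestamps and index the observations by position, keeping a separate lookup table to map positions back to time. The data below is synthetic with a pure 24-hour cycle, and all names are illustrative. Whether the resulting values can be loaded into Darts (for example through an integer-indexed series via TimeSeries.from_values) depends on the installed version and is an assumption, not something confirmed in this thread.

```python
import numpy as np
import pandas as pd

# Synthetic weekday-only hourly data with a pure 24-hour cycle (assumption).
full_grid = pd.date_range("2020-01-06", "2020-01-31 23:00", freq="h")
stamps = full_grid[full_grid.dayofweek < 5]
load = pd.Series(np.sin(2 * np.pi * stamps.hour / 24), index=stamps, name="load")

# Keep the timestamps in a lookup table, then index the values by position.
position_to_time = pd.Series(stamps, name="dt")
gapless = load.reset_index(drop=True)

# On the gap-free series, lag 24 means "same hour, previous business day".
daily_autocorr = gapless.autocorr(lag=24)
half_day_autocorr = gapless.autocorr(lag=12)
print(len(gapless), round(daily_autocorr, 6), round(half_day_autocorr, 6))
```

With no NaNs in the way, the autocorrelation at lag 24 is recovered. The cost is that the model no longer sees the calendar: a week becomes 120 steps, and holidays shift the weekly and yearly cycles by whole days.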