[BUG] Inverse transform does not work on series of length < 3
First off, I want to say that I’m not an ML engineer at all. We have our own custom timeseries forecasting code that our ML engineers wrote. I stumbled upon darts, and we are very excited by the prospect of using it, given that we can use so many different kinds of models to generate our predictions. Our problem statement is as follows: we know the weekly consumption of our widgets by customers, and want to predict how many they will use next week. We only need 1 prediction. To that end, I wrote some code that cleans and preps the data, which is then passed to darts. Everything works as expected until we hit the inverse_transform() function.
Describe the bug
We only care about 1 prediction step, so we invoke the prediction by saying: pred_series = my_model.predict(n=1).
This works, and I am able to get the scaled version of what the usage is going to be next week. However, I want the human-readable/understandable number, and so I do: print(transformer.inverse_transform(pred_series)). I then get the following error:
  File "train/run_models.py", line 124, in get_model
    print(transformer.inverse_transform(pred_series))
  File "/home/ec2-user/darts/lib/python3.7/site-packages/darts/preprocessing/scaler_wrapper.py", line 102, in inverse_transform
    reshape((-1, series.width))))
  File "/home/ec2-user/darts/lib/python3.7/site-packages/darts/timeseries.py", line 571, in from_times_and_values
    return TimeSeries(df, freq, fill_missing_dates)
  File "/home/ec2-user/darts/lib/python3.7/site-packages/darts/timeseries.py", line 58, in __init__
    'is not passed', logger)
  File "/home/ec2-user/darts/lib/python3.7/site-packages/darts/logging.py", line 54, in raise_if_not
    raise ValueError(message)
I read through the source code and didn’t see any place in the inverse_transform() function where the frequency is being passed. Merely:
return TimeSeries.from_times_and_values(series.time_index(),
                                        self.transformer.inverse_transform(series.values().
                                                                           reshape((-1, series.width))))
I then tried to pass the frequency argument as follows: print(transformer.inverse_transform(pred_series, "W-SUN")), and got this error instead:
  File "train/run_models.py", line 124, in get_model
    print(transformer.inverse_transform(pred_series, "W-SUN"))
TypeError: inverse_transform() takes 2 positional arguments but 3 were given
I put “W” instead of “W-SUN” too, with similar results (same error, that is).
So my question is, how to use darts to get precisely 1 prediction? And for it to be not scaled? I could generate 3, pick the first one and that’d be that, but I’m not sure if that’s the best/right approach to this problem.
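For the single-prediction case asked about above, one possible workaround is to skip the wrapper and invert the scaling on the raw value, attaching the weekly frequency explicitly. This is only a sketch under assumptions not stated in the issue: it assumes the wrapped scaler is a scikit-learn MinMaxScaler, and the history values, the date, and the helper name inverse_single_prediction are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def inverse_single_prediction(scaler, scaled_value, timestamp, freq):
    """Hypothetical helper: undo the scaling for one forecast value and
    return it as a one-row pandas Series with an explicit frequency."""
    # scikit-learn scalers expect a 2D array of shape (n_samples, n_features).
    original = scaler.inverse_transform(np.array([[scaled_value]]))
    # With a single timestamp the frequency cannot be inferred, so set it explicitly.
    index = pd.date_range(start=timestamp, periods=1, freq=freq)
    return pd.Series(original.ravel(), index=index)


# Illustrative weekly usage history (made-up numbers, assumption).
history = np.array([[10.0], [20.0], [30.0], [40.0]])
scaler = MinMaxScaler().fit(history)

# Pretend the model returned 0.5 in scaled units for the week ending 2020-06-28.
result = inverse_single_prediction(scaler, 0.5, "2020-06-28", "W-SUN")
print(result)
```

Working on the raw value keeps the frequency under your control, at the cost of leaving the darts TimeSeries type for that one step.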
To Reproduce
Here’s part of my code. Prepare any dataset of your choice ahead of this snippet, please.
......
......
# Number of previous time stamps taken into account.
SEQ_LENGTH = 2
# Number of features in last hidden state
HIDDEN_SIZE = 15 * SEQ_LENGTH
# number of output time-steps to predict
OUTPUT_LEN = 1
# Number of stacked rnn layers.
NUM_LAYERS = 2
my_model = RNNModel(
    model='LSTM',
    output_length=OUTPUT_LEN,
    hidden_size=HIDDEN_SIZE,
    n_rnn_layers=NUM_LAYERS,
    input_length=SEQ_LENGTH,
    batch_size=100,
    n_epochs=150,
    model_name='Air_RNN',
    log_tensorboard=True
)
my_model.fit(train_transformed, val_transformed, verbose=True)
pred_series = my_model.predict(n=3)
backtest_series = backtest_forecasting(series_transformed, my_model, pd.Timestamp('20200621'),
                                       fcast_horizon_n=1, verbose=True)
print('RMSE: {:.4f}'.format(rmse(transformer.inverse_transform(series_transformed),
                                 transformer.inverse_transform(backtest_series))))
my_model.fit(series_transformed, verbose=True)
pred_series = my_model.predict(n=1)
print(pred_series)
# Error is in this next line. Everything above this works like a charm
print(transformer.inverse_transform(pred_series))
Expected behavior
I expect to see a single record with an inverse transformed value.
System (please complete the following information):
- Python version: 3.7.7
- darts version: 0.2.1
Glad to hear it worked!
We use pandas.DatetimeIndex.inferred_freq to automatically determine the frequency. This only works for DatetimeIndex objects with a length of at least 3. So when creating a new TimeSeries instance, cases with a length shorter than 3 are handled differently. Also, we decided to warn the user when such a time series is created, since it represents somewhat of an edge case. I’m sure there is still a lot of potential to improve the current approach, but this is the current setup 😃
OUTPUT_LEN has to be set to 1 for the RNNModel. Sometimes it can still be useful to try a higher number, since the model might learn more general trends. You could also try enhancing your univariate time series with a datetime attribute series to make it multivariate (some very basic examples: https://github.com/unit8co/darts/blob/master/examples/multivariate-examples.ipynb). Other than that, I suggest trying out our other models too; you might find one with a better fit. To find the right hyperparameters for simpler models, you can try using our backtest_gridsearch function. To get a quick overview of the performance of simpler models, our explore_models function might be interesting too (although this one is a bit experimental). But please keep in mind I am not an expert in data science.
Thanks for all your feedback!
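The three-timestamp limit on frequency inference mentioned above can be checked with pandas alone. The snippet below is a small illustration and not darts code; the dates are arbitrary Sundays chosen to match the weekly data in this issue.

```python
import pandas as pd

# Three consecutive Sundays: enough points for pandas to infer a weekly frequency.
three_weeks = pd.DatetimeIndex(["2020-06-14", "2020-06-21", "2020-06-28"])
# A single timestamp, as produced by a one-step forecast.
one_week = pd.DatetimeIndex(["2020-06-28"])

freq_three = three_weeks.inferred_freq
freq_one = one_week.inferred_freq  # None: too few points to infer anything

print(freq_three, freq_one)
```

This is why a series built from predict(n=1) needs its frequency supplied explicitly rather than inferred.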
Hi akshayi1, thanks a lot for this detailed issue description! You hit the nail on the head regarding the reason for this bug: the frequency was not passed in the ScalerWrapper.inverse_transform function. We just addressed this problem in PR #143. I tested it on your code snippet and it appears to solve this issue. We also released this patch to PyPI, so you should be able to install the patched version from there.
Please let us know if this solves your problem!