[BUG] Inverse transform does not work on series of length < 3

First off, I want to say that I’m not an ML engineer at all. We have our own custom time series forecasting code that our ML engineers wrote. I stumbled upon darts, and we are very excited by the prospect of using it, since it lets us try so many different kinds of models to generate our predictions. Our problem statement is as follows: we know the weekly consumption of our widgets by customers and want to predict how many they will use next week. We only need 1 prediction. To that end, I wrote some code that cleans and preps the data and then passes it to darts. Everything works as expected until we hit the inverse_transform() function.

Describe the bug

We only care about 1 prediction step, so we invoke the prediction with pred_series = my_model.predict(n=1). This works, and I get the scaled version of next week’s usage. However, I want the human-readable number, so I call print(transformer.inverse_transform(pred_series)). I then get the following error:

  File "train/run_models.py", line 124, in get_model
    print(transformer.inverse_transform(pred_series))
  File "/home/ec2-user/darts/lib/python3.7/site-packages/darts/preprocessing/scaler_wrapper.py", line 102, in inverse_transform
    reshape((-1, series.width))))
  File "/home/ec2-user/darts/lib/python3.7/site-packages/darts/timeseries.py", line 571, in from_times_and_values
    return TimeSeries(df, freq, fill_missing_dates)
  File "/home/ec2-user/darts/lib/python3.7/site-packages/darts/timeseries.py", line 58, in __init__
    'is not passed', logger)
  File "/home/ec2-user/darts/lib/python3.7/site-packages/darts/logging.py", line 54, in raise_if_not
    raise ValueError(message)

I read through the source code and didn’t see any place in the inverse_transform() function where the frequency is being passed. Merely:

return TimeSeries.from_times_and_values(series.time_index(),
                                                self.transformer.inverse_transform(series.values().
                                                                                   reshape((-1, series.width))))

I then tried to pass the frequency argument as follows: print(transformer.inverse_transform(pred_series, "W-SUN")), and got this error instead:

  File "train/run_models.py", line 124, in get_model
    print(transformer.inverse_transform(pred_series, "W-SUN"))
TypeError: inverse_transform() takes 2 positional arguments but 3 were given

I put “W” instead of “W-SUN” too, with similar results (same error, that is).
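The TypeError above is Python's generic complaint about an unexpected extra argument, and the count includes the implicit self. A minimal sketch, using a hypothetical FakeScaler class (not part of darts) whose inverse_transform accepts only a series, reproduces the same message:

```python
class FakeScaler:
    """Hypothetical stand-in: its inverse_transform accepts only a series."""

    def inverse_transform(self, series):
        return series


scaler = FakeScaler()

try:
    # `scaler` is passed implicitly as `self`, so this call supplies
    # three positional arguments to a method that accepts two.
    scaler.inverse_transform([0.5], "W-SUN")
except TypeError as err:
    message = str(err)
    print(message)
```

So the frequency cannot be supplied from the calling side; the method signature itself has no parameter for it.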

So my question is: how do I use darts to get precisely 1 prediction, and have it not be scaled? I could generate 3 and pick the first one, but I’m not sure that’s the best or right approach to this problem.
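One way to see what "not scaled" means for a single value is to undo the scaling by hand. The sketch below assumes the transformer performs min-max scaling to the range [0, 1], which the issue does not state; the history values and the scaled forecast are made up for illustration and do not come from the original data.

```python
# Hypothetical weekly consumption history that the scaler was fitted on.
history = [120.0, 135.0, 150.0, 110.0, 160.0]

# Min-max scaling to [0, 1]; assumed to mirror what the fitted transformer does.
lo, hi = min(history), max(history)


def scale(value):
    """Map a value in original units to the [0, 1] range."""
    return (value - lo) / (hi - lo)


def unscale(value):
    """Map a scaled value back to original units."""
    return value * (hi - lo) + lo


# Hypothetical scaled one-step forecast, standing in for the single
# value held by pred_series.
scaled_prediction = 0.5

prediction = unscale(scaled_prediction)
print(prediction)
```

This only illustrates the arithmetic for a single number; it does not replace transformer.inverse_transform(), which also rebuilds the time index.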

To Reproduce

Here’s part of my code. Prepare any dataset of your choice ahead of this snippet, please.

    ......
    ......

    # Number of previous time stamps taken into account.
    SEQ_LENGTH = 2
    # Number of features in last hidden state
    HIDDEN_SIZE = 15 * SEQ_LENGTH
    # number of output time-steps to predict
    OUTPUT_LEN = 1
    # Number of stacked rnn layers.
    NUM_LAYERS = 2

    my_model = RNNModel(
        model='LSTM',
        output_length=OUTPUT_LEN,
        hidden_size=HIDDEN_SIZE,
        n_rnn_layers=NUM_LAYERS,
        input_length=SEQ_LENGTH,
        batch_size=100,
        n_epochs=150,
        model_name='Air_RNN', log_tensorboard=True
    )

    my_model.fit(train_transformed, val_transformed, verbose=True)

    pred_series = my_model.predict(n=3)

    backtest_series = backtest_forecasting(series_transformed, my_model, pd.Timestamp('20200621'),
                                           fcast_horizon_n=1, verbose=True)

    print('RMSE: {:.4f}'.format(rmse(transformer.inverse_transform(series_transformed),
                                     transformer.inverse_transform(backtest_series))))

    my_model.fit(series_transformed, verbose=True)

    pred_series = my_model.predict(n=1)

    print(pred_series)

    # Error is in this next line. Everything above this works like a charm
    print(transformer.inverse_transform(pred_series))

Expected behavior

I expect to see a single record with an inverse transformed value.

System (please complete the following information):

Python version: 3.7.7
darts version: 0.2.1

Top GitHub Comments

pennfranc commented, Jul 14, 2020

Glad to hear it worked!

For longer time series, we use pandas.DatetimeIndex.inferred_freq to automatically determine the frequency. This only works for DatetimeIndex objects with a length of at least 3. So when creating a new TimeSeries instance, cases with a length shorter than 3 are handled differently. Also, we decided to warn the user when such a time series is created since it represents somewhat of an edge case. I’m sure there is still a lot of potential to improve the current approach, but this is the current setup 😃
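The length limit described above can be seen directly in pandas. This is a minimal sketch with made-up weekly dates (all Sundays), not code taken from darts:

```python
import pandas as pd

# Three consecutive Sundays: enough points for pandas to infer a weekly frequency.
three_weeks = pd.DatetimeIndex(["2020-06-21", "2020-06-28", "2020-07-05"])
print(three_weeks.inferred_freq)

# A single timestamp, like the index of a one-step forecast:
# there is nothing to infer from, so inferred_freq is None.
one_week = pd.DatetimeIndex(["2020-07-12"])
print(one_week.inferred_freq)

# For such a short index the frequency has to be supplied explicitly.
one_week_with_freq = pd.DatetimeIndex(["2020-07-12"], freq="W-SUN")
print(one_week_with_freq.freqstr)
```

This is why a series of length 1 or 2 needs its frequency passed along explicitly when a new TimeSeries is built from it.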
Your approach looks good to me! Please note that even though you’re only making a single prediction, that does not necessarily mean OUTPUT_LEN has to be set to 1 for the RNNModel. Sometimes it can still be useful to try a higher number, since the model might learn more general trends. You could also try enhancing your univariate time series with a datetime attribute series, making it multivariate (some very basic examples: https://github.com/unit8co/darts/blob/master/examples/multivariate-examples.ipynb). Other than that, I suggest trying out our other models too; you might find one with a better fit. To find the right hyperparameters for simpler models, you can try our backtest_gridsearch function. To get a quick overview of the performance of simpler models, our explore_models function might be interesting too (although this one is a bit experimental). But please keep in mind I am not an expert in data science.

Thanks for all your feedback!

pennfranc commented, Jul 14, 2020

Hi akshayi1, thanks a lot for this detailed issue description! You hit the nail on the head regarding the reason for this bug: the frequency was not passed to the ScalerWrapper.inverse_transform function. We just addressed this problem in PR #143. I tested it on your code snippet and it appears to solve the issue. We also released this patch to PyPI, so you should be able to install it like this:

pip install u8darts

Please let us know if this solves your problem!
