Difference between R and Python implementation of auto.arima
Question

The documentation states:

> pmdarima is designed to behave as similarly to R's well-known `auto.arima` as possible

Could someone explain what the differences actually are?
- I tried to run a simple `ARIMA` (the order is not important at the moment) on my dataset (1,000 data points), something like this:

  ```python
  import pmdarima as pm

  arima = pm.ARIMA(order=(0, 1, 1))
  arima_fit = arima.fit(ts_data)
  ```

  And it's pretty fast.
- When I use the same dataset with the `auto_arima` function (like `pm.auto_arima(ts_data)`), it takes a bit more time (measured with `timeit`):

  ```
  1.07 s ± 53.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
  ```

  The R implementation of `auto.arima` is roughly 10 times faster. What's the reason? Is there a way to improve that?
- When I take the same dataset and use the R and Python implementations of auto-ARIMA, I get (depending on the data) different results. The default parameters seem to be the same. What's the reason for that?
Versions

- Windows-10-10.0.17134-SP0
- Python 3.6.8 |Anaconda, Inc.| (default, Feb 21 2019, 18:30:04) [MSC v.1916 64 bit (AMD64)]
- pmdarima 1.2.1
- NumPy 1.16.4
- SciPy 1.2.1
- Scikit-Learn 0.20.2
- Statsmodels 0.9.0
Issue Analytics

- Created 4 years ago
- Comments: 7 (5 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Doing a side-by-side on `wineind`:

Python: As stated, this does take a while. Timings for each model are shown via `trace`.

R: Returns almost immediately.
Explanation and root cause analysis
First of all, when you cross the language border, there is not always such a thing as a 1-to-1 rewrite, so that is the core challenge here. Under the hood, we are using statsmodels to fit our ARMA, ARIMA and SARIMAX models, and what we’re providing is the optimization code, tests of seasonality and stationarity, etc.
Just to make sure it isn’t a bottleneck on our side, I ran an experiment on the slowest of the models shown above:
When you dig deeper, R’s ARIMA code is almost 100% C, and statsmodels’ is almost 100% python. While that will account for a lot of it, I’m not one to be swayed by that fact alone… my opinion is there is surely something going on in the SARIMAX class that is causing this to drag, and I don’t have a perfect explanation for you right now.
RE: AIC, BIC, etc., my guess is that statsmodels computes those values in a slightly different fashion than R does. I wish I had a better answer for you… Now, it could well be that the manner in which we're using the SARIMAX interface (or one of the default options we're passing) is causing a slowdown, and that's something I'm not really certain about. But I do think it warrants a deep dive by someone on our side to see if it's anything we have control over.
Thank you for bringing this up; hopefully we can get a satisfactory explanation, if not solution, to you in a reasonable time.
There was a time when R added more rules to its stepwise search algorithm, so the two diverged. The new release in 1.5.0 will reconcile these differences. However, one reason the same algorithm may not even search the same orders is due to the AIC of the results. The stepwise search refines its search based on the AIC, and since R and Python have subtly different ways of computing it, the same progression of models may not yield the same refinement, if that makes sense.
Another thing I’ve noticed recently is that R is faster because it approximates more liberally. Notice `approximation = (length(x) > 150 | frequency(x) > 12)` in R's source. The Python version does not approximate unless you explicitly pass `simple_differencing=True`, and even then the results still differ slightly. That has to do with the number of iterations run in the optimizer: R’s BFGS optimization defaults to 100 iterations (run `optim` in R to see the function), while pmdarima’s `ARIMA` defaults to 50. If you set `simple_differencing=True`, you might also crank up the `maxiter` param to refine it some more.