time series - Predictions with ARIMA (Python statsmodels)


I have time series data that contains seasonal trends, and I want to use an ARIMA model to predict how the series will behave in the future.

In order to predict how my variable of interest (log_var) will behave, I have taken weekly, monthly, and annual differences and used these as input to an ARIMA model.

Below is an example.

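    import numpy as np
    # Import path for current statsmodels; older releases used statsmodels.tsa.arima_model
    from statsmodels.tsa.arima.model import ARIMA

    # Stack the three difference columns into one exogenous matrix
    exog = np.column_stack([df_arima['log_var_diff_wk'],
                            df_arima['log_var_diff_mth'],
                            df_arima['log_var_diff_yr']])
    model = ARIMA(df_arima['log_var'], exog=exog, order=(1, 0, 1))
    results_arima = model.fit()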
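For readers who want to reproduce the setup, the difference columns could be built roughly as follows. This is only a sketch: it assumes daily data, uses lags of 7, 30, and 365 days for the weekly, monthly, and annual differences, and fills `log_var` with synthetic numbers, none of which is stated in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series standing in for the real data.
idx = pd.date_range("2014-01-01", "2016-01-01", freq="D")
rng = np.random.default_rng(0)
df_arima = pd.DataFrame({"log_var": rng.normal(size=len(idx)).cumsum()}, index=idx)

# Assumed lags for daily data: 7 (week), 30 (month), 365 (year).
df_arima["log_var_diff_wk"] = df_arima["log_var"].diff(7)
df_arima["log_var_diff_mth"] = df_arima["log_var"].diff(30)
df_arima["log_var_diff_yr"] = df_arima["log_var"].diff(365)

# The first 365 rows have no annual difference, so drop them before fitting.
df_arima = df_arima.dropna()
```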

I am doing this for several different data sources, and in some of them I see great results, in the sense that if I plot log_var against results_arima.fittedvalues for the training data, it matches well (I tune p and q for each data source separately, but d is always 0 given that I have taken the differences myself).

However, I then want to check what the predictions look like, and in order to do this I redefine exog for the 'test' dataset. For example, if I train the original ARIMA model on 2014-01-01 to 2016-01-01, the 'test' set would be 2016-01-01 onwards.

My approach has worked well for some data sources (in the sense that when I plot the forecast against the known values, the trends look sensible) but badly for others, although they are all the same 'kind' of data and have just been taken from different geographical locations. In some of the locations it completely fails to catch obvious seasonal trends that occur again and again in the training data on the same dates each year. The ARIMA model always fits the training data well; it just seems that in some cases the predictions are completely useless.

I am now wondering whether I am following the correct procedure to predict values from an ARIMA model. My approach is basically:

    # Exogenous regressors for the 'test' period
    exog = np.column_stack([df_arima_predict['log_val_diff_wk'],
                            df_arima_predict['log_val_diff_mth'],
                            df_arima_predict['log_val_diff_yr']])
    arima_predict = results_arima.predict(start=training_cut_date, end='2017-01-01',
                                          dynamic=False, exog=exog)

Is this the correct way to go about making predictions with ARIMA?

If so, is there a way I can try to understand why the predictions are so good in some datasets and terrible in others, when the ARIMA model seems to fit the training data equally well in both cases?

I have a similar problem at the moment and have not entirely figured it out yet. It seems that including multiple seasonal terms in Python is still a bit tricky. R does seem to have this capacity, see here. So, one suggestion I can give is to try this with the more sophisticated functionality that R provides (although this could require a large investment of time if you are not familiar with R yet).

Looking at your approach to modeling the seasonal patterns, taking the nth-order difference scores does not give you seasonal constants, but rather some representation of the difference between the time points that you designate as seasonally related. If those differences are small, correcting for them might not have much impact on your modeling results. In such cases, model prediction might turn out quite well. Conversely, if the differences are big, including them can easily distort the prediction results. This would explain the variation you are seeing in your modeling results. Conceptually, then, you'd want to instead represent constants over time.
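To make the point about difference scores concrete, here is a small sketch on synthetic data (the period of 7 and the noise level are assumptions chosen for illustration). When the seasonal component repeats exactly, the seasonal difference cancels it, so the difference column says nothing about the seasonal level itself.

```python
import numpy as np

period = 7                                   # assumed season length
t = np.arange(10 * period)
seasonal = np.sin(2 * np.pi * t / period)    # repeats exactly every `period` steps
rng = np.random.default_rng(1)
noise = rng.normal(scale=0.1, size=t.size)
y = seasonal + noise

# Seasonal difference: y[t] - y[t - period]
diff = y[period:] - y[:-period]

# The seasonal part cancels; what is left is only the differenced noise.
leftover_seasonal = np.abs(seasonal[period:] - seasonal[:-period]).max()
only_noise = np.allclose(diff, noise[period:] - noise[:-period])
```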

In the blog post referenced above, the author advocates the use of Fourier series to model the variance within each time period. Both the NumPy and SciPy packages offer routines for calculating a fast Fourier transform. However, as a non-mathematician I found it difficult to ascertain that the fast Fourier transform yielded the appropriate numbers.

In the end I opted to use the Welch signal decomposition from SciPy's signal module. What this does is return a spectral density analysis of your time series, from which you can deduce the signal strength at various frequencies in your time series.
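A minimal sketch of that step, assuming the routine meant here is `scipy.signal.welch` and using a synthetic daily series with a weekly cycle (the data, the `nperseg` value, and the variable names are all illustrative, not taken from my own analysis):

```python
import numpy as np
from scipy import signal

# Hypothetical daily series: weekly cycle (period 7) plus noise.
rng = np.random.default_rng(0)
n = 7 * 200
t = np.arange(n)
y = 2.0 * np.sin(2 * np.pi * t / 7) + rng.normal(scale=0.5, size=n)

# fs=1.0 means one sample per day, so frequencies are in cycles per day.
# nperseg is a multiple of 7 so the weekly frequency falls on a frequency bin.
freqs, psd = signal.welch(y, fs=1.0, nperseg=7 * 32)

# The strongest peak in the spectral density marks the dominant cycle.
peak_freq = freqs[np.argmax(psd)]
peak_period = 1.0 / peak_freq
print(f"dominant period: {peak_period:.2f} days")
```

With several seasonal cycles you would look for several peaks rather than only the largest one, for example with `scipy.signal.find_peaks` on `psd`.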

If you identify the peaks in the spectral density analysis that correspond to the seasonal frequencies you are trying to account for in your time series, you can use their frequencies and amplitudes to construct sine waves representing the seasonal variations. You can then include these in your ARIMA as exogenous variables, much like the Fourier terms in the blog post.

This is as far as I have gotten myself at this point - right now I am trying to figure out whether I can get the statsmodels ARIMA process to use these sine waves, which specify a seasonal trend, as exogenous variables in my model (the documentation specifies that these should not represent trends, but hey, a guy can dream, right?). Edit: this blog post by Rob Hyndman is also highly relevant, and explains some of the rationale behind including Fourier terms.

Sorry I'm not able to give you a solution that's proven effective within Python, but I hope this gives you some new ideas to control for that pesky seasonal variance.

TL;DR:

  • It seems Python is not well suited to handling multiple seasonal terms right now; R might be a better solution (see reference).

  • Using difference scores to account for seasonal trends seems not to capture the constant variance associated with the recurrence of a season.

  • One way to do this in Python could be to use Fourier series representing the seasonal trends (also see reference), which can be obtained using, among other ways, a Welch signal decomposition. How to use these as exogenous variables in ARIMA to good effect is an open question, though.

Best of luck,

Evert

P.S.: I'll update if I find a way to make this work in Python.

