Bridge ARIMA-XGBoost Modeling function
auto_arima_xgboost_fit_impl(
x,
y,
period = "auto",
max.p = 5,
max.d = 2,
max.q = 5,
max.P = 2,
max.D = 1,
max.Q = 2,
max.order = 5,
d = NA,
D = NA,
start.p = 2,
start.q = 2,
start.P = 1,
start.Q = 1,
stationary = FALSE,
seasonal = TRUE,
ic = c("aicc", "aic", "bic"),
stepwise = TRUE,
nmodels = 94,
trace = FALSE,
approximation = (length(x) > 150 | frequency(x) > 12),
method = NULL,
truncate = NULL,
test = c("kpss", "adf", "pp"),
test.args = list(),
seasonal.test = c("seas", "ocsb", "hegy", "ch"),
seasonal.test.args = list(),
allowdrift = TRUE,
allowmean = TRUE,
lambda = NULL,
biasadj = FALSE,
max_depth = 6,
nrounds = 15,
eta = 0.3,
colsample_bytree = NULL,
colsample_bynode = NULL,
min_child_weight = 1,
gamma = 0,
subsample = 1,
validation = 0,
early_stop = NULL,
...
)
A data frame of xreg (exogenous regressors).
A numeric vector of values to fit.
A seasonal frequency. Uses "auto" by default. Accepts the character phrase "auto" or a time-based phrase such as "2 weeks" when a date or date-time variable is provided.
The maximum order of the non-seasonal auto-regressive (AR) terms.
The maximum order of integration for non-seasonal differencing.
The maximum order of the non-seasonal moving average (MA) terms.
The maximum order of the seasonal auto-regressive (SAR) terms.
The maximum order of integration for seasonal differencing.
The maximum order of the seasonal moving average (SMA) terms.
Maximum value of p+q+P+Q if model selection is not stepwise.
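As a rough illustration of the max.order constraint, the base R sketch below enumerates candidate orders under the documented defaults and keeps those with p + q + P + Q no larger than max.order. The grid construction is an assumption made for illustration and is not taken from the search code itself.

```r
# Illustrative only: enumerate candidate orders under the documented defaults.
max.p <- 5
max.q <- 5
max.P <- 2
max.Q <- 2
max.order <- 5

# Every combination of non-seasonal (p, q) and seasonal (P, Q) orders.
grid <- expand.grid(p = 0:max.p, q = 0:max.q, P = 0:max.P, Q = 0:max.Q)

# Keep only combinations that satisfy p + q + P + Q <= max.order.
candidates <- grid[rowSums(grid) <= max.order, ]

nrow(grid)        # size of the unconstrained grid
nrow(candidates)  # size after applying max.order
```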
Order of first-differencing. If missing, will choose a value based on test.
Order of seasonal-differencing. If missing, will choose a value based on seasonal.test.
Starting value of p in stepwise procedure.
Starting value of q in stepwise procedure.
Starting value of P in stepwise procedure.
Starting value of Q in stepwise procedure.
If TRUE, restricts search to stationary models.
If FALSE, restricts search to non-seasonal models.
Information criterion to be used in model selection.
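For orientation, the three criteria can be computed by hand from a fitted model's log-likelihood. The sketch below uses stats::arima on the built-in lh series rather than this function, and it counts the innovation variance as a parameter; how the automatic search counts parameters internally is an assumption here.

```r
# Fit a small ARIMA(1,0,0) model with base R on the built-in lh series.
fit <- arima(lh, order = c(1, 0, 0))

loglik <- fit$loglik
k <- length(coef(fit)) + 1  # estimated coefficients plus the innovation variance
n <- length(lh)             # number of observations

aic  <- -2 * loglik + 2 * k
aicc <- aic + (2 * k * (k + 1)) / (n - k - 1)  # small-sample correction
bic  <- -2 * loglik + k * log(n)

c(aic = aic, aicc = aicc, bic = bic)
```

The correction term in AICc is always positive when n > k + 1, so AICc penalizes extra parameters more heavily than AIC on short series.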
If TRUE, will do stepwise selection (faster). Otherwise, it searches over all models. Non-stepwise selection can be very slow, especially for seasonal models.
Maximum number of models considered in the stepwise search.
If TRUE, the list of ARIMA models considered will be reported.
If TRUE, estimation is via conditional sums of squares and the information criteria used for model selection are approximated. The final model is still computed using maximum likelihood estimation. Approximation should be used for long time series or a high seasonal period to avoid excessive computation times.
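The default for approximation in the usage above is the expression length(x) > 150 | frequency(x) > 12. A minimal base R sketch of how that rule evaluates follows; the wrapper name use_approx and the example series are hypothetical.

```r
# Hypothetical wrapper around the default expression from the usage section.
use_approx <- function(x) length(x) > 150 | frequency(x) > 12

x_short  <- ts(rnorm(100), frequency = 12)  # short monthly series
x_long   <- ts(rnorm(200), frequency = 12)  # long monthly series
x_weekly <- ts(rnorm(100), frequency = 52)  # short series, high seasonal period

use_approx(x_short)   # neither condition holds
use_approx(x_long)    # triggered by length
use_approx(x_weekly)  # triggered by frequency
```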
Fitting method: maximum likelihood or minimizing the conditional sum-of-squares. The default (unless there are missing values) is to use conditional sum-of-squares to find starting values, then maximum likelihood. Can be abbreviated.
An integer value indicating how many observations to use in model selection. The last truncate values of the series are used to select a model when truncate is not NULL and approximation=TRUE. All observations are used if either truncate=NULL or approximation=FALSE.
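The truncate rule can be sketched in base R as follows. The helper selection_window is hypothetical and only mirrors the description above; it is not the package's internal code.

```r
# Hypothetical helper: which observations would be used for model selection.
selection_window <- function(x, truncate = NULL, approximation = TRUE) {
  if (!is.null(truncate) && approximation) {
    # Use only the last `truncate` values of the series.
    return(tail(x, truncate))
  }
  # Otherwise all observations are used.
  x
}

y <- as.numeric(1:300)

length(selection_window(y, truncate = 100, approximation = TRUE))   # last 100 values
length(selection_window(y, truncate = 100, approximation = FALSE))  # all values
length(selection_window(y, truncate = NULL, approximation = TRUE))  # all values
```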
Type of unit root test to use. See ndiffs for details.
Additional arguments to be passed to the unit root test.
This determines which method is used to select the number of seasonal differences. The default method is to use a measure of seasonal strength computed from an STL decomposition. Other possibilities involve seasonal unit root tests.
Additional arguments to be passed to the seasonal unit root test. See nsdiffs for details.
If TRUE, models with drift terms are considered.
If TRUE, models with a non-zero mean are considered.
Box-Cox transformation parameter. If lambda="auto", then a transformation is automatically selected using BoxCox.lambda. The transformation is ignored if NULL. Otherwise, the data are transformed before the model is estimated.
Use adjusted back-transformed mean for Box-Cox transformations. If transformed data is used to produce forecasts and fitted values, a regular back transformation will result in median forecasts. If biasadj is TRUE, an adjustment will be made to produce mean forecasts and fitted values.
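To see why a plain back-transformation does not recover the mean, consider the log case (lambda = 0). The base R sketch below defines hypothetical helpers box_cox and inv_box_cox using the textbook Box-Cox formula; it does not reproduce the bias-adjustment formula applied when biasadj is TRUE.

```r
# Textbook Box-Cox transform and its inverse (hypothetical helper names).
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}
inv_box_cox <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

y <- c(1, 2, 4, 8, 16)
w <- box_cox(y, lambda = 0)

# Averaging on the transformed scale and back-transforming gives the
# geometric mean, which here coincides with the median of y.
naive_back <- inv_box_cox(mean(w), lambda = 0)

naive_back  # 4
median(y)   # 4
mean(y)     # 6.2
```

The gap between 4 and 6.2 is the kind of discrepancy a bias adjustment is meant to correct when mean forecasts are wanted.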
An integer for the maximum depth of the tree.
An integer for the number of boosting iterations.
A numeric value between zero and one to control the learning rate.
Subsampling proportion of columns.
Subsampling proportion of columns for each node within each tree. The default uses all columns.
A numeric value for the minimum sum of instance weights needed in a child to continue to split.
A number for the minimum loss reduction required to make a further partition on a leaf node of the tree.
Subsampling proportion of rows.
A positive number. If the value is in [0, 1), validation is the random proportion of data in x and y that is used for performance assessment and potential early stopping. If 1 or greater, it is the number of training set samples used for these purposes.
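The two readings of validation can be sketched in base R as below. The helper n_validation is hypothetical, and rounding the proportion to a whole number of rows is an assumption; the actual implementation may round differently.

```r
# Hypothetical helper: number of held-out rows implied by `validation`.
n_validation <- function(validation, n) {
  stopifnot(is.numeric(validation), validation >= 0)
  if (validation < 1) {
    round(n * validation)   # proportion of the n rows (rounding is assumed)
  } else {
    as.integer(validation)  # explicit number of rows
  }
}

n <- 100
n_validation(0.25, n)  # proportion: a quarter of the rows
n_validation(25, n)    # count: 25 rows
n_validation(0, n)     # no validation set

# Draw a random validation index of the implied size.
set.seed(123)
val_idx <- sample(n, n_validation(0.25, n))
length(val_idx)
```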
An integer or NULL. If not NULL, it is the number of training iterations without improvement before stopping. If validation is used, performance is based on the validation set; otherwise the training set is used.
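The stopping rule described above can be illustrated with a small base R sketch. The helper stop_iteration and the loss values are hypothetical; the sketch only mimics "stop after early_stop iterations without improvement" and does not call xgboost.

```r
# Hypothetical helper: iteration at which training would stop.
stop_iteration <- function(loss, early_stop = NULL) {
  # Without early stopping, every boosting iteration is run.
  if (is.null(early_stop)) {
    return(length(loss))
  }
  best <- Inf
  since_best <- 0
  for (i in seq_along(loss)) {
    if (loss[i] < best) {
      best <- loss[i]
      since_best <- 0
    } else {
      since_best <- since_best + 1
    }
    # Stop once `early_stop` consecutive iterations failed to improve.
    if (since_best >= early_stop) {
      return(i)
    }
  }
  length(loss)
}

# Made-up per-iteration losses (validation or training, per the text above).
loss <- c(1.00, 0.80, 0.70, 0.71, 0.72, 0.73, 0.60)

stop_iteration(loss, early_stop = 3)     # 6: three iterations without improvement
stop_iteration(loss, early_stop = NULL)  # 7: runs all iterations
```

Note that stopping at iteration 6 misses the later improvement at iteration 7, which is the usual trade-off of a small early_stop value.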
Additional arguments passed to xgboost::xgb.train