GAfit
estimates the specified GMVAR model using a genetic algorithm.
It's designed to find starting values for gradient-based methods.
GAfit(
data,
p,
M,
conditional = TRUE,
parametrization = c("intercept", "mean"),
constraints = NULL,
ngen = 200,
popsize,
smart_mu = min(100, ceiling(0.5 * ngen)),
initpop = NULL,
mu_scale,
mu_scale2,
omega_scale,
ar_scale = 1,
regime_force_scale = 1,
red_criteria = c(0.05, 0.01),
to_return = c("alt_ind", "best_ind"),
minval,
seed = NULL
)
a matrix or class 'ts' object with d>1 columns. Each column is taken to represent a single time series. NA values are not supported.
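As a minimal sketch of an input that fits this description, the following base R code builds a bivariate 'ts' object from simulated data and checks the stated requirements. The helper name check_gmvar_data is hypothetical and is not part of the package.

```r
# Simulated bivariate series (illustrative data only)
set.seed(1)
raw <- cbind(y1 = cumsum(rnorm(200)) / 10, y2 = rnorm(200))
data <- ts(raw, start = c(2000, 1), frequency = 4)

# Hypothetical helper: checks the requirements stated above
# (numeric, more than one column, no missing values)
check_gmvar_data <- function(x) {
  x <- as.matrix(x)
  is.numeric(x) && ncol(x) > 1 && !anyNA(x)
}

check_gmvar_data(data)
```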
a positive integer specifying the autoregressive order of the model.
a positive integer specifying the number of mixture components.
a logical argument specifying whether the conditional or exact log-likelihood function should be used.
"mean" or "intercept", determining whether the model is parametrized with regime means \(\mu_{m}\) or intercept parameters \(\phi_{m,0}\), \(m=1,...,M\).
a size \((Mpd^2 x q)\) constraint matrix \(C\) specifying general linear constraints on the autoregressive parameters. We consider constraints of the form \((\phi_{1},...,\phi_{M}) = C\psi\), where \(\phi_{m} = (vec(A_{m,1}),...,vec(A_{m,p}))\) \((pd^2 x 1)\), \(m=1,...,M\), contains the coefficient matrices and \(\psi\) \((q x 1)\) contains the constrained parameters.
For example, to restrict the AR parameters to be the same for all regimes, set \(C = [I:...:I]'\) \((Mpd^2 x pd^2)\), where I = diag(p*d^2).
Ignore (or set to NULL) if linear constraints should not be employed.
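The "same AR parameters in all regimes" example can be sketched in base R as follows. The dimensions p, d and M and the values in psi are arbitrary illustrations, not package defaults.

```r
# Illustrative dimensions: AR order, number of series, number of regimes
p <- 1; d <- 2; M <- 2

# Stack M identity matrices of size p*d^2 on top of each other: C = [I:...:I]'
I <- diag(p * d^2)
C <- do.call(rbind, replicate(M, I, simplify = FALSE))
dim(C)  # (M*p*d^2) x (p*d^2)

# Any psi (q x 1, here q = p*d^2) is mapped to identical AR parameters per regime
psi <- c(0.5, 0.1, -0.2, 0.3)
phi <- as.vector(C %*% psi)  # (phi_1, ..., phi_M)
```

With this C, each block of length p*d^2 in phi equals psi, so every regime shares the same coefficient matrices.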
a positive integer specifying the number of generations to be run through in the genetic algorithm.
a positive even integer specifying the population size in the genetic algorithm. Default is 10*n_params.
a positive integer specifying the generation after which the random mutations in the genetic algorithm are "smart". This means that mutating individuals will mostly mutate fairly close (or partially close) to the best-fitting individual so far (the one which has the fewest regimes with time-varying mixing weights practically at zero).
a list of parameter vectors from which the initial population of the genetic algorithm will be generated. The parameter vectors should be...
For models without constraints: should be size \(((M(pd^2+d+d(d+1)/2+1)-1)x1)\) and have the form \(\theta = (\upsilon_{1},...,\upsilon_{M},\alpha_{1},...,\alpha_{M-1})\), where:
\(\upsilon_{m} = (\phi_{m,0},\phi_{m},\sigma_{m})\)
\(\phi_{m} = (vec(A_{m,1}),...,vec(A_{m,p}))\)
and \(\sigma_{m} = vech(\Omega_{m})\), \(m=1,...,M\).
For constrained models: should be size \(((M(d+d(d+1)/2+1)+q-1)x1)\) and have the form \(\theta = (\phi_{1,0},...,\phi_{M,0},\psi,\sigma_{1},...,\sigma_{M},\alpha_{1},...,\alpha_{M-1})\), where:
\(\psi\) \((qx1)\) satisfies \((\phi_{1},...,\phi_{M}) = C\psi\). Here \(C\) is a \((Mpd^2xq)\) constraint matrix.
Above, \(\phi_{m,0}\) is the intercept parameter, \(A_{m,i}\) denotes the \(i\):th coefficient matrix of
the \(m\):th mixture component, \(\Omega_{m}\) denotes the error term covariance matrix of the \(m\):th
mixture component, and \(\alpha_{m}\) is the mixing weight parameter.
If parametrization=="mean", just replace each \(\phi_{m,0}\) with the regimewise mean \(\mu_{m}\).
\(vec()\) is the vectorization operator that stacks the columns of a given matrix into a vector. \(vech()\) stacks the columns
of a given matrix from the principal diagonal downwards (including elements on the diagonal) into a vector.
The notation is in line with the cited article by Kalliovirta, Meitz and Saikkonen (2016) which introduces
the GMVAR model.
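To make the layout of an unconstrained, intercept-parametrized parameter vector concrete, here is a small base R sketch for d=2, p=1, M=2. All numeric values are made up for illustration, and vec and vech are defined locally rather than taken from the package.

```r
d <- 2; p <- 1; M <- 2

# Local helpers mirroring the operators described above
vec  <- function(A) as.vector(A)                   # stack columns
vech <- function(S) S[lower.tri(S, diag = TRUE)]   # stack lower triangle incl. diagonal

# Illustrative regime-wise parameters (not estimates of anything)
phi0  <- list(c(0.5, 0.2), c(1.5, -0.3))                      # intercepts phi_{m,0}
A     <- list(matrix(c(0.3, 0.0, 0.1, 0.4), nrow = 2),
              matrix(c(0.6, 0.1, 0.0, 0.2), nrow = 2))        # A_{m,1}
Omega <- list(matrix(c(1.0, 0.2, 0.2, 0.5), nrow = 2),
              matrix(c(0.8, -0.1, -0.1, 1.2), nrow = 2))      # covariance matrices
alpha <- 0.6                                                  # alpha_1 (M-1 weights)

# upsilon_m = (phi_{m,0}, phi_m, sigma_m); with p = 1, phi_m = vec(A_{m,1})
upsilon <- lapply(seq_len(M), function(m) {
  c(phi0[[m]], vec(A[[m]]), vech(Omega[[m]]))
})
theta <- c(unlist(upsilon), alpha)

# Length implied by the size formula above
n_params <- M * (p * d^2 + d + d * (d + 1) / 2 + 1) - 1
length(theta) == n_params
```

A list of such vectors, for example list(theta), has the shape that initpop describes. With this n_params, the stated popsize default 10*n_params would be 190.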
a size \((dx1)\) vector defining the means of the normal distributions from which each mean parameter \(\mu_{m}\) is drawn in random mutations. Default is colMeans(data). Note that mean-parametrization is always used for optimization in GAfit, even when parametrization=="intercept". However, input (in initpop) and output (return value) parameter vectors can be intercept-parametrized.
a size \((dx1)\) strictly positive vector defining the standard deviations of the normal distributions from which each mean parameter \(\mu_{m}\) is drawn in random mutations. Default is 2*sd(data[,i]), i=1,..,d.
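The two defaults for the mean-mutation distribution can be reproduced in base R as below. The data are simulated for illustration, and drawing mu_m with rnorm is only a sketch of the sampling described above, not the package's internal code.

```r
set.seed(2)
# Illustrative data: two series with different levels and spreads
data <- cbind(rnorm(150, mean = 1, sd = 2), rnorm(150, mean = -3, sd = 0.5))
d <- ncol(data)

mu_scale  <- colMeans(data)           # default means of the normal distributions
mu_scale2 <- 2 * apply(data, 2, sd)   # default standard deviations: 2*sd(data[,i])

# One random mean vector mu_m for a single regime (sketch of a random mutation)
mu_m <- rnorm(d, mean = mu_scale, sd = mu_scale2)
```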
a size \((dx1)\) strictly positive vector specifying the scale and variability of the random covariance matrices in random mutations. The covariance matrices are drawn from a (scaled) Wishart distribution. The expected values of the random covariance matrices are diag(omega_scale). The standard deviations of the diagonal elements are sqrt(2/d)*omega_scale[i], and for the non-diagonal elements they are sqrt(1/d*omega_scale[i]*omega_scale[j]). Note that for d>4 this scale may need to be chosen carefully. The default in GAfit is var(stats::ar(data[,i], order.max=10)$resid, na.rm=TRUE), i=1,...,d.
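The following base R sketch computes the quoted default and draws one random covariance matrix. The Wishart parametrization used here (df = d and Sigma = diag(omega_scale)/d) is an assumption, chosen because it reproduces the expected value and standard deviations stated above; the package's internal implementation may differ. The function name draw_omega is hypothetical.

```r
set.seed(3)
# Illustrative data: two simulated AR(1) series stored as a plain matrix
data <- cbind(as.numeric(arima.sim(list(ar = 0.5), n = 300)),
              as.numeric(arima.sim(list(ar = -0.2), n = 300)))
d <- ncol(data)

# Default scale as quoted above: residual variance of a fitted AR model per series
omega_scale <- vapply(seq_len(d), function(i) {
  var(stats::ar(data[, i], order.max = 10)$resid, na.rm = TRUE)
}, numeric(1))

# ASSUMPTION: W ~ Wishart(df = d, Sigma = diag(omega_scale)/d), which gives
# E[W] = diag(omega_scale) and sd(W[i, i]) = sqrt(2/d) * omega_scale[i]
draw_omega <- function(omega_scale) {
  d <- length(omega_scale)
  stats::rWishart(1, df = d, Sigma = diag(omega_scale) / d)[, , 1]
}

Omega_m <- draw_omega(omega_scale)
```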
a positive real number adjusting how large the AR parameter values generated in some random mutations typically are. See the function random_coefmats2 for details. This is ignored when estimating constrained models.
a non-negative real number specifying how strongly natural selection should favour individuals with fewer regimes that have almost all mixing weights (practically) at zero. Set to zero for no favouring or to a large number for heavy favouring. Without any favouring, the genetic algorithm more often gets stuck in an area of the parameter space where some regimes are wasted, but with too much favouring the best genes might never mix into the population and the algorithm might converge poorly. Default is 1, which gives \(2x\) larger surviving probability weights for individuals with no wasted regimes compared to individuals with one wasted regime. The value 2 would give \(3x\) larger probability weights, etc.
a length 2 numeric vector specifying the criteria that are used to determine whether a regime is redundant (or "wasted") or not. Any regime m which satisfies sum(mixingWeights[,m] > red_criteria[1]) < red_criteria[2]*n_obs will be considered "redundant". One should be careful when adjusting this argument (set c(0, 0) to fully disable the 'redundant regime' features from the algorithm).
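The redundancy rule can be written as a short base R function. This is a sketch of the stated inequality only: the function name is_redundant is hypothetical, and n_obs is assumed here to be the number of rows of the mixing weight matrix.

```r
# Hypothetical helper implementing the rule quoted above, column by column
is_redundant <- function(mixingWeights, red_criteria = c(0.05, 0.01)) {
  n_obs <- nrow(mixingWeights)  # ASSUMPTION: one row per observation
  colSums(mixingWeights > red_criteria[1]) < red_criteria[2] * n_obs
}

# Illustrative weights: regime 2 exceeds 0.05 at only one of 200 time points
w2 <- c(rep(0.001, 199), 0.2)
mw <- cbind(1 - w2, w2)

is_redundant(mw)                          # regime 1 kept, regime 2 flagged
is_redundant(mw, red_criteria = c(0, 0))  # nothing is ever flagged
```

With the default criteria, a regime is flagged when its mixing weight exceeds 0.05 at fewer than 1% of the time points. With c(0, 0), the right-hand side is zero and a count can never be below it, which is why that setting disables the feature.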
should the genetic algorithm return the best-fitting individual which has "positive enough" mixing weights for as many regimes as possible ("alt_ind") or the individual which has the highest log-likelihood in general ("best_ind") but might have more wasted regimes?
a real number defining the minimum value of the log-likelihood function that will be considered. Values smaller than this will be treated as if they were minval, and the corresponding individuals will never survive. The default is -(10^(ceiling(log10(n_obs)) + d) - 1).
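As a worked instance of the default formula, the sketch below evaluates it for 250 observations of a bivariate series and shows how smaller log-likelihood values would be floored. The function name default_minval is hypothetical, and flooring with pmax is only an illustration of "treated as if they were minval".

```r
# Hypothetical helper: the default formula quoted above
default_minval <- function(n_obs, d) -(10^(ceiling(log10(n_obs)) + d) - 1)

minval <- default_minval(n_obs = 250, d = 2)
minval  # ceiling(log10(250)) = 3, so this is -(10^5 - 1) = -99999

# Illustrative log-likelihood values, floored at minval
loglik <- c(-512.3, -1e7, -Inf)
pmax(loglik, minval)
```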
a single value, interpreted as an integer, or NULL, that sets the seed for the random number generator at the beginning of the function call. If calling GAfit from fitGMVAR, use the argument seeds instead of passing the argument seed.
Returns the estimated parameter vector which has the form described in initpop.
The core of the genetic algorithm is mostly based on the description by Dorsey and Mayer (1995). It utilizes a slightly modified version of the individually adaptive crossover and mutation rates described by Patnaik and Srinivas (1994) and employs (50%) fitness inheritance discussed by Smith, Dike and Stegmann (1995).
By "redundant" or "wasted" regimes we mean regimes whose time-varying mixing weights are practically at zero for almost all t. A model that includes redundant regimes would have about the same log-likelihood value without them, so there is no reason to keep redundant regimes in a model.
Ansley C.F. and Kohn R. 1986. A note on reparameterizing a vector autoregressive moving average model to enforce stationarity. Journal of Statistical Computation and Simulation, 24:2, 99-106.
Dorsey R.E. and Mayer W.J. 1995. Genetic algorithms for estimation problems with multiple optima, nondifferentiability, and other irregular features. Journal of Business & Economic Statistics, 13, 53-66.
Kalliovirta L., Meitz M. and Saikkonen P. 2016. Gaussian mixture vector autoregression. Journal of Econometrics, 192, 485-498.
Patnaik L.M. and Srinivas M. 1994. Adaptive Probabilities of Crossover and Mutation in Genetic Algorithms. Transactions on Systems, Man and Cybernetics, 24, 656-667.
Smith R.E., Dike B.A. and Stegmann S.A. 1995. Fitness inheritance in genetic algorithms. Proceedings of the 1995 ACM Symposium on Applied Computing, 345-350.