fitGMVAR: Two-phase maximum likelihood estimation of GMVAR model

Description

fitGMVAR estimates a GMVAR model in two phases: in the first phase it uses a genetic algorithm to find starting values for a gradient-based variable metric algorithm, which it then uses to finalize the estimation in the second phase. Multiple estimation rounds are run in parallel.

Usage

fitGMVAR(data, p, M, conditional = TRUE,
  parametrization = c("intercept", "mean"), constraints = NULL,
  ncalls = round(10 + 9 * log(M)), ncores = min(2, ncalls,
  parallel::detectCores()), maxit = 300, seeds = NULL,
  print_res = TRUE, ...)

Arguments

data

a matrix or class 'ts' object with d>1 columns. Each column is taken to represent a single time series. NA values are not supported.

p

a positive integer specifying the autoregressive order of the model.

M

a positive integer specifying the number of mixture components.

conditional

a logical argument specifying whether the conditional or exact log-likelihood function should be used. Default is TRUE.

parametrization

"mean" or "intercept" determining whether the model is parametrized with regime means $\mu_{m}$ or intercept parameters $\phi_{m,0}$, m=1,...,M. Default is "intercept".

constraints

a size $(Mpd^2 \times q)$ constraint matrix $C$ specifying general linear constraints on the autoregressive parameters. We consider constraints of the form $(\phi_{1},...,\phi_{M}) = C\psi$, where $\phi_{m} = (vec(A_{m,1}),...,vec(A_{m,p}))$ $(pd^2 \times 1)$, $m=1,...,M$, contains the coefficient matrices and $\psi$ $(q \times 1)$ contains the constrained parameters. For example, to restrict the AR parameters to be the same for all regimes, set $C = [I:...:I]'$ $(Mpd^2 \times pd^2)$, where I = diag(p*d^2). Ignore (or set to NULL) if linear constraints should not be employed.
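As a minimal sketch, the "same AR parameters in all regimes" constraint matrix described above can be assembled in base R for arbitrary p, M and d. The helper name same_AR_constraints is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of gmvarkit): builds C = [I:...:I]', which
# restricts the AR parameters to be identical across all M regimes.
same_AR_constraints <- function(p, M, d) {
  I <- diag(p * d^2)                                 # identity of size pd^2
  do.call(rbind, replicate(M, I, simplify = FALSE))  # stack M copies row-wise
}

C_mat <- same_AR_constraints(p = 2, M = 2, d = 2)
dim(C_mat)  # (M*p*d^2) x (p*d^2)

# Map a constrained parameter vector psi (q x 1) to the stacked
# AR parameters (phi_1, ..., phi_M) = C psi.
psi <- seq_len(ncol(C_mat))
phi <- as.vector(C_mat %*% psi)
```

With this choice of $C$, both halves of phi are equal to psi, which is exactly the restriction that every regime shares the same coefficient matrices.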

ncalls

the number of estimation rounds that should be performed.

ncores

the number of CPU cores to be used in parallel computing.

maxit

the maximum number of iterations in the variable metric algorithm.

seeds

a length ncalls vector containing the random number generator seed for each call to the genetic algorithm, or NULL for not initializing the seed. Exists for creating reproducible results.

print_res

should summaries of estimation results be printed?

...

additional settings passed to the function GAfit employing the genetic algorithm.
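The defaults for ncalls and ncores in the Usage section can be evaluated by hand, which may help when choosing seeds. The following sketch only reproduces the default expressions shown there; the helper name default_ncalls is hypothetical.

```r
# Default number of estimation rounds as a function of M,
# using the expression from the Usage section.
default_ncalls <- function(M) round(10 + 9 * log(M))
sapply(1:4, default_ncalls)  # 10 16 20 22

# Default number of cores: at most 2, and never more than ncalls
# or the number of cores detected on the machine.
ncalls <- default_ncalls(2)
ncores <- min(2, ncalls, parallel::detectCores())

# One seed per estimation round, as required by the seeds argument.
seeds <- seq_len(ncalls)
length(seeds) == ncalls
```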

Value

Returns an object of class 'gmvar' defining the estimated GMVAR model. Multivariate quantile residuals (Kalliovirta and Saikkonen 2010) are also computed and included in the returned object. In addition, the returned object contains the estimates and log-likelihood values from all the estimation rounds performed. The estimated parameter vector can be obtained at gmvar$params (and corresponding approximate standard errors at gmvar$std_errors) and it is...

Regular models:

a size $((M(pd^2+d+d(d+1)/2+1)-1) \times 1)$ vector that has the form $\theta = (\upsilon_{1},...,\upsilon_{M},\alpha_{1},...,\alpha_{M-1})$, where:

$\upsilon_{m} = (\phi_{m,0},\phi_{m},\sigma_{m})$
$\phi_{m} = (vec(A_{m,1}),...,vec(A_{m,p}))$
and $\sigma_{m} = vech(\Omega_{m})$, m=1,...,M.

Constrained models:

a size $((M(d+d(d+1)/2+1)+q-1) \times 1)$ vector that has the form $\theta = (\phi_{1,0},...,\phi_{M,0},\psi,\sigma_{1},...,\sigma_{M},\alpha_{1},...,\alpha_{M-1})$, where:

$\psi$ $(q \times 1)$ satisfies $(\phi_{1},...,\phi_{M}) = C\psi$. Here $C$ is a $(Mpd^2 \times q)$ constraint matrix.

Above, $\phi_{m,0}$ is the intercept parameter, $A_{m,i}$ denotes the $i$:th coefficient matrix of the $m$:th mixture component, $\Omega_{m}$ denotes the error term covariance matrix of the $m$:th mixture component, and $\alpha_{m}$ is the mixing weight parameter. If parametrization=="mean", replace each $\phi_{m,0}$ with the regimewise mean $\mu_{m}$. $vec()$ is the vectorization operator that stacks the columns of a given matrix into a vector. $vech()$ stacks the columns of a given matrix from the principal diagonal downwards (including elements on the diagonal) into a vector. The notation is in line with the cited article by Kalliovirta, Meitz and Saikkonen (2016).
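The operators and vector lengths defined above can be checked with a few lines of base R. This is only an illustrative sketch: the helpers vec, vech, n_params_regular and n_params_constrained are hypothetical names, not functions exported by the package.

```r
# Hypothetical helpers (not part of gmvarkit) mirroring the definitions above.
vec  <- function(A) as.vector(A)                   # stack all columns
vech <- function(A) A[lower.tri(A, diag = TRUE)]   # columns from the diagonal down

# Length of the parameter vector of a regular model.
n_params_regular <- function(p, M, d) {
  M * (p * d^2 + d + d * (d + 1) / 2 + 1) - 1
}

# Length for a constrained model with q constrained AR parameters.
n_params_constrained <- function(M, d, q) {
  M * (d + d * (d + 1) / 2 + 1) + q - 1
}

Omega <- matrix(c(1.0, 0.3, 0.3, 2.0), nrow = 2)
vec(Omega)   # 1.0 0.3 0.3 2.0
vech(Omega)  # 1.0 0.3 2.0

n_params_regular(p = 2, M = 2, d = 2)      # 2*(8+2+3+1)-1 = 27
n_params_constrained(M = 2, d = 2, q = 8)  # 2*(2+3+1)+8-1 = 19
```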

Note that the first autocovariance/correlation matrix in $uncond_moments is for lag zero, the second one for lag one, etc.

S3 methods

The following S3 methods are supported for class 'gmvar': logLik, residuals, print, summary, predict and plot.

Details

Because of the complexity and multimodality of the log-likelihood function, it is not certain that the estimation algorithms will end up in the global maximum point. It is expected that most of the estimation rounds will end up in some local maximum point instead, so a number of estimation rounds is required for reliable results. Because of the nature of the model, the estimation may fail, especially when the number of mixture components is chosen too large.
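Why several estimation rounds are needed can be illustrated with a toy problem. The sketch below is not the package's algorithm: it replaces the genetic algorithm with a random starting value, minimizes a made-up objective with two local minima (rather than maximizing a GMVAR log-likelihood), and uses only optim from stats. The names toy_obj and one_round are hypothetical.

```r
# Toy objective with two local minima (a stand-in for a multimodal
# negative log-likelihood; it is NOT the GMVAR likelihood).
toy_obj <- function(x) (x^2 - 1)^2 + 0.3 * x

ncalls <- 10
seeds  <- 1:ncalls

# One estimation round: seeded random start, then quasi-Newton refinement.
one_round <- function(seed) {
  set.seed(seed)
  start <- runif(1, min = -2, max = 2)
  optim(par = start, fn = toy_obj, method = "BFGS")
}

rounds <- lapply(seeds, one_round)
values <- sapply(rounds, function(r) r$value)

# Rounds may end in different local optima; keep the best one.
table(round(values, 3))
best <- rounds[[which.min(values)]]
best$par
```

Only the rounds that start in the basin of the lower minimum (the one at negative x) reach the global optimum, which is why the best result over all rounds is reported rather than the result of a single round.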

Overall, the estimation process is computationally heavy and might take a considerably long time for large models with a large number of observations. If the iteration limit maxit in the variable metric algorithm is reached, one can continue the estimation by iterating more with the function iterate_more.
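The idea of continuing after the iteration limit has been reached can be sketched with optim alone. This is an illustration under stated assumptions, not the internals of iterate_more: the objective is the Rosenbrock test function rather than a GMVAR log-likelihood, and the small maxit value is chosen only to force an early stop.

```r
# Rosenbrock function, a standard optimization test problem
# (a stand-in objective, NOT the GMVAR log-likelihood).
rosenbrock <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2

# First pass with a deliberately small iteration limit.
first <- optim(c(-1.2, 1), rosenbrock, method = "BFGS",
               control = list(maxit = 5))
first$convergence  # 1 signals that the iteration limit maxit was reached

# Continue from where the first pass stopped, with a larger limit.
second <- optim(first$par, rosenbrock, method = "BFGS",
                control = list(maxit = 300))
second$value <= first$value
```

Restarting from the previous parameter values loses the accumulated curvature information of the quasi-Newton method, but it never makes the objective worse than the point it started from.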

The genetic algorithm is mostly based on the description by Dorsey and Mayer (1995), but it includes some extra functionality designed for this particular estimation problem. The genetic algorithm uses (slightly modified) individually adaptive crossover and mutation rates described by Patnaik and Srinivas (1994) and employs (50%) fitness inheritance discussed by Smith, Dike and Stegmann (1995).

The gradient-based variable metric algorithm used in the second phase is implemented with the function optim from the package stats.

References

Dorsey R. E. and Mayer W. J. 1995. Genetic algorithms for estimation problems with multiple optima, nondifferentiability, and other irregular features. Journal of Business & Economic Statistics, 13, 53-66.
Kalliovirta L., Meitz M. and Saikkonen P. 2016. Gaussian mixture vector autoregression. Journal of Econometrics, 192, 485-498.
Kalliovirta L. and Saikkonen P. 2010. Reliable Residuals for Multivariate Nonlinear Time Series Models. Unpublished Revision of HECER Discussion Paper No. 247.
Patnaik L.M. and Srinivas M. 1994. Adaptive Probabilities of Crossover and Mutation in Genetic Algorithms. Transactions on Systems, Man and Cybernetics 24, 656-667.
Smith R.E., Dike B.A. and Stegmann S.A. 1995. Fitness inheritance in genetic algorithms. Proceedings of the 1995 ACM Symposium on Applied Computing, 345-350.

Examples

# NOT RUN {
## These are long running examples that use parallel computing!

# These examples use the data 'eurusd' which comes with the
# package, but in a scaled form (similar to Kalliovirta et al. 2016).
data(eurusd, package="gmvarkit")
data <- cbind(10*eurusd[,1], 100*eurusd[,2])
colnames(data) <- colnames(eurusd)

# GMVAR(1,2) model: 10 estimation rounds with seeds set
# for reproducibility
fit12 <- fitGMVAR(data, p=1, M=2, ncalls=10, seeds=1:10)
fit12
plot(fit12)
summary(fit12)

# GMVAR(2,2) model with mean parametrization
fit22 <- fitGMVAR(data, p=2, M=2, parametrization="mean")
fit22

# GMVAR(2,2) model with autoregressive parameters restricted
# to be the same for both regimes
C_mat <- rbind(diag(2*2^2), diag(2*2^2))
fit22c <- fitGMVAR(data, p=2, M=2, constraints=C_mat)
fit22c

# GMVAR(2,2) model with autoregressive parameters restricted
# to be the same for both regimes and non-diagonal elements of
# the coefficient matrices constrained to zero. Estimation
# with only 10 estimation rounds.
tmp <- matrix(c(1, rep(0, 10), 1, rep(0, 8), 1, rep(0, 10), 1),
 nrow=2*2^2, byrow=FALSE)
C_mat2 <- rbind(tmp, tmp)
fit22c2 <- fitGMVAR(data, p=2, M=2, constraints=C_mat2, ncalls=10)
fit22c2
# }
