fitnvmix: Fitting Multivariate Normal Variance Mixtures

Description

Functionalities for fitting multivariate normal variance mixtures (in particular also Multivariate t distributions) via an ECME algorithm.

Usage

fitnvmix(x, qmix, mix.param.bounds, nu.init = NA, loc = NULL, scale = NULL,
         init.size.subsample = min(n, 100), size.subsample = n,
         control = list(), verbose = TRUE)
fitStudent(x, loc = NULL, scale = NULL, mix.param.bounds = c(1e-3, 1e2), ...)
fitNorm(x)

Value

The function fitnvmix() returns an S3 object of

class

"fitnvmix", basically a list

which contains, among others, the components

nu: Estimated mixing parameter (vector) nu.
loc: Estimated or provided loc vector.
scale: Estimated or provided scale matrix.
max.ll: Estimated log-likelihood at reported estimates.
x: Input data matrix x.

The methods print(), summary() and plot() are defined for the class "fitnvmix".

fitStudent() is a wrapper to fitnvmix() for parameter estimation of multivariate Student t distributions; it also returns an S3 object of class

"fitnvmix" where the fitted degrees of freedom are called "df" instead of

"nu" (to be consistent with the other wrappers for the Student t distributions).

fitNorm() just returns a list with components

loc (columnwise sample means) and scale (sample covariance matrix).

Arguments

x

\((n, d)\)-data matrix.

qmix

specification of the mixing variable \(W\); see McNeil et al. (2015, Chapter 6). Supported are the following types of specification (see also the examples below):

character:: character string specifying a supported distribution; currently available are "constant" (in which case \(W = 1\) and thus the multivariate normal distribution with mean vector loc and covariance matrix scale results), "inverse.gamma" (in which case \(W\) is inverse gamma distributed with shape and rate parameters df/2 and thus the multivariate Student t distribution with df degrees of freedom results) and "pareto" (in which case \(W\) is Pareto distributed with scale equal to unity and shape equal to alpha).

function:

function interpreted as the quantile function of the mixing variable \(W\). In this case, qmix must have the form qmix = function(u, nu), where the argument nu corresponds to the parameter (vector) specifying the distribution of the mixing variable.

mix.param.bounds

either numeric(2) or a matrix with two columns. The first/second column corresponds to the lower/upper bound of \(nu_i\), the ith component of the parameter vector \(nu\) of the mixing variable \(W\). All elements need to be finite, numeric values. Note: The algorithm tends to converge quicker if the parameter ranges supplied are smaller.

nu.init

either NA or an initial guess for the parameter (vector) \(nu\). In the former case an initial estimate is calculated by the algorithm. If nu.init is provided, the algorithm often converges faster; the better the starting value, the faster convergence.

loc

\(d\)-vector; if provided, taken as the 'true' location vector in which case loc is not estimated.

scale

positive definite \((d, d)\)-matrix; if provided, taken as the 'true' scale matrix in which case scale is not estimated.

init.size.subsample

numeric, non-negative, giving the sub-samplesize used to get an initial estimate for \(nu\). Only used if is.na(nu.init), otherwise ignored.

size.subsample

numeric, non-negative, specifying the size of the subsample that is being used in the ECME iterations to optimize the log-likelihood over \(nu\). Defaults to n, so that the full sample is being used. Decreasing this number can lead to faster run-times (as fewer log-densities need to be estimated) but also to an increase in bias and variance.

control

list specifying algorithm specific parameters; see below under 'Details' and get_set_param().

verbose

numeric or logical (in which case it is converted to numeric) specifying the amount of tracing to be done. If 0 or FALSE, neither tracing nor warnigns are communicated; if 1, only warnigns are communicated, if 2 or 3, warnings and (shorter or longer) tracing information is provided.

...

additional arguments passed to the underlying fitnvmix().

Author

Erik Hintz, Marius Hofert and Christiane Lemieux

Details

The function fitnvmix() uses an ECME algorithm to approximate the MLEs of the parameters nu, loc and scale of a normal variance mixture specified by qmix. The underlying procedure successively estimates nu (with given loc and scale) by maximizing the likelihood which is approximated by dnvmix() (unless qmix is a character string, in which case analytical formulas for the log-densities are used) and scale and loc (given nu) using weights (which again need to be approximated) related to the posterior distribution, details can be found in the first reference below.

It should be highlighted that (unless unless qmix is a character string), every log-likelihood and every weight needed in the estimation is numerically approximated via RQMC methods. For large dimensions and sample sizes this procedure can therefore be slow.

Various tolerances and convergence criteria can be changed by the user via the control argument. For more details, see get_set_param().

References

Hintz, E., Hofert, M. and Lemieux, C. (2021), Normal variance mixtures: Distribution, density and parameter estimation. Computational Statistics and Data Analysis 157C, 107175.

Hintz, E., Hofert, M. and Lemieux, C. (2022), Multivariate Normal Variance Mixtures in R: The R Package nvmix. Journal of Statistical Software, tools:::Rd_expr_doi("10.18637/jss.v102.i02").

McNeil, A. J., Frey, R. and Embrechts, P. (2015). Quantitative Risk Management: Concepts, Techniques, Tools. Princeton University Press.

Liu, C. and Rubin, D. (1994). The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika 81(4), 633--648.

Examples

Run this code

## Sampling parameters
set.seed(274) # for reproducibility
nu               <- 2.8 # parameter used to sample data
d                <- 4 # dimension
n                <- 75 # small sample size to have examples run fast
loc              <- rep(0, d) # location vector
A                <- matrix(runif(d * d), ncol = d)
diag_vars        <- diag(runif(d, min = 2, max = 5))
scale            <- diag_vars %*% cov2cor(A %*% t(A)) %*% diag_vars # scale matrix
mix.param.bounds <- c(1, 5) # nu in [1, 5]

### Example 1: Fitting a multivariate t distribution ###########################

if(FALSE){
    ## Define 'qmix' as the quantile function of an IG(nu/2, nu/2) distribution
    qmix <- function(u, nu) 1 / qgamma(1 - u, shape = nu/2, rate = nu/2)
    ## Sample data using 'rnvmix'
    x <- rnvmix(n, qmix = qmix, nu = nu, loc = loc, scale = scale)
    ## Call 'fitvnmix' with 'qmix' as a function (so all densities/weights are estimated)
    (MyFit11 <- fitnvmix(x, qmix = qmix, mix.param.bounds = mix.param.bounds))
    ## Call 'fitnvmix' with 'qmix = "inverse.gamma"' in which case analytical formulas
    ## for weights and densities are used:
    (MyFit12 <- fitnvmix(x, qmix = "inverse.gamma",
                         mix.param.bounds = mix.param.bounds))
    ## Alternatively, use the wrapper 'fitStudent()'
    (MyFit13 <- fitStudent(x))
    ## Check
    stopifnot(all.equal(MyFit11$nu, MyFit12$nu, tol = 5e-2),
              all.equal(MyFit11$nu, MyFit13$nu, tol = 5e-2))
    ## Can also provide 'loc' and 'scale' in which case only 'nu' is estimated
    (MyFit13 <- fitnvmix(x, qmix = "inverse.gamma", mix.param.bounds = mix.param.bounds,
                         loc = loc, scale = scale))
    (MyFit14 <- fitStudent(x, loc = loc, scale = scale))
    stopifnot(all.equal(MyFit13$nu, MyFit14$df, tol = 1e-6))
}

### Example 2: Fitting a Pareto mixture ########################################

## Define 'qmix' as the quantile function of a Par(nu, 1) distribution
qmix <- function(u, nu) (1-u)^(-1/nu)
## Sample data using 'rnvmix':
x <- rnvmix(n, qmix = qmix, nu = nu, loc = loc, scale = scale)
## Call 'fitvnmix' with 'qmix' as function (=> densities/weights estimated)
(MyFit21 <- fitnvmix(x, qmix = qmix, mix.param.bounds = mix.param.bounds))
## Call 'fitnvmix' with 'qmix = "pareto"' in which case an analytical formula
## for the density is used
(MyFit22 <- fitnvmix(x, qmix = "pareto", mix.param.bounds = mix.param.bounds))
stopifnot(all.equal(MyFit21$nu, MyFit22$nu, tol = 5e-2))

Run the code above in your browser using DataLab