Learn R Programming

DESP (version 0.2)

desp: Robust estimation of a sparse precision matrix

Description

This function computes robust estimates of the precision matrix $Omega$ and of the expectation $mu$ corresponding to the data matrix X by solving a convex optimization problem. It implements the algorithms proposed by Balmand and Dalalyan (2015a, 2015b).

Usage

desp(X,lambda=0,gamma=0,settings=NULL)

Arguments

X
The data matrix.
lambda
The penalization parameter that encourages robustness.
gamma
The penalization parameter that promotes sparsity.
settings
A list including all the parameters needed for the estimation.
nThreads
number of threads for parallel processing, default 1

thres
threshold used as a stopping criterion, the computation stops when the variation of the objective function is below thres, default 1e-6

maxIter
maximum number of iterations, default 2000

diagElem
estimator of the diagonal entries, default 'RV' (residual variance). The other available estimators

OLS
whether to re-estimate the coefficients of regression by ordinary least squares on selected variables only or not, default TRUE

OLS.min
the number of observations detected as inliers below which the coefficients of regression are not re-estimated, used only when OLS is TRUE, half of the sample size by default

refit
whether to re-estimate the coefficients of regression after having removed the detected outliers, default TRUE

refit.min
the number of observations detected as inliers below which the coefficients of regression are not re-estimated, used only when refit is TRUE, half of the sample size by default

sqrLasso
method used to solve the square-root Lasso problem, default 'CD'

sto
whether a randomized algorithm (stochastic coordinate descent) have to be used when choosing the coordinate descent method. By default, this parameter is set to '0', that means that the coordinates are updated in the order in which the corresponding variables appear in X. Another option would be '2', the coordinates are all updated but in a uniformly random order. The last option (experimental) would be '1', in this case the sole coordinate to be updated is chosen uniformly at random at each iteration.

outParCorrB
whether the partial correlation matrix and the matrix of the coefficients of regression must be outputted or not, default FALSE

posDef
whether to ensure that the precision matrix is positive definite or not, default TRUE

SML.method
when diagElem is 'SML', the method to choose the path between two vertices, default 'MST'

SPT.spec
when diagElem is 'SML' and when SML.method is 'SPT', the specifications to build the shortest path trees for each connected component, default 1

parameters to be used when diagElem is 'PML' :
PML.thresh
threshold level, default 0.05

PML.kappa
tunning paramater, default 0.05

PML.tol
gradient magnitude tolerance, default 1e-5

rmSO
whether the outliers characterized by a very large Euclidean norm have to be excluded first, default TRUE

rmSO.method
method used to detect rough outliers: either based on interquantile range (IQR), or median absolute deviation, used only when rmSO is TRUE, default 'MAD'

rmSO.mad.mult
multiplicative factor associated to MAD, default 5

rmSO.mad.constant
scale constant associated to MAD, default 1.4826

rmSO.iqr.mult
multiplicative factor associated to IQR, default 3

Value

desp returns an object with S3 class "desp" containing the estimated parameters, with components:
Omega
The precision matrix.
mu
The expectation vector.
Theta
The matrix corresponding to outliers.
Psi
The matrix of partial correlations, optional.
B
The matrix of the coefficients of regression, optional.
simpleOut
The outliers characterized by a very large Euclidean norm, optional. This observations have been excluded and do not appear in Theta

References

Balmand, S. and Dalalyan, A. S. (2015a): On estimation of the diagonal elements of a sparse precision matrix, Posted as http://arxiv.org/abs/1504.04696.

Balmand, S. and Dalalyan, A. S. (2015b): Convex programming approach to robust estimation of a multivariate Gaussian model, Posted as http://arxiv.org/abs/1512.04734.

See Also

sqR_Lasso

Examples

Run this code
## set the dimension and the sample size
p <- 50
n <- 40
## define the true precision matrix
Omega <- 2*diag(p)
Omega[1,1] <- p 
Omega[1,2:p] <- 2/sqrt(2)
Omega[2:p,1] <- 2/sqrt(2)
## generate the distribution
set.seed(1)
X <- MASS::mvrnorm(n, rep.int(0,p), MASS::ginv(Omega))
## define the settings
settings <- list("diagElem"="AD", "OLS"=FALSE)
## estimate the parameters of the distribution
params <- desp(X,lambda=0,gamma=sqrt(2*log(p)),settings=settings)
## error of estimation measured by Frobenius norm
sqrt(sum((Omega - params$Omega)^2))
## increase the sample size and generate the distribution again
n <- 1000
X <- MASS::mvrnorm(n, rep.int(0,p), MASS::ginv(Omega))
## estimate the parameters of the distribution
params <- desp(X,lambda=0,gamma=sqrt(2*log(p)),settings=settings)
## error of estimation measured by Frobenius norm
sqrt(sum((Omega - params$Omega)^2))

Run the code above in your browser using DataLab