Selects the tuning parameter for, and computes, the sparse and positive definite covariance matrix estimator proposed by Rothman (2012).
pdsoft.cv(x, lam.vec = NULL, standard = TRUE,
init = c("diag", "soft", "dense"), tau = 1e-04,
nsplits = 10, n.tr = NULL, tolin = 1e-08, tolout = 1e-08,
maxitin = 10000, maxitout = 1000, quiet = TRUE)
x: A data matrix with \(n\) rows and \(p\) columns. The rows are assumed to be a realization of \(n\) independent copies of a \(p\)-variate random vector.
lam.vec: An optional vector of candidate lasso-type penalty tuning parameter values. The default for standard=TRUE is seq(from=0, to=1, by=0.05), and the default for standard=FALSE is seq(from=0, to=m, length.out=20), where m is the maximum magnitude of the off-diagonal entries in s, the sample covariance matrix computed from x. Neither default is ideal, and both are time consuming, particularly for values close to zero. Consider refining the set by increasing its resolution over a narrower range.
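The default grids and a refined grid can be sketched in base R as follows. This is only an illustration: the data are simulated, `cov()` is assumed to stand in for the sample covariance matrix the function uses internally, and the range 0.2 to 0.4 is a hypothetical choice that a first coarse pass might suggest.

```r
# Illustrative data only; any numeric n x p matrix would do.
set.seed(1)
x <- matrix(rnorm(50 * 20), nrow = 50, ncol = 20)

# Sample covariance matrix (assumed stand-in) and its largest
# off-diagonal magnitude.
s <- cov(x)
m <- max(abs(s[row(s) != col(s)]))

# The two default grids described above.
grid_standard   <- seq(from = 0, to = 1, by = 0.05)       # standard = TRUE
grid_unstandard <- seq(from = 0, to = m, length.out = 20) # standard = FALSE

# Hypothetical refinement: a narrower range at a finer resolution.
grid_refined <- seq(from = 0.2, to = 0.4, by = 0.01)
```

A refined grid such as `grid_refined` would then be passed as the lam.vec argument.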
standard: Logical. standard=TRUE first computes the observed sample correlation matrix from the sample covariance matrix s, then computes the sparse correlation matrix estimate, and finally rescales to return the sparse covariance matrix estimate. The strongly recommended default is standard=TRUE.
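The standardize, estimate, rescale sequence can be sketched in base R. The helper `soft_offdiag` below is a hypothetical stand-in (plain soft thresholding of the off-diagonal entries), not the estimator this function computes; it is used only so that the rescaling step has something to act on.

```r
set.seed(1)
# Illustrative data with unequal column scales.
x <- matrix(rnorm(50 * 5), nrow = 50, ncol = 5) %*% diag(c(1, 2, 3, 4, 5))

s <- cov(x)          # sample covariance matrix (assumed stand-in)
d <- sqrt(diag(s))   # sample standard deviations
r <- cov2cor(s)      # step 1: sample correlation matrix

# Hypothetical stand-in for the correlation-scale estimator:
# soft-threshold the off-diagonal entries, leave the diagonal alone.
soft_offdiag <- function(mat, lam) {
  out <- sign(mat) * pmax(abs(mat) - lam, 0)
  diag(out) <- diag(mat)
  out
}
r_hat <- soft_offdiag(r, lam = 0.1)   # step 2: sparse correlation estimate

# Step 3: rescale back to the covariance scale, entry (i, j) times d[i] * d[j].
sigma_hat <- r_hat * outer(d, d)
```

Working on the correlation scale puts every variable on a common footing, which is why a single grid on [0, 1] is meaningful when standard=TRUE.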
init: The type of initialization used for the estimate computed at the maximum element in lam.vec. Subsequent initializations use the final iterates for sigma and omega at the previous value in lam.vec. The default option init="diag" uses diagonal starting values. The second option init="soft" uses a positive definite version of the soft thresholded covariance or correlation estimate, depending on standard. The third option init="dense" uses the closed-form solution when lam=0.
tau: The logarithmic barrier parameter. The default is tau=1e-4, which works well when standard=TRUE with the default choices for the convergence tolerances.
nsplits: The number of random splits to use for the tuning parameter selection.
n.tr: Optional number of cases to use in the training set. The default is the nearest integer to \(n(1-1/\log(n))\). The value must be in \(\{3, \ldots, n-2\}\).
tolin: Convergence tolerance for the inner loop of the algorithm that solves the lasso regression.
tolout: Convergence tolerance for the outer loop of the algorithm.
maxitin: Maximum number of inner-loop iterations allowed.
maxitout: Maximum number of outer-loop iterations allowed.
quiet: Logical. quiet=TRUE suppresses the printing of progress updates.
A list with the following components:
sigma: covariance estimate at the selected tuning parameter
omega: inverse covariance estimate at the selected tuning parameter
best.lam: the selected value of the tuning parameter
cv.err: a vector of the validation errors, one for each element in lam.vec
lam.vec: the vector of candidate tuning parameter values
n.tr: the number of cases used for the training set
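How random splits, a training-set size, and per-candidate validation errors fit together can be sketched in base R. Two things here are assumptions rather than facts about this function: the estimator is replaced by a hypothetical soft-thresholding stand-in (`soft_offdiag`), and the validation error is taken to be the squared Frobenius distance to the validation-set sample covariance matrix, averaged over splits.

```r
set.seed(1)
n <- 50
p <- 10
x <- matrix(rnorm(n * p), nrow = n, ncol = p)  # illustrative data only

lam.vec <- seq(from = 0, to = 1, by = 0.05)
nsplits <- 10
n.tr <- round(n * (1 - 1 / log(n)))  # default training size; 37 when n = 50

# Hypothetical stand-in estimator: soft-threshold the off-diagonal entries.
soft_offdiag <- function(mat, lam) {
  out <- sign(mat) * pmax(abs(mat) - lam, 0)
  diag(out) <- diag(mat)
  out
}

cv.err <- numeric(length(lam.vec))
for (split in seq_len(nsplits)) {
  tr <- sample(n, n.tr)                  # random training cases
  s_tr <- cov(x[tr, , drop = FALSE])     # training sample covariance
  s_va <- cov(x[-tr, , drop = FALSE])    # validation sample covariance
  for (i in seq_along(lam.vec)) {
    est <- soft_offdiag(s_tr, lam.vec[i])
    # Assumed loss: squared Frobenius distance, averaged over the splits.
    cv.err[i] <- cv.err[i] + sum((est - s_va)^2) / nsplits
  }
}

# The selected value minimizes the averaged validation error.
best.lam <- lam.vec[which.min(cv.err)]
```

The names `cv.err`, `lam.vec`, `best.lam`, and `n.tr` mirror the components of the returned list so that the correspondence is easy to see.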
See Rothman (2012) for the objective function and more information.
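For orientation, the objective in Rothman (2012) can be sketched as follows. It is restated from the cited paper rather than from this page, with notation chosen here (\(S\) for the sample covariance or correlation matrix, \(\Sigma^{-}\) for \(\Sigma\) with its diagonal entries set to zero, \(|\cdot|_1\) for the elementwise \(\ell_1\) norm), so it should be checked against the paper before being relied on:

\[
\hat{\Sigma}_{\lambda} = \arg\min_{\Sigma \succ 0} \left\{ \tfrac{1}{2}\,\|\Sigma - S\|_F^2 \;-\; \tau \log\det\Sigma \;+\; \lambda\,|\Sigma^{-}|_1 \right\}
\]

In this reading, \(\lambda\) corresponds to an element of lam.vec and \(\tau\) to the tau argument.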
Rothman, A. J. (2012). Positive definite estimators of large covariance matrices. Biometrika 99(3): 733-740.
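# NOT RUN {
library(PDSCE)  # package that provides pdsoft.cv

set.seed(1)
n=50
p=20

# True covariance: unit variances with 0.4 on the first off-diagonals
true.cov=diag(p)
true.cov[cbind(1:(p-1), 2:p)]=0.4
true.cov[cbind(2:p, 1:(p-1))]=0.4

# Draw n cases with covariance true.cov via its symmetric square root
eo=eigen(true.cov, symmetric=TRUE)
z=matrix(rnorm(n*p), nrow=n, ncol=p)
x=z%*% tcrossprod(eo$vectors*rep(eo$values^(0.5), each=p),eo$vectors)

# Select the tuning parameter, then inspect the validation errors and estimate
output=pdsoft.cv(x=x)
plot(output$lam.vec, output$cv.err)
output$best.lam
output$sigma
# }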