huge.scr: Graph Screening and Graph Estimation via Correlation Approximation

Description

Implements (1) neighborhood preselection by thresholding the sample correlation (graph screening) and (2) graph estimation by thresholding the sample correlation (Graph Estimation via Correlation Approximation, GECA).

Usage

huge.scr(x, ind.group = NULL, scr.num = NULL, approx = FALSE, n.lambda = 30, 
lambda.min = 1e-4, lambda = NULL, verbose = TRUE)

Arguments

x

The n by d data matrix representing n observations in d dimensions.

ind.group

A k dimensional vector indexing a subset of all d variables. Only applicable when estimating a subgraph of the whole graph. The default value is c(1:d).

scr.num

The neighborhood size after graph screening (the number of remaining neighbors). The default value is n-1 when d > n, and d-1 (equivalent to disabling the graph screening procedure) when n >= d. An alternativ
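To make the screening step concrete, here is a minimal base-R sketch, assuming that screening keeps, for each variable in ind.group, the scr.num other variables with the largest absolute sample correlation. The helper screen_neighbors() is hypothetical and not part of huge. It only illustrates the idea and the shape of the ind.mat output (scr.num rows, one column per variable in ind.group).

```r
# Hypothetical helper (not part of huge): correlation-based neighborhood screening.
screen_neighbors <- function(x, ind.group = seq_len(ncol(x)), scr.num = NULL) {
  n <- nrow(x)
  d <- ncol(x)
  # Default neighborhood size as documented: n-1 if d > n, else d-1
  if (is.null(scr.num)) scr.num <- if (d > n) n - 1 else d - 1
  r <- abs(cor(x))
  diag(r) <- -Inf  # a variable is never its own neighbor
  # One column per variable in ind.group, holding the indices of the
  # scr.num variables with the largest absolute sample correlation
  sapply(ind.group, function(j) order(r[, j], decreasing = TRUE)[seq_len(scr.num)])
}

set.seed(1)
x <- matrix(rnorm(50 * 100), nrow = 50, ncol = 100)
ind.mat <- screen_neighbors(x, ind.group = 1:40)
dim(ind.mat)  # 49 rows (n - 1 neighbors) by 40 columns
```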

approx

If approx = FALSE, the graph screening procedure is implemented. If approx = TRUE, Graph Estimation via Correlation Approximation (GECA) is implemented. The default value is approx = FALSE.
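A minimal sketch of GECA at a single threshold, assuming an edge (i, j) is kept when the absolute sample correlation exceeds lambda. The function geca_graph() is a hypothetical illustration, not the package's implementation; it returns a symmetric 0/1 adjacency matrix without self-loops.

```r
# Hypothetical sketch (not part of huge): GECA graph at one threshold lambda.
geca_graph <- function(x, lambda) {
  r <- abs(cor(x))
  adj <- (r > lambda) * 1  # 0/1 adjacency matrix
  diag(adj) <- 0           # no self-loops
  adj
}

set.seed(1)
x <- matrix(rnorm(50 * 20), nrow = 50, ncol = 20)
adj <- geca_graph(x, lambda = 0.3)
sum(adj) / 2  # number of edges in the estimated graph
```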

lambda

A sequence of decreasing positive numbers controlling the thresholding in GECA. Typical usage is to leave the input lambda = NULL and have the program compute its own lambda sequence based on n.lambda and lambda.min.

n.lambda

The number of thresholding parameters. The default value is 30. Only applicable when approx = TRUE.

lambda.min

The smallest value for lambda, as a fraction of the upper bound (MAX) of the thresholding parameter, which makes all estimates equal to 0. The program can automatically generate lambda as a sequence of le
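The relationship between MAX, lambda.min and n.lambda can be sketched as follows. The log-spaced grid is an assumption chosen for illustration, not a statement of how huge builds its sequence; lambda_path() is hypothetical. MAX is taken as the largest off-diagonal absolute sample correlation, so that no edge survives when an edge requires |cor| > lambda.

```r
# Hypothetical helper (not part of huge): a decreasing lambda grid from
# MAX down to lambda.min * MAX. The log spacing is an assumption.
lambda_path <- function(x, n.lambda = 30, lambda.min = 1e-4) {
  r <- abs(cor(x))
  diag(r) <- 0
  # MAX: with an edge kept only when |cor| > lambda, no edge survives at MAX
  lambda.max <- max(r)
  exp(seq(log(lambda.max), log(lambda.min * lambda.max), length.out = n.lambda))
}

set.seed(1)
x <- matrix(rnorm(50 * 20), nrow = 50, ncol = 20)
lam <- lambda_path(x, n.lambda = 10)
lam[10] / lam[1]  # equals lambda.min = 1e-4 up to rounding
```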

verbose

If verbose = FALSE, printing the tracing information is disabled. The default value is TRUE.

Value

An object with S3 class "scr" is returned:

path

The solution path generated by GECA, corresponding to the sequence of thresholding parameters lambda. Only applicable when approx = TRUE.

lambda

The sequence of thresholding parameters used in GECA. Only applicable when approx = TRUE.

sparsity

The sparsity levels of the solution path. Only applicable when approx = TRUE.

approx

The GECA indicator from the input.

ind.mat

A scr.num by k matrix. Each column corresponds to a variable in ind.group and contains the indices of the remaining neighbors after graph screening. Only applicable when approx = FALSE.
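For intuition about how path, lambda and sparsity relate, the following base-R sketch builds a GECA-style path over a decreasing lambda grid. It assumes that sparsity means the fraction of the d(d-1)/2 possible edges that are present; that definition, the log-spaced grid, and all variable names here are illustrative assumptions rather than the fields huge actually computes.

```r
# Hypothetical sketch: a GECA-style solution path and its sparsity levels.
set.seed(1)
x <- matrix(rnorm(50 * 20), nrow = 50, ncol = 20)
d <- ncol(x)
r <- abs(cor(x))
diag(r) <- 0

# Assumed log-spaced, decreasing thresholds from MAX to 1e-4 * MAX
lambda <- exp(seq(log(max(r)), log(1e-4 * max(r)), length.out = 10))

# One 0/1 adjacency matrix per threshold
path <- lapply(lambda, function(l) (r > l) * 1)

# Assumed definition: fraction of possible edges present in each graph
sparsity <- sapply(path, function(a) sum(a) / (d * (d - 1)))
```

Because the thresholds decrease, each graph contains the previous one, so the sparsity levels are non-decreasing along the path.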

Details

The graph screening procedure preselects the neighborhood of each variable in the ultra-high-dimensional setting, before covariance selection using the lasso. Once the dimensionality is reduced from ultra-high to a moderate level (usually below the sample size), variable selection can be carried out by a refined method such as the lasso or the elastic net. Graph screening can greatly reduce the computational burden and often achieves covariance selection as good as, or better than, that obtained without screening.

Under the sparsity assumption, Graph Estimation via Correlation Approximation (GECA) is computationally the most efficient way to study the underlying structure of Gaussian graphical models. As an approximation of partial correlation graph estimation, GECA also performs well. Unlike L1-regularization methods, whose attainable sparsity level usually depends on the sample size, GECA can also generate dense graphs.
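The two-stage idea (screen, then fit on the reduced neighborhood) can be sketched for a single variable. Here ordinary least squares via lm() is swapped in for the lasso or elastic net purely to keep the example dependency-free; the screening rule (top scr.num absolute correlations) and scr.num = 10 are assumptions for illustration.

```r
# Hypothetical sketch: neighborhood regression after correlation screening.
# OLS (lm) stands in for the lasso/elastic-net used in practice.
set.seed(1)
n <- 50
d <- 100
x <- matrix(rnorm(n * d), nrow = n, ncol = d)
scr.num <- 10

# Screening: keep the scr.num variables most correlated with variable j
r <- abs(cor(x))
diag(r) <- -Inf
j <- 1
nbrs <- order(r[, j], decreasing = TRUE)[seq_len(scr.num)]

# With only scr.num = 10 predictors left (well below n = 50),
# even an unpenalized fit is well defined.
fit <- lm(x[, j] ~ x[, nbrs])
coef(fit)
```

Without screening, regressing variable j on all d - 1 = 99 others with n = 50 observations would be rank-deficient, which is why a penalized method or a prior screening step is needed.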

References

Tuo Zhao and Han Liu. HUGE: A Package for High-dimensional Undirected Graph Estimation. Technical Report, Carnegie Mellon University, 2010.

Jianqing Fan and Jinchi Lv. Sure independence screening for ultra-high dimensional feature space (with discussion). Journal of the Royal Statistical Society: Series B, Vol. 70, pages 849-911, 2008.

Jerome Friedman, Trevor Hastie and Rob Tibshirani. Applications of the lasso and grouped lasso to the estimation of sparse graphical models. Technical Report, Stanford University, 2010.

Examples


library(huge)

#generate data
n = 50
L = huge.generator(n = n, d = 100, graph = "hub")

#subset indices
ind.group = c(1:40)

#graph screening for a subset of variables
out.scr = huge.scr(L$data, ind.group = ind.group)
summary(out.scr)

#graph screening using alternative neighborhood size
scr.num = n/log(n)
ind.mat = huge.scr(L$data, scr.num = scr.num)$ind.mat

#GECA
out.approx = huge.scr(L$data, approx = TRUE, n.lambda = 10)
summary(out.approx)
plot(out.approx)
