huge (version 0.9.1)

huge.scr: Graph Sure Screening (GSS) and Graph Approximation via Correlation Thresholding (GACT)

Description

Implements (1) the Graph Sure Screening (GSS), which preselects neighborhoods by thresholding the sample correlation, and (2) the Graph Approximation via Correlation Thresholding (GACT), which approximates the graph by thresholding the sample correlation.

Usage

huge.scr(x, ind.group = NULL, scr.num = NULL, method = "GSS", nlambda = 30, 
lambda.min.ratio = 0.1, lambda = NULL, verbose = TRUE)

Arguments

x
The n by d data matrix representing n observations in d dimensions.
ind.group
A k-dimensional vector indexing a subset of all d variables. ONLY applicable when estimating a subgraph of the whole graph. The default value is c(1:d).
scr.num
The neighborhood size after the GSS (the number of remaining neighbors). The default value is n-1 when d>n and d-1 (equivalent to disabling the GSS) when n>=d. An alternative value is n/log(n).
method
There are two options, "GSS" and "GACT". The default value is "GSS".
lambda
A sequence of decreasing positive numbers to control the thresholding in the GACT. Typical usage is to leave the input lambda = NULL and have the program compute its own lambda sequence based on nlambda and lamb
nlambda
The number of thresholding parameters. The default value is 30. ONLY applicable when method = "GACT".
lambda.min.ratio
The largest sparsity level for estimated graphs. The program can automatically generate lambda as a sequence of length = nlambda, which makes the sparsity level of the solution path increase from 0 to lambda.m
verbose
If verbose = FALSE, printing the tracing information is disabled. The default value is TRUE.
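
The default rule for scr.num described above can be written out in a few lines. This is only a sketch of the documented rule: default_scr_num is a hypothetical helper, not a function exported by huge, and rounding n/log(n) down to an integer is an assumption made here for illustration.

```r
# Hypothetical helper mirroring the documented default for scr.num.
# It is NOT part of the huge package.
default_scr_num <- function(n, d) {
  if (d > n) n - 1 else d - 1
}

n <- 50
d <- 200

default_scr_num(n, d)      # d > n: keep n - 1 neighbors per variable
default_scr_num(200, 50)   # n >= d: d - 1 neighbors, i.e. no screening

# The documented alternative n/log(n); flooring it is an assumption.
alt_scr_num <- floor(n / log(n))
alt_scr_num
```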

Value

An object with S3 class "scr" is returned:
  • path: The solution path generated by the GACT corresponding to the sequence of thresholding parameters lambda. ONLY applicable when method = "GACT".
  • lambda: The sequence of thresholding parameters used in the GACT. ONLY applicable when method = "GACT".
  • sparsity: The sparsity levels of the solution path. ONLY applicable when method = "GACT".
  • method: The method indicator from the input.
  • ind.mat: A scr.num by k matrix. Each column corresponds to a variable in ind.group and contains the indices of the remaining neighbors after the graph screening. ONLY applicable when method = "GSS".
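
To make the shape of ind.mat concrete, the following base-R sketch screens neighborhoods by absolute sample correlation and returns a scr.num by k matrix of neighbor indices. It illustrates the idea only and is not the package's implementation: gss_sketch is a hypothetical name, and two things are assumptions here, namely that candidates are drawn from all d variables and that ties are broken by order().

```r
# Sketch of neighborhood screening by sample correlation (base R only).
# gss_sketch is a hypothetical name, not part of the huge package.
gss_sketch <- function(x, ind.group = seq_len(ncol(x)), scr.num = NULL) {
  n <- nrow(x)
  d <- ncol(x)
  # Documented default: n - 1 when d > n, otherwise d - 1
  if (is.null(scr.num)) scr.num <- if (d > n) n - 1 else d - 1

  S <- abs(cor(x))
  diag(S) <- -Inf  # a variable is never its own neighbor

  # For each variable in ind.group, keep the scr.num most correlated others
  # (assumption: candidates are all d variables, not only ind.group)
  ind.mat <- sapply(ind.group, function(j) {
    order(S[, j], decreasing = TRUE)[seq_len(scr.num)]
  })
  matrix(ind.mat, nrow = scr.num)  # scr.num by k, also when scr.num is 1
}

set.seed(1)
x <- matrix(rnorm(40 * 10), nrow = 40)
ind.mat <- gss_sketch(x, ind.group = 1:3, scr.num = 4)
dim(ind.mat)  # rows = scr.num, columns = length(ind.group)
```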

Details

The GSS preselects neighborhoods in the ultrahigh-dimensional setting before Meinshausen & Buhlmann Graph Estimation via Lasso (GEL) is applied. Once the dimensionality has been reduced from ultrahigh to a moderate level (usually below the sample size), variable selection can be carried out by a refined variable selection method such as the Lasso or the elastic net. The GSS can greatly reduce the computational burden and often gives estimates that are as good as or better than those obtained without it. Under the sparsity assumption, the GACT is the most efficient way to study the underlying structure of Gaussian graphical models. The GACT also performs well as an approximation to partial correlation graph estimation. It can generate dense graphs, whereas the sparsity level reachable by L1 regularization methods usually depends on the sample size.
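
The correlation thresholding behind the GACT can be illustrated in base R: an edge is kept whenever the absolute sample correlation exceeds the threshold, so a decreasing lambda sequence yields increasingly dense graphs. This is a minimal sketch and not the package's code. gact_sketch is a hypothetical name, and measuring sparsity as the fraction of off-diagonal entries that are edges is an assumption.

```r
# Sketch of graph approximation by correlation thresholding (base R only).
# gact_sketch is a hypothetical name, not part of the huge package.
gact_sketch <- function(x, lambda) {
  d <- ncol(x)
  S <- abs(cor(x))
  diag(S) <- 0                               # no self-loops
  lambda <- sort(lambda, decreasing = TRUE)  # largest threshold first

  # One 0/1 adjacency matrix per threshold
  path <- lapply(lambda, function(l) (S > l) * 1)

  # Assumed definition: share of off-diagonal entries that are edges
  sparsity <- sapply(path, function(A) sum(A) / (d * (d - 1)))

  list(path = path, lambda = lambda, sparsity = sparsity)
}

set.seed(1)
x <- matrix(rnorm(100 * 8), nrow = 100)
out <- gact_sketch(x, lambda = c(0.05, 0.1, 0.2, 0.3))
out$lambda    # sorted into decreasing order
out$sparsity  # non-decreasing as the threshold drops
```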

References

1. Tuo Zhao and Han Liu. HUGE: A Package for High-dimensional Undirected Graph Estimation. Technical Report, Carnegie Mellon University, 2010.
2. Jianqing Fan and Jinchi Lv. Sure independence screening for ultra-high dimensional feature space (with discussion). Journal of the Royal Statistical Society: Series B, 2008.
3. Jerome Friedman, Trevor Hastie and Rob Tibshirani. Applications of the lasso and grouped lasso to the estimation of sparse graphical models. Technical Report, Stanford University, 2010.

See Also

huge and huge-package

Examples

# generate data
L = huge.generator(graph = "hub", g = 5)
ind.group = c(1:30)

# the Graph Sure Screening (GSS)
out.scr = huge.scr(L$data, ind.group = ind.group)
summary(out.scr)

# the Graph Approximation via Correlation Thresholding (GACT)
out.approx = huge.scr(L$data, method = "GACT", nlambda = 20)
summary(out.approx)
plot(out.approx)

# GACT with ind.group supplied; plot the 15th graph on the solution path
out.scr = huge.scr(L$data, ind.group = ind.group, method = "GACT")
huge.plot(out.scr$path[[15]])