huge (version 0.9.1)

huge.select: Model selection for high-dimensional undirected graph estimation

Description

Implements regularization parameter selection for high-dimensional undirected graph estimation. The available approaches are the Permutational Information Criterion (PIC), the Stability Approach to Regularization Selection (StARS), and the Extended Bayesian Information Criterion (EBIC).

Usage

huge.select(est, criterion = NULL, r.num = 200, EBIC.gamma = 0.5, 
stars.thresh = 0.1, stars.subsample.ratio = NULL, stars.rep.num = 20, 
verbose = TRUE)

Arguments

est
An object with S3 class "huge" (output from huge)
criterion
Model selection criterion. For Meinshausen & Buhlmann Graph Estimation via Lasso (GEL), all 3 options are available, "PIC", "EBIC" and "stars". For Graph Approximation via Correlation Thresholding (GACT), "star
r.num
The number of random permutations in PIC selection. The default value is 200. Only applicable when criterion = "PIC".
EBIC.gamma
The tuning parameter for the EBIC. The default value is 0.5. Only applicable when est$method = "GEL" or est$method = "GLASSO" and criterion = "EBIC".
stars.thresh
The variability threshold in the StARS. The default value is 0.1. An alternative value is 0.05. Only applicable when criterion = "stars".
stars.subsample.ratio
The subsampling ratio. The default value is 10*sqrt(n)/n when n > 144 and 0.8 when n <= 144, where n is the sample size. Only applicable when criterion = "stars".
stars.rep.num
The number of subsamples used by StARS. The default value is 20. Only applicable when criterion = "stars".
verbose
If verbose = FALSE, tracing information printing is disabled. The default value is TRUE.
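
The default for stars.subsample.ratio described above can be written out as a small function. This is only a sketch of the documented rule in base R; default_subsample_ratio is a hypothetical helper name and is not part of the huge package.

```r
# Hypothetical helper (not part of huge): reproduces the documented default
# for stars.subsample.ratio as a function of the sample size n.
default_subsample_ratio <- function(n) {
  stopifnot(is.numeric(n), length(n) == 1, n > 0)
  if (n > 144) 10 * sqrt(n) / n else 0.8
}

default_subsample_ratio(100)  # n <= 144, so the ratio is 0.8
default_subsample_ratio(400)  # n > 144, so the ratio is 10 * 20 / 400 = 0.5
```

For larger samples the ratio shrinks like 10/sqrt(n), so each subsample holds roughly 10*sqrt(n) observations.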

Value

An object with S3 class "select" is returned:

  • refit: The optimal graph selected from the solution path.
  • merge: The solution path estimated by merging the subsampling paths. Only applicable when the input criterion = "stars".
  • variability: The variability along the subsampling paths. Only applicable when the input criterion = "stars".
  • EBIC.scores: Extended BIC scores for regularization parameter selection. Only applicable when criterion = "EBIC".
  • opt.index: The index of the selected regularization parameter. Not applicable when the input criterion = "PIC".
  • opt.lambda: The selected regularization/thresholding parameter.
  • opt.sparsity: The sparsity level of "refit".
  • graph: Returns "subgraph" when k < d and "fullgraph" when k == d.
  • and anything else included in the input est.

Details

StARS is a natural way to select the optimal regularization parameter for all three estimation methods. It selects the optimal graph based on the variability across subsamples and tends to overselect edges in Gaussian graphical models. Besides selecting the regularization parameter, StARS can also provide an additional estimated graph by merging the corresponding subsampled graphs using frequency counts. Because the subsampling procedure in StARS may not be very efficient, we also provide the recently developed, highly efficient PIC approach. Instead of tuning over a grid by cross-validation or subsampling, PIC directly estimates the optimal regularization parameter based on random permutations. PIC usually has very good empirical performance but sometimes suffers from underselection. Therefore, users who are sensitive to false negative rates should consider either increasing r.num or applying StARS for model selection. The extended BIC is another competitive approach, but EBIC.gamma can only be tuned by experience. The EBIC score for GEL is based on the pseudo-likelihood, and its theoretical properties have not been justified yet.
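
To make the StARS idea concrete, the following base R sketch computes the variability along a path of subsampled graphs and picks an index under a threshold. It follows the rule as described by Liu, Roeder and Wasserman (2010) and is not the code used inside huge.select. The function name stars_select, the input layout adj_list, and the assumption that the path is ordered from sparsest to densest are all illustrative.

```r
# Illustrative sketch of the StARS rule; not the implementation in huge.
# adj_list[[k]][[b]] is assumed to be a d x d 0/1 adjacency matrix estimated
# at the k-th regularization parameter on the b-th subsample, with the path
# ordered from the sparsest to the densest graphs.
stars_select <- function(adj_list, thresh = 0.1) {
  # Merge subsampled graphs: per-edge selection frequency at each parameter
  merged <- lapply(adj_list, function(graphs) {
    Reduce(`+`, graphs) / length(graphs)
  })
  # Edge instability 2 * f * (1 - f), averaged over the upper triangle
  variability <- vapply(merged, function(freq) {
    xi <- 2 * freq * (1 - freq)
    mean(xi[upper.tri(xi)])
  }, numeric(1))
  # Make the variability non-decreasing along the path, then keep the
  # densest graph whose variability does not exceed the threshold
  ok <- which(cummax(variability) <= thresh)
  opt_index <- if (length(ok) > 0) max(ok) else 1L
  list(opt.index = opt_index, variability = variability, merge = merged)
}

# Toy path: 3 nodes, 2 parameters, 2 subsamples each
empty <- matrix(0, 3, 3)
one_edge <- empty
one_edge[1, 2] <- one_edge[2, 1] <- 1
path <- list(list(empty, empty), list(one_edge, empty))

stars_select(path, thresh = 0.1)$opt.index  # 1: variability at index 2 is 1/6
stars_select(path, thresh = 0.2)$opt.index  # 2
```

Lowering thresh (for example to 0.05, the alternative value mentioned for stars.thresh) makes the rule more conservative and selects a sparser graph.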

References

1. Tuo Zhao and Han Liu. HUGE: A Package for High-dimensional Undirected Graph Estimation. Technical Report, Carnegie Mellon University, 2010.
2. Han Liu, Kathryn Roeder and Larry Wasserman. Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models. Advances in Neural Information Processing Systems (NIPS), 2010.
3. Jiahua Chen and Zehua Chen. Extended Bayesian information criterion for model selection with large model space. Biometrika, 2008.

See Also

huge and huge-package.

Examples

#generate data
L = huge.generator(d = 200, graph="hub")
out.GEL = huge(L)
out.GACT = huge(L, method = "GACT")
out.GLASSO = huge(L, method = "GLASSO")

#model selection using PIC
out.select = huge.select(out.GEL)
summary(out.select)
plot(out.select)

#model selection using stars
out.select = huge.select(out.GACT, stars.rep.num = 5)
summary(out.select)
plot(out.select)

#model selection using EBIC
out.select = huge.select(out.GLASSO)
summary(out.select)
plot(out.select)