
huge (version 1.0.3)

huge: High-dimensional undirected graph estimation

Description

The main function for high-dimensional undirected graph estimation. Three graph estimation methods, including Meinshausen & Buhlmann Graph Estimation via Lasso (MBGEL), Graphical Lasso (GLASSO) and Graph Estimation via Correlation Thresholding (GECT), are available for data analysis.

Usage

huge(x, lambda = NULL, nlambda = NULL, lambda.min.ratio = NULL, method = "mbgel", scr = NULL, scr.num = NULL, cov.output = FALSE, sym = "or", verbose = TRUE)

Arguments

x
Either (1) an n by d data matrix or (2) a d by d sample covariance matrix. The program identifies the input type automatically by checking whether the matrix is symmetric. (n is
lambda
A sequence of decreasing positive numbers that control the regularization in MBGEL and GLASSO, or the thresholding in GECT. Typical usage is to leave the input lambda = NULL and have the program compute its own lambda sequence based
nlambda
The number of regularization/thresholding parameters. The default value is 30 if method = "gect" and 10 if method = "mbgel" or method = "glasso".
lambda.min.ratio
If method = "mbgel" or method = "glasso", it is the smallest value for lambda, as a fraction of the upper bound (MAX) of the regularization/thresholding parameter which makes all estimates equal to
method
The graph estimation method, with 3 options: "mbgel", "gect" and "glasso". The default value is "mbgel".
scr
If scr = TRUE, Graph Sure Screening (GSS) is applied to preselect the neighborhood before MBGEL. The default value is TRUE for n<d and FALSE for n>=d. ONLY applicable when method = "mb
scr.num
The neighborhood size after the GSS (the number of remaining neighbors per node). ONLY applicable when scr = TRUE. The default value is n-1. An alternative value is n/log(n). ONLY applicable when scr = TRUE
cov.output
If cov.output = TRUE, the output will include a path of estimated covariance matrices. ONLY applicable when method = "glasso". Since the estimated covariance matrices are generally not sparse, please use it with care, or it may ta
sym
Symmetrize the output graphs. If sym = "and", the edge between node i and node j is selected ONLY when both node i and node j are selected as neighbors for each other. If sym = "or"
verbose
If verbose = FALSE, tracing information printing is disabled. The default value is TRUE.
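
The sym argument can be illustrated with a small base R sketch. This is not the package's internal code: symmetrize() and the nbr matrix are hypothetical names, and the "or" rule is assumed to follow the usual Meinshausen & Buhlmann convention (an edge is kept when either node selects the other).

```r
# Hypothetical helper (not part of huge): turn a directed neighbor-selection
# matrix into an undirected adjacency matrix.
# nbr[i, j] != 0 means node j was selected as a neighbor of node i.
symmetrize <- function(nbr, sym = c("or", "and")) {
  sym <- match.arg(sym)
  sel <- nbr != 0
  adj <- if (sym == "and") sel & t(sel) else sel | t(sel)
  adj <- adj * 1      # logical matrix -> 0/1 numeric matrix
  diag(adj) <- 0      # no self-loops
  adj
}

# Nodes 1 and 2 select each other; node 3 selects node 2, but not vice versa.
nbr <- matrix(c(0, 1, 0,
                1, 0, 0,
                0, 1, 0), nrow = 3, byrow = TRUE)

adj_or  <- symmetrize(nbr, "or")   # keeps edges 1-2 and 2-3
adj_and <- symmetrize(nbr, "and")  # keeps only edge 1-2
```

The "and" rule is the more conservative of the two, so it never yields more edges than "or" on the same selection matrix.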

Value

An object with S3 class "huge" is returned, with the following components:
  • data: The n by d data matrix or d by d sample covariance matrix from the input.
  • cov.input: An indicator of the sample covariance.
  • ind.mat: The scr.num by k matrix in which each column corresponds to a variable in ind.group and contains the indices of the remaining neighbors after the GSS. ONLY applicable when scr = TRUE and approx = FALSE.
  • lambda: The sequence of regularization parameters used in MBGEL or thresholding parameters in GECT.
  • sym: The sym from the input. ONLY applicable when method = "mbgel".
  • scr: The scr from the input. ONLY applicable when method = "mbgel".
  • path: A list of k by k adjacency matrices of estimated graphs as a graph path corresponding to lambda.
  • sparsity: The sparsity levels of the graph path.
  • wi: A list of d by d precision matrices as an alternative graph path (numerical path) corresponding to lambda. ONLY applicable when method = "glasso".
  • w: A list of d by d estimated covariance matrices corresponding to lambda. ONLY applicable when cov.output = TRUE and method = "glasso".
  • method: The method used in the graph estimation stage.
  • df: If method = "mbgel", it is a k by nlambda matrix. Each row contains the number of nonzero coefficients along the lasso solution path. If method = "glasso", it is an nlambda-dimensional vector containing the number of nonzero coefficients along the graph path wi.
  • loglik: An nlambda-dimensional vector containing the likelihood scores along the graph path (wi). ONLY applicable when method = "glasso".
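
As a rough illustration of how path and sparsity relate, the base R sketch below computes a sparsity level for each adjacency matrix in a toy path. It assumes sparsity means the fraction of possible edges that are present; the page does not state the exact formula, and graph_sparsity() is a hypothetical helper rather than a package function.

```r
# Hypothetical helper: fraction of the d * (d - 1) / 2 possible undirected
# edges that are present in a symmetric adjacency matrix.
graph_sparsity <- function(adj) {
  d <- nrow(adj)
  sum(adj[upper.tri(adj)] != 0) / (d * (d - 1) / 2)
}

# A toy graph path on 3 nodes: empty graph, one edge, complete graph.
path <- list(
  matrix(0, 3, 3),
  matrix(c(0, 1, 0,
           1, 0, 0,
           0, 0, 0), 3, 3),
  matrix(c(0, 1, 1,
           1, 0, 1,
           1, 1, 0), 3, 3)
)

sparsity <- vapply(path, graph_sparsity, numeric(1))
```

With a decreasing lambda sequence, the values typically grow along the path, since less regularization admits more edges.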

Details

The graph structure is estimated using Meinshausen & Buhlmann Graph Estimation via Lasso (MBGEL) by default. It can be further accelerated by the Graph Sure Screening (GSS) subroutine, which preselects the graph neighborhood of each variable. In the case d >> n, the computation is memory-optimized and targeted at larger-scale problems. Two alternative approaches are also provided for the graph estimation stage: (1) Graph Estimation via Correlation Thresholding (GECT), which is highly efficient, and (2) a slightly modified Graphical Lasso (GLASSO) procedure in which memory usage is optimized using sparse matrix output.
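
The idea behind correlation thresholding can be sketched in a few lines of base R. This is a simplified illustration under stated assumptions, not the GECT implementation in the package: the simulated data, the lambda values, and the rule "keep an edge when the absolute sample correlation exceeds lambda" are all choices made for this example.

```r
set.seed(1)
n <- 200
d <- 5
x <- matrix(rnorm(n * d), n, d)
x[, 2] <- x[, 1] + 0.1 * rnorm(n)  # make variables 1 and 2 strongly correlated

# Absolute sample correlations, ignoring the diagonal.
S <- abs(cor(x))
diag(S) <- 0

# A decreasing sequence of thresholds gives a path of nested graphs.
lambda <- c(0.9, 0.5, 0.1)
path <- lapply(lambda, function(lam) (S > lam) * 1)

# Number of undirected edges at each threshold.
n_edges <- vapply(path, function(a) sum(a[upper.tri(a)]), numeric(1))
```

Because each graph only requires comparing a precomputed correlation matrix against a scalar, the whole path costs little more than one call to cor(), which is consistent with the efficiency claim above.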

References

1. Tuo Zhao and Han Liu. HUGE: A Package for High-dimensional Undirected Graph Estimation. Technical Report, Carnegie Mellon University, 2010.
2. Han Liu, John Lafferty and Larry Wasserman. The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs. Journal of Machine Learning Research (JMLR), 2009.
3. Jianqing Fan and Jinchi Lv. Sure independence screening for ultra-high dimensional feature space (with discussion). Journal of Royal Statistical Society B, 2008.
4. Jerome Friedman, Trevor Hastie and Rob Tibshirani. Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 2008.
5. Onureena Banerjee, Laurent El Ghaoui and Alexandre d'Aspremont. Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data. Journal of Machine Learning Research (JMLR), 2008.
6. Jerome Friedman, Trevor Hastie and Robert Tibshirani. Sparse inverse covariance estimation with the lasso. Biostatistics, 2007.
7. Nicolai Meinshausen and Peter Buhlmann. High-dimensional Graphs and Variable Selection with the Lasso. The Annals of Statistics, 2006.

See Also

huge.generator, huge.select, huge.plot, huge.roc, and huge-package.

Examples

library(huge)

# generate data
L = huge.generator(n = 200, d = 80, graph = "hub")

# graph path estimation using MBGEL
out1 = huge(L$data)
out1
plot(out1)               # not aligned
plot(out1, align = TRUE) # aligned
huge.plot(out1$path[[3]])

# graph path estimation using a d by d matrix as the input
# (here the sample correlation matrix of the data)
out1 = huge(cor(L$data))
out1
plot(out1)               # not aligned
plot(out1, align = TRUE) # aligned
huge.plot(out1$path[[3]])

# graph path estimation using GECT
out2 = huge(L$data, method = "gect")
out2
plot(out2)

# graph path estimation using GLASSO
out3 = huge(L$data, method = "glasso")
out3
plot(out3)
