huge() runs huge.npn(), huge.scr(), and huge.subgraph() sequentially as a pipeline to analyze data.

huge(L, ind.group = NULL, lambda = NULL, n.lambda = NULL, lambda.min = NULL,
     alpha = 1, sym = "or", npn = TRUE, npn.func = "shrinkage", npn.thresh = NULL,
     approx = FALSE, scr = TRUE, scr.num = NULL, verbose = TRUE)
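
Several arguments default to NULL in the signature and are resolved from other inputs. As a rough illustration only, the base-R sketch below reproduces two of the documented defaults: n.lambda (30 if approx = TRUE, 10 if approx = FALSE) and npn.thresh (1/(4*(n^0.25)*sqrt(pi*log(n)))). The helper name resolve_defaults is hypothetical and is not part of the huge package; how the package resolves these values internally is not shown here.

```r
# Hypothetical helper (not part of the huge package): mirrors the documented
# defaults for n.lambda and npn.thresh when they are left as NULL.
resolve_defaults <- function(n, approx = FALSE, n.lambda = NULL, npn.thresh = NULL) {
  if (is.null(n.lambda)) {
    # documented default: 30 if approx = TRUE, 10 if approx = FALSE
    n.lambda <- if (approx) 30 else 10
  }
  if (is.null(npn.thresh)) {
    # documented default truncation threshold, a function of the sample size n
    npn.thresh <- 1 / (4 * (n^0.25) * sqrt(pi * log(n)))
  }
  list(n.lambda = n.lambda, npn.thresh = npn.thresh)
}

defaults_gel  <- resolve_defaults(n = 200, approx = FALSE)
defaults_geca <- resolve_defaults(n = 200, approx = TRUE)
print(defaults_gel$n.lambda)
print(defaults_geca$n.lambda)
print(defaults_gel$npn.thresh)
```

Values supplied by the caller are passed through unchanged, so only the NULL entries are filled in.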
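L: Either (1) an n by d data matrix L representing n observations in d dimensions, or (2) a list L containing L$data as an n by d data matrix.
ind.group: A vector of length k indexing a subset of all d variables. Only applicable when estimating a subgraph of the whole graph. The default value is c(1:d).
lambda: The sequence of regularization/thresholding parameters used by Graph Estimation via Lasso (GEL) when approx = FALSE or by Graph Estimation via Correlation Approximation (GECA) when approx = TRUE. With the default lambda = NULL, the program generates lambda automatically.
n.lambda: The number of regularization/thresholding parameters. The default value is 30 if approx = TRUE and 10 if approx = FALSE.
lambda.min: The smallest value of lambda, as a fraction of the upper bound (MAX) of the regularization/thresholding parameter, where MAX is the value that makes all estimates equal to 0. The program can automatically generate lambda as a sequence of length n.lambda from MAX down to lambda.min*MAX.
alpha: The elastic-net tuning parameter. The default value is 1 (lasso). When some dense pattern exists in the graph or some variables are highly correlated, the elastic net is encouraged for its grouping effect. Only applicable when approx = FALSE.
sym: How the output graphs are symmetrized. If sym = "and", the edge between node i and node j is selected only when both node i and node j are selected as neighbors for each other. If sym = "or", the edge is selected when either node is selected as a neighbor of the other. The default value is "or". Only applicable when approx = FALSE.
npn: If npn = TRUE, the nonparanormal transformation is applied to the input data L or L$data. The default value is TRUE.
npn.func: The function used in the nonparanormal transformation. If npn.func = "truncation", the truncated ECDF is applied. If npn.func = "shrinkage", the shrunken ECDF is applied. The default value is "shrinkage".
npn.thresh: The truncation threshold. Only applicable when npn.func = "truncation". The default value is 1/(4*(n^0.25)*sqrt(pi*log(n))).
approx: If approx = FALSE, GEL is implemented. If approx = TRUE, GECA is implemented. The default value is approx = FALSE.
scr: If scr = TRUE, the graph screening procedure is applied to preselect the neighborhood before GEL. The default value is TRUE. Only applicable when approx = FALSE.
scr.num: The neighborhood size after graph screening. Only applicable when scr = TRUE. The default value is n-1 when d > n and d-1 (equivalent to disabling graph screening) otherwise.
verbose: If verbose = FALSE, tracing information printing is disabled. The default value is TRUE.

An object with S3 class "huge"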
is returned:
The n by d data matrix from the input.
The ind.group from the input.
A scr.num by k matrix in which each column corresponds to a variable in ind.group and contains the indices of the remaining neighbors after the graph screening. Only applicable when scr = TRUE and approx = FALSE.
The alpha from the input. Only applicable when approx = FALSE.
The sym from the input. Only applicable when approx = FALSE.
The npn from the input.
The scr from the input. Only applicable when approx = FALSE.
A label for the estimated path that depends on k: a subgraph label when k < d and "fullgraph path" when k == d.
A list of k by k adjacency matrices of estimated graphs, returned as the solution path corresponding to lambda.
A k by n.lambda matrix. Each row corresponds to a variable in ind.group and contains all RSSs (residual sums of squares) along the lasso solution path. Only applicable when approx = FALSE.
A k by n.lambda matrix. Each row corresponds to a variable in ind.group and contains the number of nonzero coefficients along the lasso solution path. Only applicable when approx = FALSE.

When d >> n or d >> k, the computation is memory optimized and is targeted at larger-scale problems (with d > 3000). We also provide another efficient method, Graph Estimation via Correlation Approximation (GECA).

See also: huge.generator, huge.npn, huge.scr, huge.subgraph, huge.select, huge.plot, huge.roc, lasso.stars, and huge-package.

# generate data
L = huge.generator(n = 200, d = 80, graph = "hub")
# subset indices
ind.group = c(1:50)
# subgraph solution path estimation with input as a list
out1 = huge(L, ind.group = ind.group)
summary(out1)
plot(out1)
plot(out1, align = TRUE)
# subgraph solution path estimation using correlation approximation (approx = TRUE), with input as a data matrix
out3 = huge(L$data, ind.group = ind.group, approx = TRUE)
summary(out3)
plot(out3)
# full-graph solution path estimation using the elastic net
out4 = huge(L, alpha = 0.7)
summary(out4)
plot(out4)
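
The sym argument controls how per-node neighbor selections are combined into an undirected graph. The base-R sketch below illustrates the "and" and "or" rules on a small hand-made selection matrix. It is a standalone illustration under stated assumptions, not the package's internal code: the helper name symmetrize and the convention that S[i, j] = 1 means node j was selected as a neighbor of node i are both hypothetical.

```r
# Hypothetical helper (not part of the huge package).
# S[i, j] != 0 is assumed to mean: node j was selected as a neighbor of node i.
symmetrize <- function(S, sym = c("or", "and")) {
  sym <- match.arg(sym)
  A <- S != 0
  # "and": keep edge i-j only if each node selected the other
  # "or" : keep edge i-j if at least one node selected the other
  out <- if (sym == "and") A & t(A) else A | t(A)
  diag(out) <- FALSE          # no self-loops
  out * 1L                    # return a 0/1 adjacency matrix
}

S <- matrix(0, 3, 3)
S[1, 2] <- 1                  # node 1 selects node 2, but not the reverse
S[2, 3] <- 1                  # nodes 2 and 3 select each other
S[3, 2] <- 1

adj_and <- symmetrize(S, "and")
adj_or  <- symmetrize(S, "or")
print(adj_and)
print(adj_or)
```

In this toy input the one-sided selection between nodes 1 and 2 survives only under "or", while the mutual selection between nodes 2 and 3 survives under both rules, so "and" yields the sparser graph.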