huge.npn(), huge.scr(), huge.subgraph()
sequentially as a pipeline to analyze data.
huge(L, ind.group = NULL, lambda = NULL, nlambda = NULL, lambda.min.ratio = 0.1,
     alpha = 1, sym = "or", npn = TRUE, npn.func = "shrinkage", npn.thresh = NULL,
     approx = FALSE, scr = TRUE, scr.num = NULL, verbose = TRUE)
L: (1) An n by d data matrix L representing n observations in d dimensions. (2) A list L containing L$data as an
k vector indexing a subset of all d variables. ONLY applicable when estimating a subgraph of the whole graph. The default value is c(1:d).
approx = FALSE
or Graph Approximation via Correlation Thresholding (GACT) when approx = TRUE. Typical usa
30 if approx = TRUE and 10 if approx = FALSE.
lambda
, as a fraction of the upper bound (MAX) of the regularization/thresholding parameter which makes all estimates equal to 0. The program can automatically generate lambda as a
1 (lasso). When some dense pattern exists in the graph or some variables are highly correlated, the elastic-net is encouraged for its grouping effect. ONLY applicable w
sym = "and"
, the edge between node i and node j is selected ONLY when both node i and node j are selected as neighbors for each other. If sym = "or"
npn = TRUE
, the nonparanormal transformation is applied to the input data L or L$data. The default value is TRUE.
npn.func = "truncation"
, the truncated ECDF is applied. If npn.func = "shrinkage", the shrunken ECDF is applied. The default value is "shrinkage"
npn.func = "truncation". The default value is 1/(4*(n^0.25)*sqrt(pi*log(n))).
approx = FALSE
, GEL is implemented. If approx = TRUE, GACT is implemented. The default value is approx = FALSE.
scr = TRUE
, the Graph Sure Screening (GSS) is applied to preselect the neighborhood before GEL. The default value is TRUE for n < d and FALSE for n >= d. ONLY applicable when approx = FALSE
scr = TRUE. The default value is n-1. An alternative value is n/log(n). ONLY applicable when scr = TRUE
verbose = FALSE
, tracing information printing is disabled. The default value is TRUE.
"huge" is returned:
n by d data matrix from the input
ind.group from the input
scr.num by k matrix with each column corresponding to a variable in ind.group and containing the indices of the remaining neighbors after the GSS. ONLY applicable when scr = TRUE and approx = FALSE.
alpha from the input. ONLY applicable when approx = FALSE.
sym from the input. ONLY applicable when approx = FALSE.
npn from the input.
scr from the input. ONLY applicable when approx = FALSE.
k < d and "fullgraph path" when k == d.
k by k adjacency matrices of the estimated graphs are returned as the solution path corresponding to lambda.
k by nlambda matrix. Each row corresponds to a variable in ind.group and contains all RSS (Residual Sum of Squares) values along the lasso solution path. ONLY applicable when approx = FALSE.
k by nlambda matrix. Each row corresponds to a variable in ind.group and contains the number of nonzero coefficients along the lasso solution path. ONLY applicable when approx = FALSE.
d >> n or d >> k
, the computation is memory optimized and is targeted on larger-scale problems (with d > 3000). We also provide another efficient method, the GACT.
huge.generator, huge.npn, huge.scr, huge.subgraph, huge.select, huge.plot, huge.roc, lasso.stars and huge-package.
#generate data
L = huge.generator(n = 200, d = 80, graph = "hub")
ind.group = c(1:50)
#subgraph solution path estimation with input as a list
out1 = huge(L, ind.group = ind.group)
summary(out1)
plot(out1)
plot(out1, align = TRUE)
huge.plot(out1$path[[3]])
plot(out1$lambda, out1$sparsity)
#subgraph solution path estimation using the GACT
out2 = huge(L$data, ind.group = ind.group, approx = TRUE)
summary(out2)
plot(out2)
#fullgraph solution path estimation using elastic net
out3 = huge(L, alpha = 0.3)
summary(out3)
plot(out3)
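The sym rule described in the arguments can be illustrated without the package. What follows is a minimal base-R sketch, not the package's internal code: nbr is a hypothetical 0/1 matrix whose entry [i, j] is 1 when node j was selected as a neighbor of node i, and symmetrize() is an illustrative helper name. The "and" branch follows the description given for sym = "and". The "or" branch assumes the usual neighborhood-selection convention (an edge is kept when either node selects the other), which this page does not spell out.

```r
# Hypothetical helper: turn directed neighbor selections into an
# undirected 0/1 adjacency matrix using the "and" or "or" rule.
symmetrize <- function(nbr, sym = c("or", "and")) {
  sym <- match.arg(sym)
  stopifnot(is.matrix(nbr), nrow(nbr) == ncol(nbr))
  sel <- nbr != 0          # logical selection matrix
  diag(sel) <- FALSE       # no self-edges
  adj <- if (sym == "and") sel & t(sel) else sel | t(sel)
  adj * 1                  # logical -> numeric 0/1 matrix
}

# Toy selections: 1 and 2 select each other; 1 selects 3, but 3 does not select 1.
nbr <- matrix(0, 3, 3)
nbr[1, 2] <- 1
nbr[2, 1] <- 1
nbr[1, 3] <- 1

adj_and <- symmetrize(nbr, "and")  # keeps only the mutual edge 1-2
adj_or  <- symmetrize(nbr, "or")   # keeps edges 1-2 and 1-3
```

The "and" rule can only remove edges relative to "or", so it yields the sparser graph of the two.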
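The default truncation threshold quoted in the arguments, 1/(4*(n^0.25)*sqrt(pi*log(n))), can be evaluated directly. The base-R sketch below computes it and applies a truncated-ECDF transformation to a single variable. It is an illustration under stated assumptions rather than the package's implementation: npn_thresh_default() and npn_truncate() are hypothetical names, and the transformation shown (ranks scaled to (0, 1], clipped to [thresh, 1 - thresh], then mapped through qnorm()) is the common form of the truncation estimator, which may differ in detail (for example, in rescaling) from what huge.npn() does.

```r
# Default threshold as quoted in the documentation text.
npn_thresh_default <- function(n) {
  1 / (4 * (n^0.25) * sqrt(pi * log(n)))
}

# Hypothetical helper: truncated-ECDF Gaussianization of one variable.
npn_truncate <- function(x, thresh = npn_thresh_default(length(x))) {
  u <- rank(x) / length(x)                 # empirical CDF values in (0, 1]
  u <- pmin(pmax(u, thresh), 1 - thresh)   # clip away from 0 and 1
  qnorm(u)                                 # map to the normal scale
}

set.seed(1)
x <- rexp(200)                  # skewed toy data
delta <- npn_thresh_default(200)
z <- npn_truncate(x)            # finite, order-preserving transform of x
```

Clipping keeps qnorm() away from 0 and 1, so the largest observation maps to a finite value instead of Inf.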