[...] huge.npn(), huge.scr(), huge.subgraph() sequentially as a pipeline to analyze data.

huge(L, ind.group = NULL, lambda = NULL, n.lambda = NULL, lambda.min = NULL,
alpha = 1, sym = "or", npn = TRUE, npn.func = "shrinkage", npn.thresh = NULL,
approx = FALSE, scr = TRUE, scr.num = NULL, verbose = TRUE)

L: (1) An n by d data matrix L representing n observations in d dimensions. (2) A list L containing L$data as an [...]
[...] k vector indexing a subset of all d variables. Only applicable when estimating a subgraph of the whole graph. The default value is c(1:d).
[...] approx = FALSE or Graph Estimation via Correlation Approximation (GECA) when approx = TRUE. Typical usage [...]
[...] 30 if approx = TRUE and 10 if approx = FALSE.
[...] lambda, as a fraction of the upper bound (MAX) of the regularization/thresholding parameter which makes all estimates equal to 0. The program can automatically generate lambda as a [...]
[...] 1 (lasso). When some dense pattern exists in the graph or some variables are highly correlated, the elastic-net is encouraged for its grouping effect. Only applicable w[...]
[...] sym = "and", the edge between node i and node j is selected only when both node i and node j are selected as neighbors for each other. If sym = "or" [...]
[...] npn = TRUE, the nonparanormal transformation is applied to the input data L or L$data. The default value is TRUE.
[...] npn.func = "truncation", the truncated ECDF is applied. If npn.func = "shrinkage", the shrunken ECDF is applied. The default value is "shrinkage" [...]
[...] npn.func = "truncation". The default value is 1/(4*(n^0.25)*sqrt(pi*log(n))).
[...] approx = FALSE, GEL is implemented. If approx = TRUE, GECA is implemented. The default value is approx = FALSE.
[...] scr = TRUE, the graph screening procedure is applied to preselect the neighborhood before GEL. The default value is TRUE. Only applicable when approx = FALSE.
[...] scr = TRUE. The default value is n-1 when p>n and p-1 (equivalent to disabling graph scr[...]
[...] verbose = FALSE, tracing information printing is disabled. The default value is TRUE.

[...] "huge" is returned:
[...] n by d data matrix from the input
[...] ind.group from the input
[...] scr.num by k matrix with each column corresponding to a variable in ind.group and containing the indices of the remaining neighbors after the graph screening. Only applicable when scr = TRUE and approx = FALSE
[...] alpha from the input. Only applicable when approx = FALSE.
[...] sym from the input. Only applicable when approx = FALSE.
[...] npn from the input.
[...] scr from the input. Only applicable when approx = FALSE.
[...] k and "fullgraph path" when k==d.
[...] k by k adjacency matrices of estimated graphs are returned as the solution path corresponding to lambda.
[...] k by n.lambda matrix. Each row corresponds to a variable in ind.group and contains all RSS's (Residual Sum of Squares) along the lasso solution path. Only applicable when approx = FALSE.
[...] k by n.lambda matrix. Each row corresponds to a variable in ind.group and contains the number of nonzero coefficients along the lasso solution path. Only applicable when approx = FALSE.

[...] d >> n or d >> k, the computation is memory optimized and is targeted on larger-scale problems (with d>3000). We also provide another efficient method, Graph Estimation via Correlation Approximation (GECA).

huge.generator, huge.npn, huge.scr, huge.subgraph, huge.select, huge.plot, huge.roc, lasso.stars and huge-package.

#generate data
L = huge.generator(n = 200, d = 80, graph = "hub")
#subset indices
ind.group = c(1:50)
#subgraph solution path estimation with input as a list
out1 = huge(L, ind.group = ind.group)
summary(out1)
plot(out1)
plot(out1, align = TRUE)
#subgraph solution path estimation using the correlation graph estimation
out3 = huge(L$data, ind.group = ind.group, approx = TRUE)
summary(out3)
plot(out3)
#fullgraph solution path estimation using elastic net
out4 = huge(L, alpha = 0.7)
summary(out4)
plot(out4)
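
The sym rule and the default npn.thresh formula given in the argument descriptions can be checked with base R alone, without calling huge. The following is a minimal sketch, not the package's implementation: the 3-node selection matrix nbr and the names adj.and, adj.or and npn.thresh.default are hypothetical, and the "or" behavior shown (keep an edge when either node selects the other) is an assumption, because that part of the description is cut off in this page.

```r
# Default truncation threshold, using the formula from the npn.thresh
# description, evaluated for the sample size of the examples (n = 200).
n <- 200
npn.thresh.default <- 1 / (4 * (n^0.25) * sqrt(pi * log(n)))

# Hypothetical neighborhood-selection result for 3 nodes (illustration only):
# nbr[i, j] = 1 means node j was selected as a neighbor of node i.
nbr <- matrix(c(0, 1, 0,
                0, 0, 1,
                0, 1, 0), nrow = 3, byrow = TRUE)

# sym = "and": keep edge i-j only when each node selected the other.
adj.and <- (nbr * t(nbr)) != 0

# sym = "or" (assumed reading): keep edge i-j when either node selected the other.
adj.or <- (nbr + t(nbr)) != 0

print(npn.thresh.default)
print(adj.and * 1)
print(adj.or * 1)
```

In this toy input, node 1 selects node 2 but not the reverse, so the 1-2 edge survives only under the "or" rule, while the mutually selected 2-3 edge survives under both. The "or" graph can never be sparser than the "and" graph.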