GGMncv (version 2.1.1)

ggmncv: GGMncv

Description

Gaussian graphical modeling with nonconvex regularization. A thorough survey of these penalties, including simulation studies investigating their properties, is provided in Williams (2020).

Usage

ggmncv(
  R,
  n,
  penalty = "atan",
  ic = "bic",
  select = "lambda",
  gamma = NULL,
  lambda = NULL,
  n_lambda = 50,
  lambda_min_ratio = 0.01,
  n_gamma = 50,
  initial = NULL,
  LLA = FALSE,
  unreg = FALSE,
  maxit = 10000,
  thr = 1e-04,
  store = TRUE,
  progress = TRUE,
  ebic_gamma = 0.5,
  penalize_diagonal = TRUE,
  ...
)

Arguments

R

Matrix. A correlation matrix of dimensions p by p.

n

Numeric. The sample size used to compute the information criterion.

penalty

Character string. Which penalty should be used (defaults to "atan")?

ic

Character string. Which information criterion should be used (defaults to "bic")? The options include aic, ebic (with ebic_gamma defaulting to 0.5), ric, or any of the generalized information criteria provided in Section 5 of Kim et al. (2012). The options are gic_1 (i.e., bic) to gic_6 (see 'Details').

select

Character string. Which tuning parameter should be selected (defaults to "lambda")? The options include "lambda" (the regularization parameter), "gamma" (governs the 'shape'), and "both".

gamma

Numeric. Hyperparameter for the penalty function. Defaults to 3.7 (scad), 2 (mcp), 0.5 (adapt), and 0.01 for all other penalties. Note that care must be taken when departing from the default values (see the references in 'Note').

lambda

Numeric vector. Regularization (or tuning) parameters. The default is NULL, which provides default values when select = "lambda" and sqrt(log(p)/n) when select = "gamma".

n_lambda

Numeric. The number of lambda values to be evaluated. Defaults to 50. This is disregarded if custom values are provided for lambda.

lambda_min_ratio

Numeric. The smallest value for lambda, as a fraction of the upperbound of the regularization/tuning parameter. The default is 0.01, which mimics the R package qgraph. To mimic the R package huge, set lambda_min_ratio = 0.1 and n_lambda = 10.

n_gamma

Numeric. The number of gamma values to be evaluated. Defaults to 50. This is disregarded if custom values are provided for gamma.

initial

A matrix (p by p) or custom function that returns the inverse of the covariance matrix (i.e., the precision matrix). This is used to compute the penalty derivative. The default is NULL, which results in using the inverse of R (see 'Note').

LLA

Logical. Should the local linear approximation be used (defaults to FALSE)?

unreg

Logical. Should the models be refitted (or unregularized) with maximum likelihood (defaults to FALSE)? Setting this to TRUE results in the approach of Foygel and Drton (2010), but with the regularization path obtained from nonconvex regularization, as opposed to the L1 penalty.

maxit

Numeric. The maximum number of iterations for determining convergence of the LLA algorithm (defaults to 1e4). Note this can be changed to, say, 2 or 3, which provides two- and three-step estimators without a convergence check.

thr

Numeric. Threshold for determining convergence of the LLA algorithm (defaults to 1.0e-4).

store

Logical. Should all of the fitted models be saved (defaults to TRUE)?

progress

Logical. Should a progress bar be included (defaults to TRUE)?

ebic_gamma

Numeric. Value for the additional hyperparameter for the extended Bayesian information criterion (defaults to 0.5; must be between 0 and 1). Setting ebic_gamma = 0 results in BIC.

penalize_diagonal

Logical. Should the diagonal of the inverse covariance matrix be penalized (defaults to TRUE).

...

Additional arguments passed to initial when a function is provided and ignored otherwise.
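To make the ebic_gamma argument concrete, the extended BIC of Foygel and Drton (2010) adds 4 * ebic_gamma * |E| * log(p) to the ordinary BIC penalty of |E| * log(n), where |E| is the number of edges. A minimal base-R sketch (the function names here are illustrative, not GGMncv's internals; neg2ll stands in for negative two times the log-likelihood):

```r
# Extended BIC: BIC penalty plus 4 * ebic_gamma * |E| * log(p)
ebic <- function(neg2ll, n_edges, n, p, ebic_gamma = 0.5) {
  neg2ll + n_edges * log(n) + 4 * ebic_gamma * n_edges * log(p)
}

# Ordinary BIC for comparison
bic <- function(neg2ll, n_edges, n) {
  neg2ll + n_edges * log(n)
}

# With ebic_gamma = 0 the extra term vanishes and EBIC reduces to BIC
ebic(100, n_edges = 10, n = 500, p = 20, ebic_gamma = 0)
bic(100, n_edges = 10, n = 500)
```

Larger ebic_gamma values penalize dense graphs more heavily, which is why EBIC is often preferred in high-dimensional settings.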

Value

An object of class ggmncv, including:

  • Theta Inverse covariance matrix

  • Sigma Covariance matrix

  • P Weighted adjacency matrix

  • adj Adjacency matrix

  • lambda Tuning parameter(s)

  • fit glasso fitted model (a list)

Details

Several of the penalties are (continuous) approximations to the L0 penalty, that is, best subset selection. However, the solution does not require enumerating all possible models, which results in a computationally efficient solution.

L0 Approximations

  • Atan: penalty = "atan" (Wang & Zhu, 2016). This is currently the default.

  • Seamless L0: penalty = "selo" (Dicker et al., 2013).

  • Exponential: penalty = "exp" (Wang et al., 2018).

  • Log: penalty = "log" (Mazumder et al., 2011).

  • SICA: penalty = "sica" (Lv & Fan, 2009).

Additional penalties:

  • SCAD: penalty = "scad" (Fan & Li, 2001).

  • MCP: penalty = "mcp" (Zhang, 2010).

  • Adaptive lasso (penalty = "adapt"): Defaults to gamma = 0.5 (Zou, 2006). Note that, for consistency with the other penalties, values approaching 0 provide more penalization and gamma = 1 results in L1 regularization.

  • Lasso: penalty = "lasso" (Tibshirani, 1996).
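
As a rough illustration of why these penalties approximate L0: the MCP (Zhang, 2010), for example, grows like the lasso near zero but becomes constant beyond gamma * lambda, so large coefficients incur a fixed cost regardless of magnitude, much like the L0 penalty. A base-R sketch of the MCP penalty function (not GGMncv's internal code):

```r
# MCP penalty (Zhang, 2010): lasso-like near zero,
# flat (constant) once |theta| exceeds gamma * lambda
mcp <- function(theta, lambda, gamma = 2) {
  ifelse(abs(theta) <= gamma * lambda,
         lambda * abs(theta) - theta^2 / (2 * gamma),
         gamma * lambda^2 / 2)
}

lambda <- 0.5
gamma  <- 2

# beyond gamma * lambda = 1 the penalty no longer grows:
# mcp(2, ...) and mcp(10, ...) are identical (both gamma * lambda^2 / 2)
mcp(c(0.1, 1, 2, 10), lambda, gamma)
```

Plotting mcp() against abs() over a grid of theta values makes the contrast with the lasso visually obvious.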

gamma:

The gamma argument corresponds to the additional hyperparameter for each penalty. The defaults are set to the recommended values from the respective papers.

LLA

The local linear approximation (LLA) for nonconvex penalties was described in Fan et al. (2009). This is essentially an iteratively re-weighted (g)lasso. Note that, by default, LLA = FALSE. This is due to the work of Zou and Li (2008), which suggested that, so long as the starting values are good enough, a one-step estimator is sufficient to obtain an accurate estimate of the conditional dependence structure. In the case of low-dimensional data, the sample-based inverse covariance matrix is used for the starting values. This is expected to work well, assuming that n is sufficiently larger than p.
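
The one-step idea can be sketched as follows: given an initial precision matrix estimate, compute element-wise weights from the penalty derivative, then solve a single weighted (g)lasso problem with those weights. A base-R illustration of the weight computation, using the MCP derivative as the example (function names are illustrative, not GGMncv's API):

```r
# Derivative of the MCP penalty with respect to |theta|:
# lambda - |theta| / gamma inside gamma * lambda, and exactly
# zero beyond it (so large entries receive no shrinkage)
mcp_deriv <- function(theta, lambda, gamma = 2) {
  pmax(0, lambda - abs(theta) / gamma)
}

# Hypothetical initial precision matrix (e.g., solve(R) in
# the low-dimensional case)
Theta0 <- matrix(c(1,    0.8,  0.01,
                   0.8,  1,    0,
                   0.01, 0,    1), 3, 3)

# Element-wise penalty weights for the single re-weighted
# (g)lasso step: large entries of Theta0 get zero penalty,
# near-zero entries keep roughly the full lambda
W <- mcp_deriv(Theta0, lambda = 0.2, gamma = 2)
```

With LLA = TRUE this re-weighting is iterated to convergence (up to maxit iterations); with the default one-step approach it is applied once to the initial estimate.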

Generalized Information Criteria

The following are the available GIC:

  • GIC_1: |E| log(n) (ic = "gic_1" or ic = "bic")

  • GIC_2: |E| p^(1/3) (ic = "gic_2")

  • GIC_3: |E| 2 log(p) (ic = "gic_3" or ic = "ric")

  • GIC_4: |E| 2 [log(p) + log(log(p))] (ic = "gic_4")

  • GIC_5: |E| log(log(n)) log(p) (ic = "gic_5")

  • GIC_6: |E| log(n) log(p) (ic = "gic_6")

Note that |E| denotes the number of edges (nonzero relations) in the graph, p the number of nodes (columns), and n the number of observations (rows). Further, each can be understood as a penalty term added to negative two times the log-likelihood, that is,

-2 l_n(Theta) = -2 [(n/2) log det(Theta) - (n/2) tr(S Theta)]

where Theta is the estimated precision matrix (e.g., for a given lambda and gamma) and S is the sample-based covariance matrix.
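
Putting the pieces together, each criterion is this -2 log-likelihood plus the listed penalty term. A base-R sketch for GIC_1 (i.e., BIC), with function names that are illustrative rather than GGMncv's internals:

```r
# Negative two times the Gaussian log-likelihood for a
# precision matrix Theta and sample covariance matrix S (p x p)
neg2_loglik <- function(Theta, S, n) {
  -2 * ((n / 2) * log(det(Theta)) - (n / 2) * sum(diag(S %*% Theta)))
}

# GIC_1 / BIC: add |E| * log(n), where |E| counts nonzero
# off-diagonal relations (each edge counted once)
gic_1 <- function(Theta, S, n) {
  E <- sum(Theta[upper.tri(Theta)] != 0)
  neg2_loglik(Theta, S, n) + E * log(n)
}

# toy example: identity covariance, so Theta = solve(S) has no edges
S <- diag(3)
Theta <- solve(S)
gic_1(Theta, S, n = 100)
```

The other criteria differ only in the penalty term added to neg2_loglik(), per the list above.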

Examples

# NOT RUN {
# data
Y <- GGMncv::ptsd

S <- cor(Y)

# fit model
# note: atan default
fit_atan <- ggmncv(S, n = nrow(Y),
                   progress = FALSE)

# plot
plot(get_graph(fit_atan),
     edge_magnify = 10,
     node_names = colnames(Y))

# lasso
fit_l1 <- ggmncv(S, n = nrow(Y),
                 progress = FALSE,
                 penalty = "lasso")

# plot
plot(get_graph(fit_l1),
     edge_magnify = 10,
     node_names = colnames(Y))


# for these data, we might expect all relations to be positive
# and thus the red edges are spurious. The following re-estimates
# the graph, given all edges positive (sign restriction).

# set negatives to zero (sign restriction)
adj_new <- ifelse( fit_atan$P <= 0, 0, 1)

check_zeros <- TRUE

# track tries
iter <- 0

# iterate until all positive
while(check_zeros){
  iter <- iter + 1
  fit_new <- constrained(S, adj = adj_new)
  check_zeros <- any(fit_new$wadj < 0)
  adj_new <- ifelse( fit_new$wadj <= 0, 0, 1)
}

# make graph object
new_graph <- list(P = fit_new$wadj,
                  adj = adj_new)
class(new_graph) <- "graph"

plot(new_graph,
     edge_magnify = 10,
     node_names = colnames(Y))

# }
