Gaussian graphical modeling with nonconvex regularization. A thorough survey of these penalties, including simulation studies investigating their properties, is provided in Williams (2020).
Usage
ggmncv(
  R,
  n,
  penalty = "atan",
  ic = "bic",
  select = "lambda",
  gamma = NULL,
  lambda = NULL,
  n_lambda = 50,
  lambda_min_ratio = 0.01,
  n_gamma = 50,
  initial = NULL,
  LLA = FALSE,
  unreg = FALSE,
  maxit = 10000,
  thr = 1e-04,
  store = TRUE,
  progress = TRUE,
  ebic_gamma = 0.5,
  penalize_diagonal = TRUE,
  ...
)
Arguments
R: Matrix. A correlation matrix of dimensions p by p.
n: Numeric. The sample size used to compute the information criterion.
penalty: Character string. Which penalty should be used (defaults to "atan")?
ic: Character string. Which information criterion should be used (defaults to "bic")? The options include aic, ebic (ebic_gamma defaults to 0.5), ric, or any of the generalized information criteria provided in Section 5 of Kim, Kwon, and Choi (2012). The options are gic_1 (i.e., bic) to gic_6 (see 'Details').
select: Character string. Which tuning parameter should be selected (defaults to "lambda")? The options include "lambda" (the regularization parameter), "gamma" (governs the 'shape'), and "both".
gamma: Numeric. Hyperparameter for the penalty function. Defaults to 3.7 (scad), 2 (mcp), 0.5 (adapt), and 0.01 for all other penalties. Note that care must be taken when departing from the default values (see the references in 'Note').
lambda: Numeric vector. Regularization (or tuning) parameters. The default is NULL, which provides default values with select = "lambda" and sqrt(log(p)/n) with select = "gamma".
n_lambda: Numeric. The number of lambda values to be evaluated. Defaults to 50. This is disregarded if custom values are provided for lambda.
lambda_min_ratio: Numeric. The smallest value for lambda, as a fraction of the upper bound of the regularization/tuning parameter. The default is 0.01, which mimics the R package qgraph. To mimic the R package huge, set lambda_min_ratio = 0.1 and n_lambda = 10.
n_gamma: Numeric. The number of gamma values to be evaluated. Defaults to 50. This is disregarded if custom values are provided in gamma.
initial: A matrix (p by p) or custom function that returns the inverse of the covariance matrix. This is used to compute the penalty derivative. The default is NULL, which results in using the inverse of R (see 'Note').
LLA: Logical. Should the local linear approximation be used (defaults to FALSE)?
unreg: Logical. Should the models be refitted (or unregularized) with maximum likelihood (defaults to FALSE)? Setting to TRUE results in the approach of Foygel and Drton (2010), but with the regularization path obtained from nonconvex regularization, as opposed to the L1 penalty.
maxit: Numeric. The maximum number of iterations for determining convergence of the LLA algorithm (defaults to 1e4). Note this can be changed to, say, 2 or 3, which will provide two- and three-step estimators without a convergence check.
thr: Numeric. Threshold for determining convergence of the LLA algorithm (defaults to 1.0e-4).
store: Logical. Should all of the fitted models be saved (defaults to TRUE)?
progress: Logical. Should a progress bar be included (defaults to TRUE)?
ebic_gamma: Numeric. Value for the additional hyperparameter for the extended Bayesian information criterion (defaults to 0.5; must be between 0 and 1). Setting ebic_gamma = 0 results in BIC.
penalize_diagonal: Logical. Should the diagonal of the inverse covariance matrix be penalized (defaults to TRUE)?
...: Additional arguments passed to initial when a function is provided, and ignored otherwise.
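For instance, the lambda_min_ratio and n_lambda arguments can be combined to mimic the defaults of the huge package, as noted above (a minimal sketch, assuming GGMncv and its bundled ptsd data are available):

```r
library(GGMncv)

# mimic the 'huge' package defaults described above
Y <- GGMncv::ptsd
fit <- ggmncv(cor(Y), n = nrow(Y),
              lambda_min_ratio = 0.1,
              n_lambda = 10,
              progress = FALSE)
```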
Value
An object of class ggmncv, including:
Theta: Inverse covariance matrix
Sigma: Covariance matrix
P: Weighted adjacency matrix
adj: Adjacency matrix
lambda: Tuning parameter(s)
fit: glasso fitted model (a list)
Details
Several of the penalties are (continuous) approximations to the L0 penalty, that is, best subset selection. However, the solution does not require enumerating all possible models, which results in a computationally efficient solution.
L0 Approximations
Atan: penalty = "atan" (Wang and Zhu, 2016). This is currently the default.
Seamless L0: penalty = "selo" (Dicker, Huang, and Lin, 2013).
Exponential: penalty = "exp" (Wang, Fan, and Zhu, 2018).
Log: penalty = "log" (Mazumder, Friedman, and Hastie, 2011).
SICA: penalty = "sica" (Lv and Fan, 2009).
Additional penalties:
SCAD: penalty = "scad" (Fan and Li, 2001).
MCP: penalty = "mcp" (Zhang, 2010).
Adaptive lasso (penalty = "adapt"): Defaults to gamma = 0.5 (Zou, 2006). Note that, for consistency with the other penalties, gamma values approaching 0 provide more penalization and gamma = 1 results in L1 regularization.
Lasso: penalty = "lasso" (Tibshirani, 1996).
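To give a feel for how these penalties differ from the lasso, the SCAD penalty of Fan and Li (2001) can be written in a few lines of base R (an illustrative sketch, not code from GGMncv; here `a` plays the role of the gamma hyperparameter):

```r
# SCAD penalty (Fan & Li, 2001): behaves like the lasso near zero,
# then levels off to a constant, thereby approximating the L0 penalty
scad_penalty <- function(theta, lambda = 0.1, a = 3.7) {
  theta <- abs(theta)
  ifelse(theta <= lambda,
         lambda * theta,
         ifelse(theta <= a * lambda,
                (2 * a * lambda * theta - theta^2 - lambda^2) / (2 * (a - 1)),
                (a + 1) * lambda^2 / 2))
}

# small coefficients are penalized linearly (like the lasso),
# while all large coefficients receive the same constant penalty
scad_penalty(c(0, 0.05, 5), lambda = 0.1, a = 3.7)
```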
gamma:
The gamma argument corresponds to an additional hyperparameter for each penalty. The defaults are set to the recommended values from the respective papers.
LLA
The local linear approximation of nonconvex penalties was described in Fan, Feng, and Wu (2009). This is essentially an iteratively re-weighted (g)lasso. Note that by default LLA = FALSE. This is due to the work of Zou and Li (2008), which suggested that, so long as the starting values are good enough, a one-step estimator is sufficient to obtain an accurate estimate of the conditional dependence structure. In the case of low-dimensional data, the sample-based inverse covariance matrix is used for the starting values. This is expected to work well, assuming that n is sufficiently larger than p.
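For example, enabling the LLA with a small maxit yields the multi-step estimators mentioned under the maxit argument (a sketch, assuming GGMncv and its bundled ptsd data are available):

```r
library(GGMncv)

Y <- GGMncv::ptsd
# LLA = TRUE with maxit = 2 gives a two-step estimator
# (no convergence check is performed)
fit_two_step <- ggmncv(cor(Y), n = nrow(Y),
                       LLA = TRUE, maxit = 2,
                       progress = FALSE)
```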
Generalized Information Criteria
The following are the available GIC:
GIC_1: |E| log(n) (ic = "gic_1" or ic = "bic")
GIC_2: |E| p^(1/3) (ic = "gic_2")
GIC_3: |E| 2 log(p) (ic = "gic_3" or ic = "ric")
GIC_4: |E| 2 [log(p) + log(log(p))] (ic = "gic_4")
GIC_5: |E| log(log(n)) log(p) (ic = "gic_5")
GIC_6: |E| log(n) log(p) (ic = "gic_6")
Note that |E| denotes the number of edges (nonzero relations) in the graph, p the number of nodes (columns), and n the number of observations (rows). Further, each can be understood as a penalty term added to negative two times the log-likelihood, that is,

-2 l_n(Theta) = -2 [ (n/2) log det(Theta) - (n/2) tr(S Theta) ]

where Theta is the estimated precision matrix (e.g., for a given lambda and gamma) and S is the sample-based covariance matrix.
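As an illustration, the BIC-style criterion (GIC_1) can be computed from a candidate precision matrix in a few lines of base R (a hand-rolled sketch for exposition; GGMncv performs this internally):

```r
# GIC_1 (i.e., BIC): -2 * log-likelihood plus |E| * log(n)
gic_1 <- function(Theta, S, n) {
  # -2 * l_n(Theta) = -n * [log det(Theta) - tr(S Theta)]
  neg2ll <- -n * (log(det(Theta)) - sum(diag(S %*% Theta)))
  # |E|: number of nonzero off-diagonal elements (edges)
  E <- sum(Theta[upper.tri(Theta)] != 0)
  neg2ll + E * log(n)
}

# toy check: a diagonal precision matrix has zero edges,
# so only the likelihood term contributes
gic_1(diag(3), diag(3), n = 100)  # -100 * (0 - 3) + 0 = 300
```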
Examples
# data
Y <- GGMncv::ptsd
S <- cor(Y)
# fit model
# note: atan default
fit_atan <- ggmncv(S, n = nrow(Y),
progress = FALSE)
# plot
plot(get_graph(fit_atan),
edge_magnify = 10,
node_names = colnames(Y))
# lasso
fit_l1 <- ggmncv(S, n = nrow(Y),
progress = FALSE,
penalty = "lasso")
# plot
plot(get_graph(fit_l1),
edge_magnify = 10,
node_names = colnames(Y))
# for these data, we might expect all relations to be positive
# and thus the red edges are spurious. The following re-estimates
# the graph, given all edges positive (sign restriction).
# set negatives to zero (sign restriction)
adj_new <- ifelse( fit_atan$P <= 0, 0, 1)
check_zeros <- TRUE
# track tries
iter <- 0
# iterate until all positive
while(check_zeros){
iter <- iter + 1
fit_new <- constrained(S, adj = adj_new)
check_zeros <- any(fit_new$wadj < 0)
adj_new <- ifelse( fit_new$wadj <= 0, 0, 1)
}
# make graph object
new_graph <- list(P = fit_new$wadj,
adj = adj_new)
class(new_graph) <- "graph"
plot(new_graph,
edge_magnify = 10,
node_names = colnames(Y))