Performs variable selection on unmoderated networks via the LASSO, best-subset selection, forward selection, backward selection, or sequential replacement. For moderated networks, variable selection is performed via the hierarchical LASSO. Can be used for both Gaussian graphical models (GGMs) and seemingly unrelated regression (SUR) networks.
varSelect(
  data,
  m = NULL,
  criterion = "AIC",
  method = "glmnet",
  lags = NULL,
  exogenous = TRUE,
  type = "g",
  center = TRUE,
  scale = FALSE,
  gamma = 0.5,
  nfolds = 10,
  varSeed = NULL,
  useSE = TRUE,
  nlam = NULL,
  covs = NULL,
  verbose = TRUE,
  beepno = NULL,
  dayno = NULL
)
data: An n x k data frame or matrix.
m: Character vector or numeric vector indicating the moderator(s), if any. Can also specify "all" to make every variable serve as a moderator, or 0 to indicate that there are no moderators. If the length of m is k - 1 or longer, then it will not be possible to have the moderators as exogenous variables. Thus, exogenous will automatically become FALSE.
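The rule for when exogenous is overridden can be sketched in base R. The helper name resolve_exogenous is hypothetical and not part of the package; it only mirrors the behavior described above, assuming k is the number of columns in data.

```r
# Hypothetical helper (not part of the package): mirrors the documented rule
# that exogenous is forced to FALSE once the moderators number k - 1 or more.
resolve_exogenous <- function(m, k, exogenous = TRUE) {
  # m = NULL or m = 0 means there are no moderators, so nothing changes
  if (is.null(m) || (is.numeric(m) && length(m) == 1 && m == 0)) {
    return(exogenous)
  }
  # m = "all" makes every one of the k variables a moderator
  n_mods <- if (identical(m, "all")) k else length(m)
  if (n_mods >= k - 1) FALSE else exogenous
}

resolve_exogenous("M", k = 5)    # a single moderator keeps the setting
resolve_exogenous("all", k = 5)  # every variable moderates, so FALSE
```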
criterion: The criterion for the variable selection procedure. Options include: "cv", "aic", "bic", "ebic", "cp", "rss", "adjr2", "rsq", "r2". "CV" refers to cross-validation; the information criteria are "AIC", "BIC", "EBIC", and "Cp", which refers to Mallows's Cp. "RSS" is the residual sum of squares, "adjR2" is adjusted R-squared, and "Rsq" or "R2" is R-squared. Capitalization is ignored. For methods based on the LASSO, only "CV", "AIC", "BIC", and "EBIC" are available. For methods based on subset selection, only "Cp", "BIC", "RSS", "adjR2", and "R2" are available.
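The pairing of criteria with method families can be written as a small lookup. This is a sketch only: allowed_criteria, method_family, and criterion_ok are hypothetical names, not package functions, and treating "rsq" as a synonym for "r2" in the subset family is an assumption drawn from the description above.

```r
# Hypothetical lookup (not part of the package) of the criteria listed as
# available for each family of methods, stored in lower case.
allowed_criteria <- list(
  lasso  = c("cv", "aic", "bic", "ebic"),
  # "rsq" is assumed to be accepted wherever "r2" is
  subset = c("cp", "bic", "rss", "adjr2", "rsq", "r2")
)

# Map a method name onto its family; glinternet is LASSO-based
method_family <- function(method) {
  if (method %in% c("lasso", "glmnet", "glinternet")) "lasso" else "subset"
}

# Capitalization is ignored, as the documentation states
criterion_ok <- function(criterion, method) {
  tolower(criterion) %in% allowed_criteria[[method_family(method)]]
}

criterion_ok("EBIC", "glmnet")   # LASSO-based method with an allowed criterion
criterion_ok("Cp", "glmnet")     # Cp is listed only for subset selection
```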
method: Character string to indicate which method to use for variable selection. Options include "lasso" and "glmnet", both of which use the LASSO via the glmnet package (either with glmnet::glmnet or glmnet::cv.glmnet, depending upon the criterion). "subset", "backward", "forward", and "seqrep" all call different types of subset selection using the leaps::regsubsets function. Finally, "glinternet" applies the hierarchical LASSO and is the only method available for moderated network estimation (either with glinternet::glinternet or glinternet::glinternet.cv, depending upon the criterion). If one or more moderators are specified, then method will automatically default to "glinternet".
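To give a feel for what the underlying packages do for a single node, here is a minimal sketch that calls glmnet::cv.glmnet and leaps::regsubsets directly on simulated data. It is not the internal code of varSelect; the data, the variable names V1 to V5, and the choice of V1 as the outcome node are all assumptions made for illustration.

```r
library(glmnet)
library(leaps)

# Simulated data: V1 depends on V2 and V3 only
set.seed(1)
n <- 200
k <- 5
X <- matrix(rnorm(n * k), n, k, dimnames = list(NULL, paste0("V", 1:k)))
X[, 1] <- 0.6 * X[, 2] - 0.4 * X[, 3] + rnorm(n, sd = 0.5)

# Treat V1 as the outcome node and the remaining variables as predictors
y <- X[, 1]
preds <- X[, -1]

# LASSO with 10-fold cross-validation; keep predictors with nonzero
# coefficients at the lambda that minimizes the cross-validated error
cvfit <- cv.glmnet(preds, y, nfolds = 10)
b <- as.matrix(coef(cvfit, s = "lambda.min"))
lasso_selected <- setdiff(rownames(b)[b[, 1] != 0], "(Intercept)")

# Forward selection; keep the model size with the lowest BIC
rs <- regsubsets(x = preds, y = y, nvmax = ncol(preds), method = "forward")
s <- summary(rs)
best <- which.min(s$bic)
subset_selected <- setdiff(colnames(s$which)[s$which[best, ]], "(Intercept)")

lasso_selected
subset_selected
```

Swapping method = "forward" for "exhaustive", "backward", or "seqrep" in regsubsets gives the other subset-selection variants.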
lags: Numeric or logical. Can only be 0, 1, TRUE, or FALSE. NULL is interpreted as FALSE. Indicates whether to fit a time-lagged network (1 or TRUE) or a GGM (0 or FALSE).
exogenous: Logical. Indicates whether moderator variables should be treated as exogenous or not. If they are exogenous, they will not be modeled as outcomes/nodes in the network. If the number of moderators reaches k - 1 or k, then exogenous will automatically be FALSE.
type: Determines whether to use Gaussian models ("g") or binomial models ("c"). The full names "gaussian" and "binomial" are also accepted. A vector of length k can be provided so that each variable is given its own value, although this is not necessary because these values are detected automatically.
center: Logical. Determines whether to mean-center the variables.
scale: Logical. Determines whether to standardize the variables.
gamma: Numeric value of the hyperparameter for the "EBIC" criterion. Only relevant if criterion = "EBIC". A value between 0 and 0.5 is recommended, where larger values impose a larger penalty on the criterion.
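The effect of gamma can be illustrated with one common form of the EBIC for regression models, BIC + 2 * gamma * k * log(p), where k is the number of predictors in the model and p is the number of candidate predictors. Whether varSelect uses exactly this form is an assumption; the ebic helper and the simulated data below are hypothetical.

```r
# Assumed EBIC form (may differ from the package's internal formula):
# EBIC = BIC + 2 * gamma * k * log(p)
ebic <- function(fit, gamma, p) {
  k <- length(coef(fit)) - 1  # predictors in the model, excluding the intercept
  BIC(fit) + 2 * gamma * k * log(p)
}

# Simulated data with columns X1 to X4; X1 is the outcome
set.seed(1)
dat <- data.frame(matrix(rnorm(100 * 4), 100, 4))
fit <- lm(X1 ~ X2 + X3, data = dat)

ebic(fit, gamma = 0, p = 3)    # gamma = 0 reduces to the ordinary BIC
ebic(fit, gamma = 0.5, p = 3)  # a larger gamma adds a larger penalty
```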
nfolds: Only relevant if criterion = "CV". Determines the number of folds to use in cross-validation.
varSeed: Numeric value providing a seed to be set at the beginning of the selection procedure. Recommended for reproducible results.
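Cross-validation assigns observations to folds at random, which is why a seed matters for reproducibility. The following base R sketch shows the idea; make_folds is a hypothetical helper and does not reflect how the package assigns folds internally.

```r
# Hypothetical helper: randomly assign n observations to nfolds folds,
# optionally setting a seed first so the assignment can be reproduced.
make_folds <- function(n, nfolds = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  sample(rep(seq_len(nfolds), length.out = n))
}

folds_a <- make_folds(50, nfolds = 10, seed = 123)
folds_b <- make_folds(50, nfolds = 10, seed = 123)
identical(folds_a, folds_b)  # same seed, same fold assignment
table(folds_a)               # 50 observations spread evenly over 10 folds
```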
useSE: Logical. Only relevant if method = "glinternet" and criterion = "CV". Indicates whether to use the standard error of the estimates across folds, if TRUE, or to use the standard deviation, if FALSE.
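The difference between the two options is a factor of the square root of the number of folds, assuming the standard error is computed as the standard deviation divided by sqrt(nfolds). The fold errors below are made-up illustrative values, and fold_spread is a hypothetical helper rather than a package function.

```r
# Made-up cross-validation errors from 10 folds (illustrative values only)
fold_errors <- c(1.10, 0.95, 1.20, 1.05, 0.90, 1.15, 1.00, 1.08, 0.97, 1.12)

# Hypothetical helper: spread of the fold errors as SE (TRUE) or SD (FALSE)
fold_spread <- function(errors, useSE = TRUE) {
  if (useSE) sd(errors) / sqrt(length(errors)) else sd(errors)
}

fold_spread(fold_errors, useSE = TRUE)   # standard error across folds
fold_spread(fold_errors, useSE = FALSE)  # standard deviation across folds
```

Because the standard error is the smaller quantity, any selection rule that uses this spread as a tolerance band is stricter with useSE = TRUE than with useSE = FALSE.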
nlam: If method = "glinternet", determines the number of lambda values to evaluate in the selection path.
covs: Numeric or character string indicating a variable to be used as a covariate. Currently not working properly.
verbose: Logical. Determines whether to provide output to the console about the status of the procedure.
beepno: Character string or numeric value to indicate which variable (if any) encodes the survey number within a single day. Must be used in conjunction with the dayno argument.
dayno: Character string or numeric value to indicate which variable (if any) encodes the day on which each survey was completed. Must be used in conjunction with the beepno argument.
Returns a list of all models, with the selected variables for each along with model coefficients and the variable selection models themselves. Primarily for use as input to the type argument of the fitNetwork function.
The primary value of the output is to be used as input when fitting the selected model with the fitNetwork function. Specifically, the output of varSelect can be assigned to the type argument of fitNetwork in order to fit the constrained models that were selected across nodes.
For moderated networks, the only variable selection approach available is through the glinternet package, which implements the hierarchical LASSO. The criterion for model selection dictates which function from the package is used: information criteria use the glinternet::glinternet function to compute models, and cross-validation calls the glinternet::glinternet.cv function.
See also: resample, fitNetwork, bootNet, mlGVAR, glinternet::glinternet, glinternet::glinternet.cv, glmnet::glmnet, glmnet::cv.glmnet, leaps::regsubsets
# NOT RUN {
# Best-subset selection with BIC, then fit the selected models
vars1 <- varSelect(ggmDat, criterion = 'BIC', method = 'subset')
fit1 <- fitNetwork(ggmDat, type = vars1)

# LASSO with cross-validation, then fit the selected models
vars2 <- varSelect(ggmDat, criterion = 'CV', method = 'glmnet')
fit2 <- fitNetwork(ggmDat, type = vars2, which.lam = 'min')

# Add a moderator (method defaults to "glinternet" when m is specified)
vars3 <- varSelect(ggmDat, m = 'M', criterion = 'EBIC', gamma = .5)
fit3 <- fitNetwork(ggmDat, moderators = 'M', type = vars3)
# }