- model
Lavaan output object: a model previously run with any of lavaan's main
functions, cfa(), lavaan(), sem(), or growth(). It can also come from the
efaUnrotate() function in the semTools package. Currently, the model
features that cannot be handled in regsem are multiple-group models,
missing-data handling other than listwise deletion, thresholds from
categorical variable models, and estimators other than ML (most notably
WLSMV for categorical variables). Note that the model does not actually
have to be fitted (use do.fit=FALSE) or to converge; regsem() uses the
lavaan object mainly as a parser and to obtain the sample covariance
matrix.
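A minimal sketch of the kind of lavaan object expected here, using lavaan's built-in HolzingerSwineford1939 data (the one-factor model is illustrative, not prescribed by this documentation):

```r
library(lavaan)

# One-factor CFA on the classic Holzinger-Swineford items.
mod <- "f =~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9"

# The model does not need to be estimated: do.fit = FALSE is enough,
# since regsem() only parses the object and extracts the sample
# covariance matrix.
lav.out <- cfa(mod, data = HolzingerSwineford1939, do.fit = FALSE)
```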
- n.lambda
Number of penalization values to test.
- pars_pen
Parameter indicators to penalize. This can be specified in several ways.
The default is to penalize all regression parameters ("regressions");
alternatively, one can penalize all loadings ("loadings"), or both with
c("regressions","loadings"). Parameter labels can also be assigned in the
lavaan syntax and passed to pars_pen; see the example. Finally, one can take
the parameter numbers from the A or S matrices and pass these directly.
See extractMatrices(lav.object)$A.
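As a sketch of these specification options (assuming the arguments documented here belong to cv_regsem() from regsem; the labels v1 and v2 and the model are illustrative):

```r
library(lavaan)
library(regsem)

# Labels assigned in the lavaan syntax can be passed to pars_pen.
mod <- "f =~ x1 + x2 + v1*x3 + v2*x4 + x5"
fit <- cfa(mod, data = HolzingerSwineford1939)

out.labels   <- cv_regsem(fit, type = "lasso", pars_pen = c("v1", "v2"))

# Alternatively, penalize a whole parameter class ...
out.loadings <- cv_regsem(fit, type = "lasso", pars_pen = "loadings")

# ... or inspect the A matrix and pass parameter numbers directly.
extractMatrices(fit)$A
out.numbers  <- cv_regsem(fit, type = "lasso", pars_pen = c(2, 3))
```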
- metric
Which fit index to use to choose the final model.
Note that the best-fitting model is chosen only among those that
achieve convergence (conv=0).
- mult.start
Logical. Whether to use multi_optim() (TRUE) or
regsem() (FALSE).
- multi.iter
Maximum number of random starts for multi_optim().
- jump
Amount to increase penalization each iteration.
- lambda.start
Value at which to start the penalty.
- alpha
Mixture parameter for the elastic net: 1 = ridge, 0 = lasso.
- gamma
Additional penalty parameter for MCP and SCAD.
- type
Penalty type. Options include "none", "lasso", "ridge",
"enet" for the elastic net,
"alasso" for the adaptive lasso,
and "diff_lasso". diff_lasso penalizes the discrepancy between
parameter estimates and pre-specified values; the values
to take the deviation from are specified in diff_par. Two methods that
give sparser results than the lasso are the smoothly clipped absolute
deviation, "scad", and the minimax concave penalty, "mcp". The last option
is "rlasso", the randomised lasso used for stability selection.
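For illustration, the penalties above are selected via the type argument (a sketch assuming a fitted lavaan object fit and cv_regsem() as elsewhere in these docs; gamma = 3.7 is the conventional SCAD choice and is an assumption here):

```r
# Adaptive lasso, SCAD, and MCP applied to the same model (illustrative).
out.alasso <- cv_regsem(fit, type = "alasso", pars_pen = "loadings")
out.scad   <- cv_regsem(fit, type = "scad",   gamma = 3.7, pars_pen = "loadings")
out.mcp    <- cv_regsem(fit, type = "mcp",    gamma = 3.7, pars_pen = "loadings")
```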
- random.alpha
Alpha parameter for the randomised lasso. Has to be between
0 and 1; the default is 0.5. Note this is only used for
"rlasso", which pairs with stability selection.
- fit.ret
Fit indices to return.
- fit.ret2
Whether to return fit indices using the training dataset ("train") or
bootstrap samples ("boot"). Two-sample cross-validation has to be done
manually.
- n.boot
Number of bootstrap samples if fit.ret2="boot".
- data
Optional data frame. Only required for missing="fiml".
- optMethod
Solver to use. The two main options are rsolnp and coord_desc.
Although slightly slower, rsolnp works much better for complex models;
it is a nonlinear solver that does not rely on gradient information.
coord_desc uses gradient descent with soft thresholding for the chosen
penalty type. A similar solver, slsqp from the nloptr package, is also
available. coord_desc can additionally use Hessian information, either
through quasi=TRUE or by specifying a hess_fun; however, this option is
not recommended at this time.
- gradFun
Gradient function to use. Recommended to use "ram",
which refers to the method specified in von Oertzen & Brick (2014).
Only for use with optMethod="coord_desc".
- hessFun
Hessian function to use. Currently not recommended.
- test.cov
Covariance matrix from the test dataset. Necessary when CV=TRUE.
- test.n.obs
Number of observations in the test set. Used when CV=TRUE.
- prerun
Logical. Whether to use rsolnp to optimize first before passing to
gradient descent. Only used with coord_desc.
- parallel
Logical. Whether to parallelize the processes running models for all
values of lambda.
- ncore
Number of cores to use when parallel=TRUE.
- Start
Type of starting values to use.
- subOpt
Type of optimization to use in the optimx package.
- diff_par
Parameter values to deviate from; used with type="diff_lasso".
- LB
Lower bound vector.
- UB
Upper bound vector.
- par.lim
Vector of minimum and maximum parameter estimates. Used to
stop optimization and move to new starting values if violated.
- block
Whether to use block coordinate descent.
- full
Whether to do full gradient descent or block coordinate descent.
- calc
Type of calculation function to use, with means or not. Not recommended
for use.
- max.iter
Maximum number of iterations for coordinate descent.
- tol
Tolerance for coordinate descent.
- round
Number of digits to round results to.
- solver
Whether to use a solver for coord_desc.
- quasi
Whether to use quasi-Newton methods.
- solver.maxit
Maximum number of iterations for the solver in coord_desc.
- alpha.inc
Whether alpha should increase for coord_desc.
- step
Step size.
- momentum
Momentum for step sizes.
- step.ratio
Logical. Ratio of step size between the A and S matrices.
- line.search
Whether to use a line search for optimization. The default is FALSE (a fixed step size is used).
- nlminb.control
List of control values to pass to nlminb().
- warm.start
Whether starting values are based on the previous iteration.
This is not recommended.
- missing
How to handle missing data. Current options are "listwise"
and "fiml".
- verbose
Whether to print a progress bar.
- ...
Any additional arguments to pass to regsem() or multi_optim().