optimControl: Control of Penalized Generalized Linear Mixed Model Fitting

Description

Constructs the control structure for the optimization routine of the penalized mixed model fitting algorithm.

Usage

optimControl(
  var_restrictions = c("none", "fixef"),
  conv_EM = 0.0015,
  conv_CD = 5e-04,
  nMC_burnin = NULL,
  nMC_start = NULL,
  nMC_max = NULL,
  nMC_report = 5000,
  maxitEM = NULL,
  maxit_CD = 50,
  M = 10000,
  t = 2,
  mcc = 2,
  sampler = c("stan", "random_walk", "independence"),
  var_start = "recommend",
  step_size = 1,
  standardization = TRUE,
  convEM_type = c("AvgEuclid1", "maxdiff", "AvgEuclid2", "Qfun"),
  B_init_type = c("deterministic", "data", "random")
)

Value

The function returns a list inheriting from class optimControl that contains the fit and optimization criteria values used in the optimization routine.
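
A minimal sketch of building a control object and checking its class. It assumes the glmmPen package (which provides optimControl) is installed; the requireNamespace guard and the variable name ctrl are illustrative and not part of the package.

```r
# Assumption: the glmmPen package, which provides optimControl(), is installed.
# The guard keeps the snippet runnable when it is not.
if (requireNamespace("glmmPen", quietly = TRUE)) {
  ctrl <- glmmPen::optimControl(
    conv_EM = 0.001,           # tighter EM threshold than the default 0.0015
    sampler = "independence",  # one of the documented sampler choices
    nMC_report = 5000          # posterior samples saved from the final model
  )
  print(inherits(ctrl, "optimControl"))  # check the documented class
  str(ctrl, max.level = 1)               # inspect the stored settings
} else {
  ctrl <- NULL
  message("glmmPen is not installed; skipping the example.")
}
```

Arguments that are not supplied keep the defaults shown under Usage.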

Arguments

var_restrictions: character string indicating how the random effect covariance matrix should be initialized at the beginning of the algorithm when penalties are applied to the coefficients. If "none" (default), all random effect predictors are initialized to have non-zero variances. If "fixef", the code first examines the initialized fixed effects (initialized using a regular penalized GLM), and only the random effect predictors that are initialized with non-zero fixed effects are initialized with non-zero variances.
conv_EM: a non-negative numeric convergence criterion for the EM algorithm. Default is 0.0015. The EM algorithm is considered to have converged if the average Euclidean distance between the current coefficient estimates and the coefficient estimates from t EM iterations back is less than conv_EM mcc times in a row. See t and mcc for more details.
conv_CD: a non-negative numeric convergence criterion for the grouped coordinate descent loop within the M step of the EM algorithm. Default is 0.0005.
nMC_burnin: a positive integer specifying the number of posterior samples to use as burn-in for each E step in the EM algorithm. If set to NULL, the default is 250 when the number of random effect predictors is less than or equal to 10, and 100 otherwise. The function does not allow nMC_burnin to be less than 100.
nMC_start: a positive integer for the initial number of Monte Carlo draws. If set to NULL, the default is 250 when the number of random effect predictors is less than or equal to 10, and 100 otherwise.
nMC_max: a positive integer for the maximum number of Monte Carlo draws allowed in each step of the EM algorithm. If set to NULL, the default is 2500 when the number of random effect predictors is less than or equal to 10, and 1000 otherwise.
nMC_report: a positive integer for the number of posterior samples to save from the final model. These posterior samples can be used for diagnostic purposes; see plot_mcmc. Default is 5000.
maxitEM: a positive integer for the maximum number of allowed EM iterations. If set to NULL, the default is 50 for the Binomial and Poisson families and 65 for the Gaussian family.
maxit_CD: a positive integer for the maximum number of allowed iterations for the coordinate descent algorithms used within the M-step of each EM iteration. Default equals 50.
M: positive integer specifying the number of posterior samples to use within the Pajor log-likelihood calculation. Default is 10^4; minimum allowed value is 5000.
t: a positive integer. The convergence criterion is based on the average Euclidean distance between the most recent coefficient estimates and the coefficient estimates from t EM iterations back. Default is 2.
mcc: the number of consecutive times the convergence criterion must be met before the algorithm is considered to have converged (mcc stands for 'meet condition counter'). Default is 2; values less than 2 are not allowed.
sampler: character string specifying whether the posterior samples of the random effects should be drawn using Stan (default, from package rstan) or the Metropolis-within-Gibbs procedure incorporating an adaptive random walk sampler ("random_walk") or an independence sampler ("independence"). If using the random walk sampler, see adaptControl for some additional control structure parameters.
var_start: either the character string "recommend" or a positive number specifying the starting value used to initialize the variances of the random effect covariance matrix. For glmmPen, the default "recommend" first uses the lme4 R package to fit a simple model with only a fixed and a random intercept (see glmer for details on fitting generalized linear mixed models, or lmer for linear mixed models). The random intercept variance estimate from this model is then multiplied by 2 and used as the starting variance. For glmmPen_FA, the default is set to 0.10 (see B_init_type for further information).
step_size: a positive numeric value indicating the starting step size to use in the Majorization-Minimization scheme of the M-step. Only relevant when the distributional assumption used is not Binomial or Gaussian with canonical links (e.g. Poisson with log link).
standardization: logical value indicating whether covariates should be standardized (TRUE, default) or left unstandardized (FALSE) before being used within the algorithm. If standardization = TRUE, then the standardized covariates will also be used to create the Z matrix used in the estimation of the random effects.
convEM_type: character string indicating the type of convergence criterion to use within the EM algorithm to determine when a model has converged. The default is "AvgEuclid1", which calculates the average Euclidean distance between the most recent coefficient vector and the coefficient vector from t EM iterations back (the Euclidean distance divided by the number of non-zero coefficients t EM iterations back). Alternative options are "maxdiff", which determines convergence based on the maximum difference between the coefficient vectors; "AvgEuclid2", which is similar to "AvgEuclid1" except that it divides the Euclidean distance by the square root of the number of non-zero coefficients; and "Qfun", which determines convergence based on the relative difference between the Q-function estimates calculated with the most recent coefficient vector and with the coefficient vector from t EM iterations back.
B_init_type: character string indicating how the B matrix within the glmmPen_FA method should be initialized. (This argument is not used within the glmmPen function.) The default "deterministic" initializes all non-zero variance and covariance values of the random effect covariance matrix to the value of var_start, such that each non-zero element of the B matrix is sqrt(var_start / r) (where r is the number of latent factors). Option "data" is similar to "deterministic", but the var_start value is the default data-driven variance estimate used in glmmPen (see argument var_start for more details).
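
The convergence rule described by conv_EM, t, mcc, and convEM_type can be sketched in base R as follows. The helpers conv_value and em_converged are hypothetical and are not part of glmmPen; they only mirror the descriptions above. Two details are assumptions: "maxdiff" is taken to mean the maximum absolute difference, and the non-zero count is floored at 1 to avoid division by zero. The "Qfun" option is omitted because it requires the Q-function estimates.

```r
# Hypothetical helpers (not part of glmmPen) illustrating the documented criteria.
# coef_new: most recent coefficient vector.
# coef_old: coefficient vector from t EM iterations back.
conv_value <- function(coef_new, coef_old,
                       type = c("AvgEuclid1", "maxdiff", "AvgEuclid2")) {
  type <- match.arg(type)
  # Non-zero coefficients t iterations back (floored at 1: an assumption).
  n_nonzero <- max(sum(coef_old != 0), 1)
  euclid <- sqrt(sum((coef_new - coef_old)^2))
  switch(type,
    AvgEuclid1 = euclid / n_nonzero,
    AvgEuclid2 = euclid / sqrt(n_nonzero),
    maxdiff    = max(abs(coef_new - coef_old))  # absolute value is an assumption
  )
}

# Declare convergence once the criterion is below conv_EM mcc times in a row.
# history: list of coefficient vectors, one per EM iteration (illustrative input).
em_converged <- function(history, conv_EM = 0.0015, t = 2, mcc = 2,
                         type = "AvgEuclid1") {
  count <- 0
  for (i in seq_along(history)) {
    if (i <= t) next  # no estimate from t iterations back yet
    val <- conv_value(history[[i]], history[[i - t]], type)
    count <- if (val < conv_EM) count + 1 else 0  # reset when the run is broken
    if (count >= mcc) return(i)  # iteration at which convergence is declared
  }
  NA_integer_  # criterion never met mcc times in a row
}

# Illustrative coefficient path that settles down over six iterations.
history <- list(
  c(1, 0.5, 0), c(1.2, 0.4, 0), c(1.21, 0.41, 0),
  c(1.2101, 0.4101, 0), c(1.2101, 0.4101, 0), c(1.2101, 0.4101, 0)
)
em_converged(history)
```

With the default t = 2, the first comparison is only possible at the third iteration, and the counter resets whenever the criterion is not met.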

Details

Several arguments have a default value of NULL. If the user leaves these arguments as NULL, they are filled in with the default values specified above, which may depend on the number of random effects or the family of the data. If the user specifies particular values for these arguments, the algorithm makes no additional modifications to them.
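
The NULL-filling behavior can be sketched as below. The helper fill_defaults is hypothetical and not part of glmmPen; it simply encodes the defaults listed under Arguments. Its argument q (the number of random effect predictors) and the lowercase family strings are assumptions made for illustration.

```r
# Hypothetical helper (not part of glmmPen) mirroring the documented NULL defaults.
# q: number of random effect predictors (assumed name).
# family: "binomial", "poisson", or "gaussian" (assumed spelling).
fill_defaults <- function(q, family, nMC_burnin = NULL, nMC_start = NULL,
                          nMC_max = NULL, maxitEM = NULL) {
  small <- q <= 10
  # Only NULL entries are filled in; user-supplied values are left untouched.
  if (is.null(nMC_burnin)) nMC_burnin <- if (small) 250 else 100
  if (is.null(nMC_start))  nMC_start  <- if (small) 250 else 100
  if (is.null(nMC_max))    nMC_max    <- if (small) 2500 else 1000
  if (is.null(maxitEM))    maxitEM    <- if (family == "gaussian") 65 else 50
  list(nMC_burnin = nMC_burnin, nMC_start = nMC_start,
       nMC_max = nMC_max, maxitEM = maxitEM)
}

d_small <- fill_defaults(q = 3, family = "binomial")
d_large <- fill_defaults(q = 25, family = "gaussian", nMC_max = 4000)  # user value kept
str(d_small)
str(d_large)
```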