phmmPen: Fit Penalized Proportional Hazards Mixed Models via Monte Carlo Expectation Conditional Minimization (MCECM) using a Piecewise Constant Hazard Mixed Model Approximation

Description

phmmPen is used to fit penalized proportional hazards mixed models using a piecewise constant hazard mixed model approximation via Monte Carlo Expectation Conditional Minimization (MCECM). The purpose of the function is to perform variable selection on both the fixed and random effects simultaneously for the piecewise constant hazard mixed model. phmmPen selects the best model using BIC-type selection criteria (see selectControl documentation for further details). To improve the speed of the algorithm, consider setting var_restrictions = "fixef" within the optimControl options.

Usage

phmmPen(
  formula,
  data = NULL,
  covar = NULL,
  offset = NULL,
  fixef_noPen = NULL,
  penalty = c("MCP", "SCAD", "lasso"),
  alpha = 1,
  gamma_penalty = switch(penalty[1], SCAD = 4, 3),
  optim_options = optimControl(),
  adapt_RW_options = adaptControl(),
  trace = 0,
  tuning_options = selectControl(),
  survival_options = survivalControl(),
  BICq_posterior = NULL,
  progress = TRUE
)
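
A minimal sketch of how a call might be assembled, assuming the glmmPen and survival packages are installed. The simulated data set, the column names (time, status, x1, x2, x3, group), and the wrapper fit_example() are hypothetical and exist only for illustration. The fit itself is wrapped in a function and is not run here, since MCECM fits can take a long time.

```r
# Simulated toy data; all column names are illustrative assumptions.
set.seed(1)
n <- 200
dat <- data.frame(
  time   = rexp(n, rate = 0.1),              # observed follow-up time
  status = rbinom(n, size = 1, prob = 0.7),  # 1 = event, 0 = censored
  x1     = rnorm(n),
  x2     = rnorm(n),
  x3     = rnorm(n),
  group  = factor(rep(1:10, each = n / 10))  # the single grouping factor
)

# Surv() response on the left; the random-effects covariate (x1) is a
# subset of the fixed-effects covariates (x1, x2, x3).
# Creating the formula does not evaluate Surv(), so no package is needed yet.
fm <- Surv(time, status) ~ x1 + x2 + x3 + (x1 | group)

# 0 = do not penalize this fixed effect, 1 = may be penalized,
# in the same order as the fixed effects in the formula.
no_pen <- c(0, 1, 1)

# Hypothetical wrapper: defines the call without running it.
fit_example <- function() {
  library(survival)  # makes Surv() available when the formula is evaluated
  glmmPen::phmmPen(
    formula = fm,
    data = dat,
    covar = "independent",
    fixef_noPen = no_pen,
    penalty = "MCP",
    optim_options = glmmPen::optimControl(var_restrictions = "fixef"),
    tuning_options = glmmPen::selectControl(),
    survival_options = glmmPen::survivalControl()
  )
}
```

Calling fit_example() would start the fit. All control objects other than var_restrictions are left at their defaults.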

Value

A reference class object of class pglmmObj for which many methods are available (e.g. methods(class = "pglmmObj"); see ?pglmmObj for additional documentation).

Arguments

formula: a two-sided linear formula object describing both the fixed effects and random effects parts of the model, with the response on the left of a ~ operator and the terms, separated by + operators, on the right. The response must be a Surv object (see Surv from the survival package). Random-effects terms are distinguished by vertical bars ("|") separating expressions for design matrices from the grouping factor. formula should be of the same format needed for glmer in package lme4. Only one grouping factor will be recognized. The random effects covariates need to be a subset of the fixed effects covariates. The offset must be specified outside of the formula in the 'offset' argument.
data: an optional data frame containing the variables named in formula. If data is omitted, variables will be taken from the environment of formula.
covar: character string specifying whether the covariance matrix should be unstructured ("unstructured") or diagonal with no covariances between variables ("independent"). Default is set to NULL. If covar is set to NULL and the number of random effects predictors (not including the intercept) is greater than or equal to 10 (i.e. high dimensional), then the algorithm automatically assumes an independent covariance structure and covar is set to "independent". Otherwise if covar is set to NULL and the number of random effects predictors is less than 10, then the algorithm automatically assumes an unstructured covariance structure and covar is set to "unstructured".
offset: This can be used to specify an a priori known component to be included in the linear predictor during fitting. Default set to NULL (no offset). If the data argument is NULL, this should be a numeric vector of length equal to the number of cases (the length of the response vector). If the data argument specifies a data.frame, the offset argument should specify the name of a column in the data.frame.
fixef_noPen: Optional vector of 0's and 1's of the same length as the number of fixed effects covariates used in the model. Value 0 indicates the variable should not have its fixed effect coefficient penalized, 1 indicates that it can be penalized. Order should correspond to the same order of the fixed effects given in the formula.
penalty: character describing the type of penalty to use in the variable selection procedure. Options include 'MCP', 'SCAD', and 'lasso'. Default is MCP penalty. If the random effect covariance matrix is "unstructured", then a group MCP, group SCAD, or group LASSO penalty is used on the random effects coefficients. See Breheny and Huang (2011) <doi:10.1214/10-AOAS388> and Breheny and Huang (2015) <doi:10.1007/s11222-013-9424-2> for details of these penalties.
alpha: Tuning parameter for the Mnet estimator which controls the relative contributions from the MCP/SCAD/LASSO penalty and the ridge, or L2, penalty. alpha=1 is equivalent to the MCP/SCAD/LASSO penalty, while alpha=0 is equivalent to ridge regression. However, alpha=0 is not supported; alpha may be arbitrarily small, but not exactly zero.
gamma_penalty: The scaling factor of the MCP and SCAD penalties. Not used by LASSO penalty. Default is 4.0 for SCAD and 3.0 for MCP. See Breheny and Huang (2011) <doi:10.1214/10-AOAS388> and Breheny and Huang (2015) <doi:10.1007/s11222-013-9424-2> for further details.
optim_options: a structure of class "optimControl" created from function optimControl that specifies several optimization parameters. See the documentation for optimControl for more details on defaults.
adapt_RW_options: a list of class "adaptControl" from function adaptControl containing the control parameters for the adaptive random walk Metropolis-within-Gibbs procedure. Ignored if optimControl parameter sampler is set to "stan" (default) or "independence".
trace: an integer specifying print output to include as function runs. Default value is 0. See Details for more information about output provided when trace = 0, 1, or 2.
tuning_options: a list of class "selectControl" or "lambdaControl" resulting from selectControl or lambdaControl containing additional control parameters. When function glmm is used, the algorithm may be run using one specific set of penalty parameters lambda0 and lambda1 by specifying such values in lambdaControl(). The default for glmm is to run the model fit with no penalization (lambda0 = lambda1 = 0). When function glmmPen is run, tuning_options is specified using selectControl(). See the lambdaControl and selectControl documentation for further details.
survival_options: a structure of class "survivalControl" created from function survivalControl that specifies several parameters needed to properly fit the input survival data using a piecewise constant hazard mixed model. See the documentation for survivalControl for more details on defaults.
BICq_posterior: an optional character string expressing the path and file basename of a file combination that will file-back or currently file-backs a big.matrix of the posterior samples from the minimal penalty model used for the BIC-ICQ calculation used for model selection (BIC-ICQ reference: Ibrahim et al (2011) <doi:10.1111/j.1541-0420.2010.01463.x>). If this argument is specified as NULL (default) and BIC-ICQ calculations are requested (see selectControl for details), the posterior samples will be saved in the file combination 'BICq_Posterior_Draws.bin' and 'BICq_Posterior_Draws.desc' in the working directory. See 'Details' section for additional details about the required format of BICq_posterior and the file-backed big matrix.
progress: a logical value indicating if additional output should be given showing the progress of the fit procedure. If TRUE, such output includes iteration-level information for the fit procedure (iteration number EM_iter; number of MCMC samples nMC; average Euclidean distance EM_conv between the current coefficients and the coefficients from t iterations back, where t is defined in optimControl; and the number of non-zero fixed and random effects covariates, not including the intercept). Additionally, progress = TRUE gives some other information regarding the progress of the variable selection procedure, including the model selection criteria and log-likelihood estimates for each model fit. Default is TRUE.
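
To make the penalty and gamma_penalty arguments above more concrete, the following sketch writes out the standard single-coefficient MCP, SCAD, and lasso penalty functions at alpha = 1, following the usual definitions discussed in Breheny and Huang (2011). The function names mcp_penalty, scad_penalty, and lasso_penalty are hypothetical and are not part of the package; this illustrates the penalty shapes only and is not the package's internal implementation.

```r
# MCP: the lasso rate of penalization is relaxed smoothly to zero,
# and the penalty is constant once |theta| exceeds gamma * lambda.
mcp_penalty <- function(theta, lambda, gamma = 3) {
  a <- abs(theta)
  ifelse(a <= gamma * lambda,
         lambda * a - a^2 / (2 * gamma),
         gamma * lambda^2 / 2)
}

# SCAD: identical to the lasso up to lambda, then a quadratic transition,
# then constant beyond gamma * lambda.
scad_penalty <- function(theta, lambda, gamma = 4) {
  a <- abs(theta)
  ifelse(a <= lambda,
         lambda * a,
         ifelse(a <= gamma * lambda,
                (2 * gamma * lambda * a - a^2 - lambda^2) / (2 * (gamma - 1)),
                lambda^2 * (gamma + 1) / 2))
}

# Lasso: linear in |theta|, with no gamma parameter.
lasso_penalty <- function(theta, lambda) lambda * abs(theta)

# Compare the three penalties over a grid of coefficient values.
theta <- seq(-5, 5, by = 0.5)
penalties <- data.frame(
  theta = theta,
  MCP   = mcp_penalty(theta, lambda = 1),
  SCAD  = scad_penalty(theta, lambda = 1),
  lasso = lasso_penalty(theta, lambda = 1)
)
```

Both MCP and SCAD level off for large coefficients, whereas the lasso penalty keeps growing; gamma_penalty controls how quickly that levelling-off happens.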

Details

Argument BICq_posterior details: If the BIC_option in selectControl (tuning_options) is specified to be 'BICq', this requests the calculation of the BIC-ICQ criterion during the selection process. For the BIC-ICQ criterion to be calculated, a full model assuming a small valued lambda penalty needs to be fit, and the posterior draws from this full model need to be used. In order to avoid repetitive calculations of this full model (i.e. if the user wants to re-run phmmPen with a different set of penalty parameters), a big.matrix of these posterior draws will be file-backed as two files: a backing file with extension '.bin' and a descriptor file with extension '.desc'. The BICq_posterior argument should contain a path and a filename with no extension of the form "./path/filename" such that the backing file and the descriptor file would then be saved as "./path/filename.bin" and "./path/filename.desc", respectively. If BICq_posterior is set to NULL, then by default, the backing file and descriptor file are saved in the working directory as "BICq_Posterior_Draws.bin" and "BICq_Posterior_Draws.desc". If the big matrix of posterior draws is already file-backed, BICq_posterior should specify the path and basename of the appropriate files (again of form "./path/filename"); the full model will not be fit again and the big.matrix of posterior draws will be read using the attach.big.matrix function of the bigmemory package and used in the BIC-ICQ calculations. If the appropriate files do not exist or BICq_posterior is specified as NULL, the full model will be fit and the full model posterior draws will be saved as specified above. The algorithm will save 10^4 posterior draws automatically.
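
The file naming convention described above can be sketched in base R. The helper bicq_files() is hypothetical and not part of the package; it only shows how a BICq_posterior base name maps to the two files and how one might check, before a run, whether a previous set of posterior draws is already available.

```r
# Hypothetical helper: map a BICq_posterior base name (path plus file name,
# no extension) to the backing and descriptor files, and report whether
# both files already exist on disk.
bicq_files <- function(base = "BICq_Posterior_Draws") {
  files <- c(bin = paste0(base, ".bin"), desc = paste0(base, ".desc"))
  list(files = files, both_exist = all(file.exists(files)))
}

# Base name of the form "./path/filename"
info <- bicq_files(file.path(".", "path", "filename"))
print(info$files)

if (info$both_exist) {
  message("Existing posterior draws found; the full model need not be refit.")
} else {
  message("No complete file pair found; the full model would be fit and saved.")
}
```

Calling bicq_files() with no argument returns the default file names used when BICq_posterior is NULL, relative to the working directory.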

Trace details: The value of 0 (default) does not output any extra information. The value of 1 additionally outputs the updated coefficients, updated covariance matrix values, and the number of coordinate descent iterations used for the M step for each EM iteration. When the pre-screening procedure is used and/or the BIC-ICQ criterion is requested, trace = 1 gives additional information about the penalties used for the 'full model' fit procedure. If Stan is not used as the E-step sampling mechanism, the value of 2 outputs all of the above plus Gibbs acceptance rate information for the adaptive random walk and independence samplers and the updated proposal standard deviation for the adaptive random walk.