phmmPen is used to fit penalized proportional hazards mixed models
using a piecewise constant hazard mixed model approximation via
Monte Carlo Expectation Conditional Minimization (MCECM).
The purpose of the function is to perform
variable selection on both the fixed and random effects simultaneously for the
piecewise constant hazard mixed model.
phmmPen selects the best model using
BIC-type selection criteria (see selectControl documentation for
further details).
To improve the speed of the algorithm, consider setting
var_restrictions = "fixef" within the optimControl options.
phmmPen(
formula,
data = NULL,
covar = NULL,
offset = NULL,
fixef_noPen = NULL,
penalty = c("MCP", "SCAD", "lasso"),
alpha = 1,
gamma_penalty = switch(penalty[1], SCAD = 4, 3),
optim_options = optimControl(),
adapt_RW_options = adaptControl(),
trace = 0,
tuning_options = selectControl(),
survival_options = survivalControl(),
BICq_posterior = NULL,
progress = TRUE
)
Returns a reference class object of class pglmmObj for which many methods are
available (e.g. methods(class = "pglmmObj"); see ?pglmmObj for additional documentation).
formula: a two-sided linear formula object describing both the fixed effects and
random effects part of the model, with the response on the left of a ~ operator and the terms,
separated by + operators, on the right.
The response must be a Surv object (see
Surv from the survival package).
Random-effects terms are distinguished by vertical bars
("|") separating expressions for design matrices from the grouping factor. formula should be
of the same format needed for glmer in package lme4.
Only one grouping factor
will be recognized. The random effects covariates need to be a subset of the fixed effects covariates.
The offset must be specified outside of the formula in the 'offset' argument.
data: an optional data frame containing the variables named in formula. If data is
omitted, variables will be taken from the environment of formula.
covar: character string specifying whether the covariance matrix should be unstructured
("unstructured") or diagonal with no covariances between variables ("independent").
Default is set to NULL. If covar is set to NULL and the number of random effects
predictors (not including the intercept) is
greater than or equal to 10 (i.e. high dimensional), then the algorithm automatically assumes an
independent covariance structure and covar is set to "independent". Otherwise if covar
is set to NULL and the number of random effects predictors is less than 10, then the
algorithm automatically assumes an unstructured covariance structure and covar is set to "unstructured".
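The default rule for covar described above can be sketched in a few lines of base R. The helper name default_covar and its argument n_ranef_predictors are hypothetical and are not part of the package; only the threshold of 10 and the two option names come from the description.

```r
# Hypothetical helper illustrating the documented default for `covar`.
# n_ranef_predictors: number of random-effects predictors, excluding the intercept.
default_covar <- function(covar = NULL, n_ranef_predictors) {
  if (!is.null(covar)) {
    # A user-supplied value is kept, provided it is one of the two documented options
    return(match.arg(covar, c("unstructured", "independent")))
  }
  # covar = NULL: choose the structure from the number of random-effects predictors
  if (n_ranef_predictors >= 10) "independent" else "unstructured"
}

default_covar(NULL, n_ranef_predictors = 12)           # high dimensional case
default_covar(NULL, n_ranef_predictors = 3)            # low dimensional case
default_covar("independent", n_ranef_predictors = 3)   # explicit choice is respected
```

Setting covar explicitly avoids relying on this automatic choice when the number of random-effects predictors is close to the threshold.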
offset: this can be used to specify an a priori known component to be included in the
linear predictor during fitting. Default set to NULL (no offset). If the data
argument is NULL, this should be a numeric vector of length equal to the
number of cases (the length of the response vector).
If the data argument specifies a data.frame, the offset
argument should specify the name of a column in the data.frame.
fixef_noPen: optional vector of 0's and 1's of the same length as the number of fixed effects covariates used in the model. Value 0 indicates the variable should not have its fixed effect coefficient penalized, 1 indicates that it can be penalized. Order should correspond to the same order of the fixed effects given in the formula.
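As a small illustration of the fixef_noPen coding, the vector can be built from covariate names rather than typed by hand. The covariate names below are made up for the example; the only assumption carried over from the description is that the order matches the order of the fixed effects in the formula.

```r
# Illustrative fixed-effects covariates, in the order they appear in the formula
fixed_covariates <- c("age", "treatment", "biomarker1", "biomarker2")

# Covariates whose fixed-effect coefficients should never be penalized
keep_unpenalized <- c("treatment")

# 0 = do not penalize, 1 = may be penalized
fixef_noPen <- as.integer(!(fixed_covariates %in% keep_unpenalized))

# Display the coding next to the covariate names as a check
setNames(fixef_noPen, fixed_covariates)
```

Building the vector this way keeps it aligned with the formula if covariates are later added or reordered.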
penalty: character describing the type of penalty to use in the variable selection procedure. Options include 'MCP', 'SCAD', and 'lasso'. Default is MCP penalty. If the random effect covariance matrix is "unstructured", then a group MCP, group SCAD, or group LASSO penalty is used on the random effects coefficients. See Breheny and Huang (2011) <doi:10.1214/10-AOAS388> and Breheny and Huang (2015) <doi:10.1007/s11222-013-9424-2> for details of these penalties.
alpha: tuning parameter for the Mnet estimator which controls the relative contributions
from the MCP/SCAD/LASSO penalty and the ridge, or L2, penalty. alpha=1 is equivalent to
the MCP/SCAD/LASSO penalty, while alpha=0 is equivalent to ridge regression. However,
alpha=0 is not supported; alpha may be arbitrarily small, but not exactly zero.
gamma_penalty: the scaling factor of the MCP and SCAD penalties. Not used by LASSO penalty. Default is 4.0 for SCAD and 3.0 for MCP. See Breheny and Huang (2011) <doi:10.1214/10-AOAS388> and Breheny and Huang (2015) <doi:10.1007/s11222-013-9424-2> for further details.
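To make the roles of alpha and gamma_penalty concrete, the sketch below evaluates the standard MCP penalty for a single coefficient and an Mnet-style mix of MCP and ridge. The function names are hypothetical, and the way lambda is split between the two parts (alpha * lambda for MCP, (1 - alpha) * lambda for ridge) is an assumption based on the usual convention for this type of estimator; the package's internal scaling may differ.

```r
# Standard MCP penalty for one coefficient:
#   lambda * |b| - b^2 / (2 * gamma)   if |b| <= gamma * lambda
#   gamma * lambda^2 / 2               otherwise (penalty is flat for large |b|)
mcp_penalty <- function(beta, lambda, gamma = 3) {
  b <- abs(beta)
  ifelse(b <= gamma * lambda,
         lambda * b - b^2 / (2 * gamma),
         gamma * lambda^2 / 2)
}

# Mnet-style penalty (assumed split of lambda between MCP and ridge)
mnet_penalty <- function(beta, lambda, alpha = 1, gamma = 3) {
  stopifnot(alpha > 0, alpha <= 1)  # alpha = 0 (pure ridge) is not supported
  mcp_penalty(beta, alpha * lambda, gamma) + (1 - alpha) * lambda / 2 * beta^2
}

mcp_penalty(c(0, 1, 5), lambda = 1, gamma = 3)
mnet_penalty(1, lambda = 1, alpha = 0.5, gamma = 3)
```

With alpha = 1 the ridge term vanishes and the Mnet-style penalty reduces to plain MCP; a larger gamma makes the MCP behave more like the LASSO over a wider range of coefficient values.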
optim_options: a structure of class "optimControl" created
from function optimControl that specifies several optimization parameters. See the
documentation for optimControl for more details on defaults.
adapt_RW_options: a list of class "adaptControl" from function adaptControl
containing the control parameters for the adaptive random walk Metropolis-within-Gibbs procedure.
Ignored if optimControl parameter sampler is set to "stan" (default) or "independence".
trace: an integer specifying print output to include as function runs. Default value is 0. See Details for more information about output provided when trace = 0, 1, or 2.
tuning_options: a list of class "selectControl" or "lambdaControl" resulting from
selectControl or lambdaControl containing additional control parameters.
When function glmm is used, the algorithm may be run using one specific set of
penalty parameters lambda0 and lambda1 by specifying such values in lambdaControl().
The default for glmm is to run the model fit with no penalization (lambda0 = lambda1 = 0).
When function glmmPen is run, tuning_options is specified using selectControl().
See the lambdaControl and selectControl documentation for further details.
survival_options: a structure of class "survivalControl" created
from function survivalControl that specifies several parameters needed to
properly fit the input survival data using a piecewise constant hazard mixed model. See the
documentation for survivalControl for more details on defaults.
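For readers unfamiliar with the piecewise constant hazard approximation, the following base R sketch computes the cumulative hazard and survival probability when the baseline hazard is constant within each time interval. The function name, cut points, and hazard values are illustrative assumptions and do not reflect how survivalControl chooses intervals.

```r
# Cumulative hazard under a piecewise constant hazard.
# cuts:    left endpoints of the intervals, starting at 0; the last interval is open-ended
# hazards: constant hazard within each interval (same length as cuts)
piecewise_cumhaz <- function(t, cuts, hazards) {
  stopifnot(length(cuts) == length(hazards), cuts[1] == 0)
  upper <- c(cuts[-1], Inf)
  # Time spent in each interval up to time t (zero for intervals starting after t)
  exposure <- pmax(0, pmin(t, upper) - cuts)
  sum(hazards * exposure)
}

cuts <- c(0, 1, 3)            # intervals [0, 1), [1, 3), [3, Inf)
hazards <- c(0.5, 0.2, 0.1)

H2 <- piecewise_cumhaz(2, cuts, hazards)  # 0.5 * 1 + 0.2 * 1
S2 <- exp(-H2)                            # survival probability at t = 2
```

In a proportional hazards mixed model, each subject's hazard would additionally be multiplied by the exponential of the linear predictor, which includes the fixed and random effects.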
BICq_posterior: an optional character string expressing the path and file
basename of a file combination that
will file-back or currently file-backs a big.matrix of the posterior samples from the
minimal penalty model used for the BIC-ICQ calculation used for model selection
(BIC-ICQ reference: Ibrahim et al. (2011)
<doi:10.1111/j.1541-0420.2010.01463.x>).
If this argument is
specified as NULL (default) and BIC-ICQ calculations are requested (see selectControl
for details), the posterior samples
will be saved in the file combination 'BICq_Posterior_Draws.bin' and 'BICq_Posterior_Draws.desc'
in the working directory.
See 'Details' section for additional details about the required format of BICq_posterior
and the file-backed big matrix.
progress: a logical value indicating if additional output should be given showing the
progress of the fit procedure. If TRUE, such output includes iteration-level information
for the fit procedure (iteration number EM_iter,
number of MCMC samples nMC, average Euclidean distance EM_conv between the current coefficients
and the coefficients from t iterations back, where t is defined in optimControl,
and number of non-zero fixed and random effects covariates
not including the intercept). Additionally, progress = TRUE
gives some other information regarding the progress of the variable selection
procedure, including the model selection criteria and log-likelihood estimates
for each model fit.
Default is TRUE.
Argument BICq_posterior details: If the BIC_option in selectControl
(tuning_options) is specified
to be 'BICq', this requests the calculation of the BIC-ICQ criterion during the selection
process. For the BIC-ICQ criterion to be calculated, a full model assuming a small-valued
lambda penalty needs to be fit, and the posterior draws from this full model need to be used.
In order to avoid repetitive calculations of
this full model (i.e. if the user wants to re-run phmmPen with a different
set of penalty parameters), a big.matrix of these
posterior draws will be file-backed as two files: a backing file with extension '.bin' and a
descriptor file with extension '.desc'. The BICq_posterior argument should contain a
path and a filename with no extension of the form "./path/filename" such that the backingfile and
the descriptor file would then be saved as "./path/filename.bin" and "./path/filename.desc", respectively.
If BICq_posterior is set to NULL, then by default, the backingfile and descriptor
file are saved in the working directory as "BICq_Posterior_Draws.bin" and "BICq_Posterior_Draws.desc".
If the big matrix of posterior draws is already file-backed, BICq_posterior should
specify the path and basename of the appropriate files (again of form "./path/filename");
the full model
will not be fit again and the big.matrix of
posterior draws will be read using the attach.big.matrix function of the
bigmemory package and used in the BIC-ICQ
calculations. If the appropriate files do not exist or BICq_posterior
is specified as NULL, the full model will be fit and the full model posterior
draws will be saved as specified above. The algorithm will save 10^4 posterior draws automatically.
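The file naming convention for BICq_posterior can be checked with base R before a fit is started. The directory and basename below are illustrative; the only behavior taken from the description is that '.bin' and '.desc' are appended to the supplied path and basename.

```r
# Path and basename with no extension (illustrative location)
bicq_base <- file.path(tempdir(), "phmm_full_model_draws")

# Files that the description says will hold the file-backed big.matrix
backing_file <- paste0(bicq_base, ".bin")
descriptor_file <- paste0(bicq_base, ".desc")

# If both files exist, the full model draws would be reused instead of refit
already_backed <- file.exists(backing_file) && file.exists(descriptor_file)

# bicq_base is the value that would be passed as BICq_posterior = bicq_base
basename(c(backing_file, descriptor_file))
```

Using a project-specific basename instead of the default avoids accidentally reusing posterior draws from an unrelated model saved in the same working directory.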
Trace details: The value of 0 (default) does not output any extra information. The value of 1 additionally outputs the updated coefficients, updated covariance matrix values, and the number of coordinate descent iterations used for the M step for each EM iteration. When the pre-screening procedure is used and/or the BIC-ICQ criterion is requested, trace = 1 gives additional information about the penalties used for the 'full model' fit procedure. If Stan is not used as the E-step sampling mechanism, the value of 2 outputs all of the above plus Gibbs acceptance rate information for the adaptive random walk and independence samplers and the updated proposal standard deviation for the adaptive random walk.
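A minimal end-to-end sketch of a call is given below, using simulated clustered survival data. The data-generating values, variable names, and the run_fit switch are assumptions made for illustration; the call itself uses only arguments listed in the usage above, and the fit is skipped by default because the MCECM procedure can take a long time.

```r
# Simulated clustered survival data (all names and values are illustrative)
set.seed(1)
n_groups <- 5
n_per_group <- 40
n <- n_groups * n_per_group
dat <- data.frame(
  group = factor(rep(seq_len(n_groups), each = n_per_group)),
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = rnorm(n)
)
b0 <- rnorm(n_groups, sd = 0.5)                      # group-level random intercepts
lin_pred <- 0.8 * dat$x1 + b0[as.integer(dat$group)]
event_time <- rexp(n, rate = 0.1 * exp(lin_pred))    # proportional hazards event times
censor_time <- rexp(n, rate = 0.05)                  # independent censoring times
dat$time <- pmin(event_time, censor_time)
dat$status <- as.integer(event_time <= censor_time)  # 1 = event, 0 = censored

run_fit <- FALSE  # set to TRUE to run the fit; requires glmmPen and survival
if (run_fit &&
    requireNamespace("glmmPen", quietly = TRUE) &&
    requireNamespace("survival", quietly = TRUE)) {
  library(survival)
  library(glmmPen)
  fit <- phmmPen(
    # Random-effects covariates are a subset of the fixed-effects covariates
    formula = Surv(time, status) ~ x1 + x2 + x3 + (x1 + x2 + x3 | group),
    data = dat,
    covar = "independent",
    penalty = "MCP",
    optim_options = optimControl(var_restrictions = "fixef"),
    tuning_options = selectControl(),
    survival_options = survivalControl(),
    BICq_posterior = file.path(tempdir(), "BICq_Posterior_Draws"),
    progress = TRUE
  )
  print(methods(class = "pglmmObj"))  # list the methods available for the result
}
```

The choice of var_restrictions = "fixef" follows the speed suggestion in the description; dropping it restores the default behavior of optimControl().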