phmmPen is used to fit penalized proportional hazards mixed models
using a piecewise constant hazard mixed model approximation via
Monte Carlo Expectation Conditional Minimization (MCECM).
The purpose of the function is to perform variable selection on both the
fixed and random effects simultaneously for the piecewise constant hazard mixed model.
phmmPen selects the best model using BIC-type selection criteria
(see the selectControl documentation for further details).
To improve the speed of the algorithm, consider setting
var_restrictions = "fixef" within the optimControl options.
phmmPen(
formula,
data = NULL,
covar = NULL,
offset = NULL,
fixef_noPen = NULL,
penalty = c("MCP", "SCAD", "lasso"),
alpha = 1,
gamma_penalty = switch(penalty[1], SCAD = 4, 3),
optim_options = optimControl(),
adapt_RW_options = adaptControl(),
trace = 0,
tuning_options = selectControl(),
survival_options = survivalControl(),
BICq_posterior = NULL,
progress = TRUE
)
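A call following the signature above might look like the sketch below. It simulates a small clustered survival data set in base R and then, only if the `run_fit` flag is switched on and the glmmPen and survival packages are installed, fits a model with `var_restrictions = "fixef"` as suggested in the description. The data set, the variable names (`time`, `status`, `x1`, `x2`, `x3`, `group`), the effect sizes, and the `run_fit` flag are all assumptions made for illustration; they do not come from the package.

```r
# Simulated clustered survival data (all names and values are illustrative).
set.seed(1)
n_groups <- 5
n_per_group <- 40
n <- n_groups * n_per_group

dat <- data.frame(
  group = factor(rep(seq_len(n_groups), each = n_per_group)),
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = rnorm(n)
)

# Group-level random intercepts and a single truly active covariate (x1).
b0 <- rnorm(n_groups, sd = 0.5)
lin_pred <- 0.8 * dat$x1 + b0[as.integer(dat$group)]

# Exponential event times with independent exponential censoring.
event_time <- rexp(n, rate = exp(lin_pred))
censor_time <- rexp(n, rate = 0.2)
dat$time <- pmin(event_time, censor_time)
dat$status <- as.integer(event_time <= censor_time)

# Assumption: a manual switch, because MCECM fits can take a long time.
run_fit <- FALSE
if (run_fit &&
    requireNamespace("glmmPen", quietly = TRUE) &&
    requireNamespace("survival", quietly = TRUE)) {
  library(survival)
  library(glmmPen)
  fit <- phmmPen(
    # Random effects covariates are a subset of the fixed effects covariates.
    formula = Surv(time, status) ~ x1 + x2 + x3 + (x1 + x2 + x3 | group),
    data = dat,
    penalty = "MCP",
    optim_options = optimControl(var_restrictions = "fixef"),
    tuning_options = selectControl(),
    survival_options = survivalControl()
  )
  print(fit)
}
```

With `run_fit` left at `FALSE`, the block only builds the data frame, which makes it cheap to check the data layout before committing to a full fit.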
Returns a reference class object of class pglmmObj, for which many methods are
available (e.g. methods(class = "pglmmObj"); see ?pglmmObj for additional documentation).
formula: a two-sided linear formula object describing both the fixed effects and
random effects part of the model, with the response on the left of a ~ operator and the terms,
separated by + operators, on the right.
The response must be a Surv object (see Surv from the survival package).
Random-effects terms are distinguished by vertical bars
("|") separating expressions for design matrices from the grouping factor.
formula should be of the same format needed for glmer in package lme4.
Only one grouping factor will be recognized.
The random effects covariates need to be a subset of the fixed effects covariates.
The offset must be specified outside of the formula in the 'offset' argument.
data: an optional data frame containing the variables named in formula.
If data is omitted, variables will be taken from the environment of formula.
covar: character string specifying whether the covariance matrix should be unstructured
("unstructured") or diagonal with no covariances between variables ("independent").
Default is set to NULL. If covar is set to NULL and the number of random effects
predictors (not including the intercept) is greater than or equal to 10
(i.e. high dimensional), then the algorithm automatically assumes an
independent covariance structure and covar is set to "independent".
Otherwise, if covar is set to NULL and the number of random effects predictors
is less than 10, then the algorithm automatically assumes an unstructured
covariance structure and covar is set to "unstructured".
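The default rule for the covariance structure can be written out as a short sketch. `default_covar` is a hypothetical helper written for this illustration, not a function exported by the package; it only mirrors the rule as documented above (threshold of 10 random effects predictors, intercept excluded).

```r
# Hypothetical helper mirroring the documented default rule for `covar`.
# n_ranef_predictors: number of random effects predictors, excluding the intercept.
default_covar <- function(n_ranef_predictors, covar = NULL) {
  # A user-supplied value is kept as long as it is one of the two valid options.
  if (!is.null(covar)) {
    return(match.arg(covar, c("unstructured", "independent")))
  }
  # Otherwise the structure depends on the dimension of the random effects.
  if (n_ranef_predictors >= 10) "independent" else "unstructured"
}

default_covar(3)                           # "unstructured"
default_covar(10)                          # "independent"
default_covar(12, covar = "unstructured")  # "unstructured" (explicit choice kept)
```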
offset: This can be used to specify an a priori known component to be included in the
linear predictor during fitting. Default set to NULL (no offset).
If the data argument is not NULL, this should be a numeric vector of length equal to the
number of cases (the length of the response vector).
If the data argument specifies a data.frame, the offset
argument should specify the name of a column in the data.frame.
fixef_noPen: Optional vector of 0's and 1's of the same length as the number of fixed effects covariates used in the model. Value 0 indicates the variable should not have its fixed effect coefficient penalized, 1 indicates that it can be penalized. Order should correspond to the same order of the fixed effects given in the formula.
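As a small illustration of the 0/1 coding described above, the sketch below builds such a vector from a list of covariates that should never be penalized. The covariate names (`age`, `treatment`, `x1`, `x2`) are hypothetical, and the order of `fixed_covariates` is assumed to match the order of the fixed effects in the formula.

```r
# Hypothetical fixed effects covariates, in the same order as in the formula.
fixed_covariates <- c("age", "treatment", "x1", "x2")

# Covariates that should always stay in the model (coefficient not penalized).
never_penalize <- c("age", "treatment")

# 0 = do not penalize, 1 = may be penalized.
fixef_noPen <- as.numeric(!(fixed_covariates %in% never_penalize))

# Named copy for display only.
setNames(fixef_noPen, fixed_covariates)
#       age treatment        x1        x2
#         0         0         1         1
```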
penalty: character describing the type of penalty to use in the variable selection procedure. Options include 'MCP', 'SCAD', and 'lasso'. Default is MCP penalty. If the random effect covariance matrix is "unstructured", then a group MCP, group SCAD, or group LASSO penalty is used on the random effects coefficients. See Breheny and Huang (2011) <doi:10.1214/10-AOAS388> and Breheny and Huang (2015) <doi:10.1007/s11222-013-9424-2> for details of these penalties.
alpha: Tuning parameter for the Mnet estimator which controls the relative contributions
from the MCP/SCAD/LASSO penalty and the ridge, or L2, penalty. alpha=1 is equivalent to
the MCP/SCAD/LASSO penalty, while alpha=0 is equivalent to ridge regression. However,
alpha=0 is not supported; alpha may be arbitrarily small, but not exactly zero.
gamma_penalty: The scaling factor of the MCP and SCAD penalties. Not used by LASSO penalty. Default is 4.0 for SCAD and 3.0 for MCP. See Breheny and Huang (2011) <doi:10.1214/10-AOAS388> and Breheny and Huang (2015) <doi:10.1007/s11222-013-9424-2> for further details.
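The default scaling factor comes from the `switch` expression in the usage signature, `switch(penalty[1], SCAD = 4, 3)`. The sketch below wraps that expression in a hypothetical helper, `default_gamma`, purely to show which value each penalty choice produces.

```r
# Hypothetical wrapper around the default expression from the usage signature.
default_gamma <- function(penalty = c("MCP", "SCAD", "lasso")) {
  # Only the first element is inspected; anything other than "SCAD" falls
  # through to the unnamed default of 3.
  switch(penalty[1], SCAD = 4, 3)
}

default_gamma()         # 3 (the first option, "MCP", is used)
default_gamma("SCAD")   # 4
default_gamma("lasso")  # 3 (the value is not used by the LASSO penalty)
```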
optim_options: a structure of class "optimControl" created
from function optimControl that specifies several optimization parameters.
See the documentation for optimControl for more details on defaults.
adapt_RW_options: a list of class "adaptControl" from function adaptControl
containing the control parameters for the adaptive random walk Metropolis-within-Gibbs procedure.
Ignored if optimControl parameter sampler is set to "stan" (default) or "independence".
trace: an integer specifying the print output to include as the function runs. Default value is 0. See Details for more information about output provided when trace = 0, 1, or 2.
tuning_options: a list of class "selectControl" or "lambdaControl" resulting from
selectControl or lambdaControl containing additional control parameters.
When function glmm is used, the algorithm may be run using one specific set of
penalty parameters lambda0 and lambda1 by specifying such values in lambdaControl().
The default for glmm is to run the model fit with no penalization (lambda0 = lambda1 = 0).
When function glmmPen is run, tuning_options is specified using selectControl();
the same holds for phmmPen, whose default is selectControl().
See the lambdaControl and selectControl documentation for further details.
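The two kinds of tuning objects can be constructed as in the sketch below, which runs only if glmmPen is installed. The penalty values 0.05 are arbitrary illustrative numbers, and `have_glmmPen`, `select_opts`, and `lambda_opts` are names made up for this example.

```r
# Check for the package first so the sketch is safe to run anywhere.
have_glmmPen <- requireNamespace("glmmPen", quietly = TRUE)

if (have_glmmPen) {
  # Selection over a sequence of penalties, scored with the BIC-ICQ criterion.
  select_opts <- glmmPen::selectControl(BIC_option = "BICq")

  # A single fixed pair of penalties (illustrative values, not recommendations).
  lambda_opts <- glmmPen::lambdaControl(lambda0 = 0.05, lambda1 = 0.05)

  print(class(select_opts))
  print(class(lambda_opts))
}
```

An object such as `select_opts` would then be passed to phmmPen through the tuning_options argument.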
survival_options: a structure of class "survivalControl" created
from function survivalControl that specifies several parameters needed to
properly fit the input survival data using a piecewise constant hazard mixed model.
See the documentation for survivalControl for more details on defaults.
BICq_posterior: an optional character string expressing the path and file
basename of a file combination that will file-back or currently file-backs a big.matrix
of the posterior samples from the minimal penalty model used for the BIC-ICQ calculation
used for model selection
(BIC-ICQ reference: Ibrahim et al (2011) <doi:10.1111/j.1541-0420.2010.01463.x>).
If this argument is specified as NULL (default) and BIC-ICQ calculations are requested
(see selectControl for details), the posterior samples
will be saved in the file combination 'BICq_Posterior_Draws.bin' and 'BICq_Posterior_Draws.desc'
in the working directory.
See 'Details' section for additional details about the required format of BICq_posterior
and the file-backed big matrix.
progress: a logical value indicating if additional output should be given showing the
progress of the fit procedure. If TRUE, such output includes iteration-level information
for the fit procedure (iteration number EM_iter,
number of MCMC samples nMC, average Euclidean distance between current coefficients and
coefficients from t iterations back EM_conv, where t is defined in optimControl,
and number of non-zero fixed and random effects covariates
not including the intercept). Additionally, progress = TRUE
gives some other information regarding the progress of the variable selection
procedure, including the model selection criteria and log-likelihood estimates
for each model fit.
Default is TRUE.
Argument BICq_posterior details: If the BIC_option in selectControl (tuning_options)
is specified to be 'BICq', this requests the calculation of the BIC-ICQ criterion during
the selection process. For the BIC-ICQ criterion to be calculated, a full model assuming a
small-valued lambda penalty needs to be fit, and the posterior draws from this full model
need to be used.
In order to avoid repetitive calculations of this full model (i.e. if the user wants to
re-run phmmPen with a different set of penalty parameters), a big.matrix of these
posterior draws will be file-backed as two files: a backing file with extension '.bin' and a
descriptor file with extension '.desc'. The BICq_posterior argument should contain a
path and a filename with no extension of the form "./path/filename" such that the backing file and
the descriptor file would then be saved as "./path/filename.bin" and "./path/filename.desc", respectively.
If BICq_posterior is set to NULL, then by default, the backing file and descriptor
file are saved in the working directory as "BICq_Posterior_Draws.bin" and "BICq_Posterior_Draws.desc".
If the big matrix of posterior draws is already file-backed, BICq_posterior should
specify the path and basename of the appropriate files (again of form "./path/filename");
the full model will not be fit again and the big.matrix of
posterior draws will be read using the attach.big.matrix function of the
bigmemory package and used in the BIC-ICQ calculations.
If the appropriate files do not exist or BICq_posterior
is specified as NULL, the full model will be fit and the full model posterior
draws will be saved as specified above. The algorithm will save 10^4 posterior draws automatically.
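The file naming convention described above can be sketched in base R as follows. The basename `phmm_full_model_draws` and the use of `tempdir()` are assumptions chosen for illustration; the check with `file.exists` is only a way for a user to anticipate whether existing draws would be reused, not a description of the package internals.

```r
# Hypothetical path and basename, given without an extension.
bicq_base <- file.path(tempdir(), "phmm_full_model_draws")

# The two files that together file-back the big.matrix of posterior draws.
bicq_files <- paste0(bicq_base, c(".bin", ".desc"))

# If both files exist, the full model draws can be reused instead of refit.
already_backed <- all(file.exists(bicq_files))

if (already_backed && requireNamespace("bigmemory", quietly = TRUE)) {
  # Attach the existing file-backed matrix through its descriptor file.
  draws <- bigmemory::attach.big.matrix(bicq_files[2])
  message("Reusing ", nrow(draws), " stored posterior draws.")
} else {
  message("No usable stored draws; the full model would be fit and saved to:\n  ",
          paste(bicq_files, collapse = "\n  "))
}
```

The value of `bicq_base` is what would be supplied as the BICq_posterior argument.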
Trace details: The value of 0 (default) does not output any extra information. The value of 1 additionally outputs the updated coefficients, updated covariance matrix values, and the number of coordinate descent iterations used for the M step for each EM iteration. When the pre-screening procedure is used and/or the BIC-ICQ criterion is requested, trace = 1 gives additional information about the penalties used for the 'full model' fit procedure. If Stan is not used as the E-step sampling mechanism, the value of 2 outputs all of the above plus Gibbs acceptance rate information for the adaptive random walk and independence samplers and the updated proposal standard deviation for the adaptive random walk.