- obs_data
Data table containing the observed data.
- id
Character string specifying the name of the ID variable in obs_data.
- time_points
Number of time points to simulate. By default, this argument is set equal to the maximum number of records that obs_data contains for any individual.
- time_name
Character string specifying the name of the time variable in obs_data.
- covnames
Vector of character strings specifying the names of the time-varying covariates in obs_data.
- covtypes
Vector of character strings specifying the "type" of each time-varying covariate included in covnames. The possible "types" are: "binary", "normal", "categorical", "bounded normal", "zero-inflated normal", "truncated normal", "absorbing", "categorical time", and "custom".
- covparams
List of vectors, where each vector contains information for one parameter used in the modeling of the time-varying covariates (e.g., model statement, family, link function, etc.). Each vector must be the same length as covnames and in the same order. If a parameter is not required for a certain covariate, it should be set to NA at that index.
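The alignment rule above can be sketched in base R. This is a minimal illustration, not a definitive specification: the covariate names and model formulas are made up, the element name covmodels follows the package's published examples and should be checked against your installed version, and hypothetical_param is an invented placeholder used only to show the NA convention.

```r
# Illustrative covariates (names and types are assumptions for this sketch).
covnames <- c("L1", "L2", "A")
covtypes <- c("binary", "bounded normal", "binary")

covparams <- list(
  # One model statement per covariate, in the same order as covnames.
  covmodels = c(L1 ~ lag1_A + lag1_L1 + t0,
                L2 ~ lag1_A + L1 + lag1_L2 + t0,
                A  ~ lag1_A + L1 + L2 + t0),
  # Hypothetical parameter needed only by the second covariate;
  # the other positions are set to NA.
  hypothetical_param = c(NA, "some value", NA)
)

# Every vector in covparams must have one entry per covariate.
lengths_ok <- vapply(covparams, length, integer(1)) == length(covnames)
print(lengths_ok)
```

Running a check like this before calling the estimation function can catch misaligned parameter vectors early.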
- covfits_custom
Vector containing custom fit functions for time-varying covariates that do not fall within the pre-defined covariate types. It should be in the same order as covnames. If a custom fit function is not required for a particular covariate (e.g., if the first covariate is of type "binary" but the second is of type "custom"), then that index should be set to NA. The default is NA.
- covpredict_custom
Vector containing custom prediction functions for time-varying covariates that do not fall within the pre-defined covariate types. It should be in the same order as covnames. If a custom prediction function is not required for a particular covariate, then that index should be set to NA. The default is NA.
- histvars
List of vectors. The kth vector specifies the names of the variables to which the kth history function in histories is to be applied.
- histories
Vector of history functions to apply to the variables specified in histvars. The default is NA.
- basecovs
Vector of character strings specifying the names of baseline covariates in obs_data. These covariates are not simulated using a model; instead, their values at the first time point of obs_data are carried forward to all time points. These covariates should not be included in covnames. The default is NA.
- outcome_name
Character string specifying the name of the outcome variable in obs_data.
- ymodel
Model statement for the outcome variable.
- ymodel_fit_custom
Function specifying a custom outcome model. See the vignette "Using Custom Outcome Models in gfoRmula" for details. The default is NULL.
- ymodel_predict_custom
Function obtaining predictions from the custom outcome model specified in ymodel_fit_custom. See the vignette "Using Custom Outcome Models in gfoRmula" for details. The default is NULL.
- compevent_name
Character string specifying the name of the competing event variable in obs_data.
- compevent_model
Model statement for the competing event variable. The default is NA.
- compevent_cens
Logical scalar indicating whether to treat competing events as censoring events.
This argument is only applicable for survival outcomes and when a competing event model is supplied (i.e., compevent_name and compevent_model are specified).
If this argument is set to TRUE, the competing event model will only be used to construct inverse probability weights to estimate the natural course means / risk from the observed data.
If this argument is set to FALSE, the competing event model will be used in the parametric g-formula estimates of the risk and will not be used to construct inverse probability weights.
See "Details". The default is FALSE.
- censor_name
Character string specifying the name of the censoring variable in obs_data. Only applicable when using inverse probability weights to estimate the natural course means / risk from the observed data. See "Details".
- censor_model
Model statement for the censoring variable. Only applicable when using inverse probability weights to estimate the natural course means / risk from the observed data. See "Details".
- intvars
(Deprecated. See the ... argument) List, whose elements are vectors of character strings. The kth vector in intvars specifies the name(s) of the variable(s) to be intervened on in each round of the simulation under the kth intervention in interventions.
- interventions
(Deprecated. See the ... argument) List, whose elements are lists of vectors. Each list in interventions specifies a unique intervention on the relevant variable(s) in intvars. Each vector contains a function implementing a particular intervention on a single variable, optionally followed by one or more "intervention values" (i.e., integers used to specify the treatment regime).
- int_times
(Deprecated. See the ... argument) List, whose elements are lists of vectors. The kth list in int_times corresponds to the kth intervention in interventions. Each vector specifies the time points at which the relevant intervention is applied to the corresponding variable in intvars.
When an intervention is not applied, the simulated natural course value is used. By default, this argument is set so that all interventions are applied at all time points.
- int_descript
Vector of character strings, each describing an intervention. It must be in the same order as the specified interventions (see the ... argument).
- ref_int
Integer denoting the intervention to be used as the reference for calculating the risk ratio and risk difference. 0 denotes the natural course, while subsequent integers denote user-specified interventions in the order that they are named in interventions. The default is 0.
- intcomp
List of two numbers indicating a pair of interventions to be compared by a hazard ratio. The default is NA, resulting in no hazard ratio calculation.
- visitprocess
List of vectors. Each vector contains as its first entry the covariate name of a visit process; its second entry the name of a covariate whose modeling depends on the visit process; and its third entry the maximum number of consecutive visits that can be missed before an individual is censored. The default is NA.
- restrictions
List of vectors. Each vector contains four entries: (1) a covariate for which a priori knowledge of its distribution is available; (2) a condition under which no such knowledge is available, and which must be TRUE for the distribution of that covariate to be estimated via a parametric model or other fitting procedure; (3) a function that determines the distribution of that covariate when the condition in the second entry is FALSE (i.e., when a priori knowledge of the covariate distribution is available); and (4) a value used by the function in the third entry. The default is NA.
- yrestrictions
List of vectors. Each vector contains as its first entry a condition and its second entry an integer. When the condition is TRUE, the outcome variable is simulated according to the fitted model; when the condition is FALSE, the outcome variable takes on the value in the second entry. The default is NA.
- compevent_restrictions
List of vectors. Each vector contains as its first entry a condition and its second entry an integer. When the condition is TRUE, the competing event variable is simulated according to the fitted model; when the condition is FALSE, the competing event variable takes on the value in the second entry. The default is NA.
- baselags
Logical scalar for specifying the convention used for lagi and lag_cumavgi terms in the model statements when pre-baseline times are not included in obs_data and when the current time index, \(t\), is such that \(t < i\). If this argument is set to FALSE, the values of all lagi and lag_cumavgi terms in this context are set to 0 (for non-categorical covariates) or the reference level (for categorical covariates). If this argument is set to TRUE, the values of lagi and lag_cumavgi terms are set to their values at time 0. The default is FALSE.
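The two conventions can be illustrated with a small base R sketch for a numeric (non-categorical) covariate and a plain lag term. The helper lag_i below is hypothetical and written only for this illustration; it is not the package's internal implementation, and it does not cover lag_cumavgi terms or categorical covariates.

```r
# Covariate values observed at t = 0, 1, 2, 3 (made-up numbers).
x <- c(5, 7, 6, 9)

# Hypothetical helper: lag x by i time points, filling positions where
# t < i with 0 (baselags = FALSE) or the time-0 value (baselags = TRUE).
lag_i <- function(x, i, baselags = FALSE) {
  n <- length(x)
  fill <- if (baselags) x[1] else 0
  c(rep(fill, min(i, n)), head(x, max(n - i, 0)))
}

lag2_false <- lag_i(x, 2, baselags = FALSE)  # 0 0 5 7
lag2_true  <- lag_i(x, 2, baselags = TRUE)   # 5 5 5 7
print(rbind(lag2_false, lag2_true))
```

In both cases the values at t = 2 and t = 3 are the true lagged values; only the entries with t < 2 differ between the conventions.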
- nsimul
Number of subjects for whom to simulate data. By default, this argument is set equal to the number of subjects in obs_data.
- sim_data_b
Logical scalar indicating whether to return the simulated data set. If bootstrap samples are used (i.e., nsamples is set to a value greater than 0), this argument must be set to FALSE. The default is FALSE.
- seed
Starting seed for simulations and bootstrapping.
- nsamples
Integer specifying the number of bootstrap samples to generate.
The default is 0.
- parallel
Logical scalar indicating whether to parallelize simulations of different interventions across multiple cores.
- ncores
Integer specifying the number of CPU cores to use in parallel simulation. This argument is required when parallel is set to TRUE.
In many applications, users may wish to set this argument equal to parallel::detectCores() - 1.
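A slightly more defensive way to compute that value is sketched below. It guards against two cases the one-liner does not handle: parallel::detectCores() can return NA on some platforms, and on a single-core machine the subtraction would give 0.

```r
# Number of cores reported by the system (may be NA on some platforms).
n_detected <- parallel::detectCores()

# Leave one core free, but never go below 1.
ncores <- if (is.na(n_detected)) 1L else max(1L, n_detected - 1L)
print(ncores)
```

The resulting ncores can then be passed together with parallel = TRUE.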
- ci_method
Character string specifying the method for calculating the bootstrap 95% confidence intervals, if applicable. The options are "percentile" and "normal".
- threads
Integer specifying the number of threads to be used in data.table. See setDTthreads for further details.
- model_fits
Logical scalar indicating whether to return the fitted models. Note that if this argument is set to TRUE, the output of this function may use a lot of memory. The default is FALSE.
- boot_diag
Logical scalar indicating whether to return the parametric g-formula estimates as well as the coefficients, standard errors, and variance-covariance matrices of the parameters of the fitted models in the bootstrap samples. The default is FALSE.
- show_progress
Logical scalar indicating whether to print a progress bar for the number of bootstrap samples completed in the R console. This argument is only applicable when parallel is set to FALSE and bootstrap samples are used (i.e., nsamples is set to a value greater than 0). The default is TRUE.
- ipw_cutoff_quantile
Percentile by which to truncate inverse probability weights. The default is NULL (i.e., no truncation). See "Details".
- ipw_cutoff_value
Cutoff value by which to truncate inverse probability weights. The default is NULL (i.e., no truncation). See "Details".
- int_visit_type
Vector of logicals. The kth element is a logical specifying whether to carry forward the intervened value (rather than the natural value) of the treatment variable(s) when applying a carry-forward restriction for the kth intervention in interventions.
When the kth element is set to FALSE, the natural value of the treatment variable(s) in the kth intervention in interventions will be carried forward.
By default, this argument is set so that the intervened value of the treatment variable(s) is carried forward for all interventions.
- sim_trunc
Logical scalar indicating whether to truncate simulated covariates to their range in the observed data set. This argument is only applicable for covariates of type "normal", "bounded normal", "truncated normal", and "zero-inflated normal". The default is TRUE.
- ...
Other arguments, including (a) those that specify the interventions and (b) those that are passed to the functions in covpredict_custom. To specify interventions, users can supply arguments with the following naming requirements:
  - Each intervention argument begins with a prefix of intervention.
  - After the prefix, the intervention number is specified and followed by a period.
  - After the period, the treatment variable name is specified.
Each intervention argument takes as input a list with the following elements:
  - The first element specifies the intervention function.
  - The subsequent elements specify any intervention values.
  - (Optional) The named element int_times specifies the time points to apply the intervention. By default, all interventions are applied at all time points.
For example, an "always treat" intervention on A is given by
intervention1.A = list(static, rep(1, time_points))
See the vignette "A Simplified Approach for Specifying Interventions in gfoRmula" and "Examples" section for more examples.
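The naming requirements for intervention arguments can be checked with a short base R sketch. It only builds and parses argument names; it does not call any gfoRmula function. The treatment name "A" is taken from the example above, and the choice of two interventions is an assumption made for illustration.

```r
treatment   <- "A"   # treatment variable name, as in the example above
int_numbers <- 1:2   # two hypothetical interventions

# Prefix "intervention", then the intervention number, a period,
# and the treatment variable name.
arg_names <- paste0("intervention", int_numbers, ".", treatment)
print(arg_names)

# Parse the names back into intervention number and treatment variable.
parts <- regmatches(
  arg_names,
  regexec("^intervention([0-9]+)\\.(.+)$", arg_names)
)
int_id  <- vapply(parts, function(p) as.integer(p[2]), integer(1),
                  USE.NAMES = FALSE)
int_var <- vapply(parts, function(p) p[3], character(1),
                  USE.NAMES = FALSE)
print(data.frame(arg_names, int_id, int_var))
```

A name that fails to match the pattern (for example, one missing the period) would produce an empty match here, which makes this a convenient sanity check before passing the arguments on.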