A longdata object allows for efficient storage and recall of longitudinal datasets for use in
bootstrap sampling. The object works by de-constructing the data into lists based upon subject id
thus enabling efficient lookup.
data: The original dataset passed to the constructor (sorted by id and visit).
vars: The vars object (list of key variables) passed to the constructor.
visits: A character vector containing the distinct visit levels.
ids: A character vector containing the unique ids of each subject in self$data.
formula: A formula expressing how the design matrix for the data should be constructed.
strata: A numeric vector indicating which stratum each corresponding value of self$ids belongs to. If no stratification variable is defined, this defaults to 1 for all subjects (i.e. same group). This field is only used by the self$sample_ids() function to enable stratified bootstrap sampling.
ice_visit_index: A list indexed by subject storing the index number of the first visit affected by the ICE. If there is no ICE then it is set equal to the number of visits plus 1.
values: A list indexed by subject storing a numeric vector of the original (unimputed) outcome values.
group: A list indexed by subject storing a single character value indicating which imputation group the subject belongs to, as defined by self$data[id, self$ivars$group]. It is used to determine which reference group should be used when imputing the subject's data.
is_mar: A list indexed by subject storing logical values indicating whether the subject's outcome values are MAR or not. This list defaults to TRUE for all subjects and outcomes and is then modified by calls to self$set_strategies(). Note that this does not indicate which values are missing; it is TRUE for outcome values that either occurred before the ICE visit, or occurred after the ICE visit and have an imputation strategy of MAR.
strategies: A list indexed by subject storing a single character value indicating the imputation strategy assigned to that subject. This list defaults to "MAR" for all subjects and is then modified by calls to either self$set_strategies() or self$update_strategies().
strategy_lock: A list indexed by subject storing a single logical value indicating whether a patient's imputation strategy is locked or not. If a strategy is locked, it cannot change from MAR to non-MAR. Strategies can be changed from non-MAR to MAR, though this will trigger a warning. Strategies are locked if the patient is assigned a MAR strategy and has non-missing data after their ICE date. This list is populated by a call to self$set_strategies().
indexes: A list indexed by subject storing a numeric vector of indexes which specify which rows in the original dataset belong to this subject. For example, to recover the full data for subject "pt3" you can use self$data[self$indexes[["pt3"]],]. This may seem redundant compared with filtering the data directly; however, it enables efficient bootstrap sampling of the data, e.g.
    indexes <- unlist(self$indexes[c("pt3", "pt3")])
    self$data[indexes,]
This list is populated during the object initialisation.
is_missing: A list indexed by subject storing a logical vector indicating whether the corresponding outcome of a subject is missing. This list is populated during the object initialisation.
is_post_ice: A list indexed by subject storing a logical vector indicating whether the corresponding outcome of a subject is after the date of their ICE. If no ICE data has been provided, this defaults to FALSE for all observations. This list is populated by a call to self$set_strategies().
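The row-index lookup described for the indexes field can be illustrated with a minimal base R sketch. It does not use the class itself: the toy dataset and the variable names (dat, boot_ids and so on) are assumptions made purely for illustration.

```r
# Toy long-format dataset: three subjects, two visits each (hypothetical data)
dat <- data.frame(
    id = rep(c("pt1", "pt2", "pt3"), each = 2),
    visit = rep(c("v1", "v2"), times = 3),
    outcome = c(1.1, 1.4, 2.0, NA, 3.2, 3.9),
    stringsAsFactors = FALSE
)

# List, indexed by subject, of the row numbers belonging to each subject
indexes <- split(seq_len(nrow(dat)), dat$id)

# Recover all rows for a single subject
pt3_rows <- dat[indexes[["pt3"]], ]

# A bootstrap sample that selects "pt3" twice and "pt1" once:
# look up the stored row numbers instead of filtering the data per subject
boot_ids <- c("pt3", "pt1", "pt3")
boot_rows <- unlist(indexes[boot_ids], use.names = FALSE)
boot_dat <- dat[boot_rows, ]
```

Because each subject's row numbers are computed once, repeated bootstrap draws only need list lookups and a single row subset.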
get_data(): Returns a data.frame based upon the required subject IDs. Replaces missing values with new ones if provided.
longDataConstructor$get_data(
    obj = NULL,
    nmar.rm = FALSE,
    na.rm = FALSE,
    idmap = FALSE
)
obj: Either NULL, a character vector of subject IDs or an imputation list object. See details.
nmar.rm: Logical value. If TRUE will remove observations that are not regarded as MAR (as determined from self$is_mar).
na.rm: Logical value. If TRUE will remove outcome values that are missing (as determined from self$is_missing).
idmap: Logical value. If TRUE will add an attribute idmap which contains a mapping from the new subject ids to the old subject ids. See details.
If obj is NULL then the full original dataset is returned.
If obj is a character vector then a new dataset consisting of just those subjects is
returned; if the character vector contains duplicate entries then that subject will be
returned multiple times.
If obj is an imputation_df object (as created by imputation_df()) then the
subject ids specified in the object will be returned and missing values will be filled
in by those specified in the imputation list object. For example:
    obj <- imputation_df(
        imputation_single(id = "pt1", values = c(1, 2, 3)),
        imputation_single(id = "pt1", values = c(4, 5, 6)),
        imputation_single(id = "pt3", values = c(7, 8))
    )
    longdata$get_data(obj)
This will return a data.frame consisting of all observations for pt1 twice and all of the
observations for pt3 once. The first set of observations for pt1 will have missing
values filled in with c(1,2,3) and the second set will be filled in with c(4,5,6). The
length of the values must be equal to sum(self$is_missing[[id]]).
If obj is not NULL then all subject IDs will be scrambled in order to ensure that they
are unique, i.e. if pt2 is requested twice then this process guarantees that each set of
observations will have a unique subject ID number. The idmap attribute (if requested) can be used
to map from the new ids back to the old ids.
Returns a data.frame.
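The idea of relabelling duplicated subjects and keeping a map back to the original ids can be sketched in base R as follows. This is only an illustration under stated assumptions: the "new_pt_" naming scheme, the toy data and the variable names are hypothetical and are not taken from the class's implementation.

```r
# Toy dataset with two subjects and two visits each (hypothetical data)
dat <- data.frame(
    id = rep(c("pt1", "pt2"), each = 2),
    visit = rep(c("v1", "v2"), times = 2),
    outcome = c(1.0, 1.5, 2.0, 2.5),
    stringsAsFactors = FALSE
)
indexes <- split(seq_len(nrow(dat)), dat$id)

# "pt2" is requested twice, as could happen in a bootstrap sample
requested <- c("pt2", "pt1", "pt2")

# One new, unique id per requested entry (naming scheme is an assumption)
new_ids <- paste0("new_pt_", seq_along(requested))

# Pull each requested subject's rows and relabel them with the new id
rows <- indexes[requested]
out <- dat[unlist(rows, use.names = FALSE), ]
out$id <- rep(new_ids, times = lengths(rows))

# Attach a mapping from the new ids back to the original ids
attr(out, "idmap") <- setNames(requested, new_ids)
```

With this layout, attr(out, "idmap")[["new_pt_3"]] recovers the original id of the third requested block of rows.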
add_subject(): This function decomposes a patient's data from self$data and populates all the corresponding lists, i.e. self$is_missing, self$values, self$group, etc. This function is only called upon the object's initialisation.
longDataConstructor$add_subject(id)
id: Character subject id that exists within self$data.
validate_ids(): Throws an error if any element of ids is not within the source data self$data.
longDataConstructor$validate_ids(ids)
ids: A character vector of ids.
Returns TRUE.
sample_ids(): Performs random stratified sampling of patient ids (with replacement). Each patient has an equal probability of being picked within their stratum (i.e. it does not depend on how many non-missing visits they had).
longDataConstructor$sample_ids()
Returns a character vector of ids.
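Stratified sampling with replacement, as described for sample_ids(), can be sketched in base R as below. The helper name sample_ids_stratified and the toy ids are hypothetical, and the order in which the class returns the sampled ids is not stated in the text, so this is only one possible arrangement.

```r
# Hypothetical helper: resample ids with replacement within each stratum,
# keeping the number of subjects per stratum unchanged
sample_ids_stratified <- function(ids, strata) {
    stopifnot(length(ids) == length(strata))
    by_stratum <- split(ids, strata)
    sampled <- lapply(by_stratum, function(x) {
        # Sample positions with equal probability, then index into the ids
        x[sample.int(length(x), size = length(x), replace = TRUE)]
    })
    unlist(sampled, use.names = FALSE)
}

set.seed(101)
ids <- c("pt1", "pt2", "pt3", "pt4", "pt5")
strata <- c(1, 1, 1, 2, 2)
boot <- sample_ids_stratified(ids, strata)
```

Sampling within each stratum separately means every bootstrap sample has the same stratum sizes as the original data.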
extract_by_id(): Returns a list of key information for a given subject. This is a convenience wrapper to save having to manually grab each element.
longDataConstructor$extract_by_id(id)
id: Character subject id that exists within self$data.
update_strategies(): Convenience function to run self$set_strategies(dat_ice, update=TRUE), kept for legacy reasons.
longDataConstructor$update_strategies(dat_ice)
dat_ice: A data.frame containing ICE information; see impute() for the format of this data.frame.
set_strategies(): Updates the self$strategies, self$is_mar and self$is_post_ice variables based upon the provided ICE information.
longDataConstructor$set_strategies(dat_ice = NULL, update = FALSE)
dat_ice: A data.frame containing ICE information. See details.
update: Logical, indicates that the ICE data should be used as an update. See details.
See draws() for the specification of dat_ice if update=FALSE.
See impute() for the format of dat_ice if update=TRUE.
If update=TRUE this function ensures that MAR strategies cannot be changed to non-MAR in the presence
of post-ICE observations.
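The relationship between ice_visit_index, is_post_ice and is_mar described in the field definitions can be sketched for a single subject as follows. The helper flags_for_subject is hypothetical, and treating the first ICE-affected visit itself as post-ICE is an assumption based on the wording of ice_visit_index; "JR" is used simply as an example of a non-MAR strategy label.

```r
visits <- c("v1", "v2", "v3", "v4")

# Hypothetical helper deriving the per-visit flags for one subject
flags_for_subject <- function(ice_visit_index, strategy, n_visits) {
    # Visits from the first ICE-affected visit onwards count as post-ICE
    is_post_ice <- seq_len(n_visits) >= ice_visit_index
    # MAR if the visit is before the ICE, or post-ICE with a MAR strategy
    is_mar <- !is_post_ice | strategy == "MAR"
    list(is_post_ice = is_post_ice, is_mar = is_mar)
}

# Subject whose ICE first affects visit 3, with a non-MAR strategy
with_ice <- flags_for_subject(3, "JR", length(visits))

# Subject without an ICE: index is set to the number of visits plus 1
no_ice <- flags_for_subject(length(visits) + 1, "MAR", length(visits))
```

Setting the index to the number of visits plus 1 for subjects without an ICE means no visit is flagged as post-ICE, so no special case is needed.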
check_has_data_at_each_visit(): Ensures that all visits have at least 1 observed "MAR" observation. Throws an error if this criterion is not met. This is to ensure that the initial MMRM can be resolved.
longDataConstructor$check_has_data_at_each_visit()
set_strata(): Populates the self$strata variable. If the user has specified stratification variables, the first visit is used to determine the value of those variables. If no stratification variables have been specified then everyone is defined as being in stratum 1.
longDataConstructor$set_strata()
new(): Constructor function.
longDataConstructor$new(data, vars)
data: Longitudinal dataset.
vars: An ivars object created by set_vars().
clone(): The objects of this class are cloneable with this method.
longDataConstructor$clone(deep = FALSE)
deep: Whether to make a deep clone.
The object also handles multiple other operations specific to rbmi such as defining whether an
outcome value is MAR / Missing or not as well as tracking which imputation strategy is assigned
to each subject.
It is recognised that this object's functionality is fairly overloaded, and it is hoped that this can be split out into more area-specific objects / functions in the future. Further additions of functionality to this object should be avoided if possible.