Description

multiPIMboot will run multiPIM once on the actual data, then sample with replacement from the rows of the data and run multiPIM again (with the same options) the desired number of times.

Usage

multiPIMboot(Y, A, W = NULL, times = 5000, id = 1:nrow(Y),
             multicore = FALSE, mc.num.jobs, mc.seed = 123,
             estimator = c("TMLE", "DR-IPCW", "IPCW", "G-COMP"),
             g.method = "main.terms.logistic", g.sl.cands = NULL,
             g.num.folds = NULL, g.num.splits = NULL,
             Q.method = "sl", Q.sl.cands = "default",
             Q.num.folds = 5, Q.num.splits = 1, Q.type = NULL,
             adjust.for.other.As = TRUE, truncate = 0.05,
             return.final.models = TRUE, na.action, verbose = FALSE,
             extra.cands = NULL, standardize = TRUE, ...)
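The resampling scheme in the description (estimate once on the original data, then re-estimate on resampled data `times` times) can be sketched in base R. This is a minimal illustration under assumptions, not the multiPIM internals. `estimate_fn` is a hypothetical stand-in for one full multiPIM run. Resampling is done at the cluster level using an id vector, as the id argument describes. With the default `id = seq_len(nrow(data))`, it reduces to ordinary row resampling.

```r
# Sketch of a (cluster) bootstrap around a generic estimator.
# estimate_fn is a hypothetical stand-in for one full multiPIM run.
cluster_bootstrap <- function(data, estimate_fn, times = 100,
                              id = seq_len(nrow(data))) {
  # Estimate once on the original data
  estimate <- estimate_fn(data)

  # Group row indices by cluster so that whole clusters are resampled
  rows_by_cluster <- split(seq_len(nrow(data)), id)
  n_clusters <- length(rows_by_cluster)

  boot <- vapply(seq_len(times), function(b) {
    # Draw clusters with replacement and stack all of their rows
    drawn <- sample.int(n_clusters, n_clusters, replace = TRUE)
    rows <- unlist(rows_by_cluster[drawn], use.names = FALSE)
    estimate_fn(data[rows, , drop = FALSE])
  }, numeric(1))

  list(estimate = estimate, boot = boot)
}

# Toy usage: 25 clusters of 2 observations each
set.seed(1)
toy <- data.frame(a = rbinom(50, 1, 0.5), y = rnorm(50))
diff_means <- function(d) mean(d$y[d$a == 1]) - mean(d$y[d$a == 0])
res <- cluster_bootstrap(toy, diff_means, times = 200,
                         id = rep(1:25, each = 2))
res$estimate  # estimate on the original data
sd(res$boot)  # bootstrap standard error
```

Resampling whole clusters keeps correlated observations together in every replicate. Because of this, the bootstrap variance reflects the within-cluster dependence.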
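Arguments

Y: a data frame of outcomes. See the details section of multiPIM for the default method of determining, based on the values in Y, which regression types to allow for modelling Q. Must have unique names.

A: a data frame of exposures. The effect of each column of A on each column of Y is estimated.

W: an optional data frame of possible confounders of the effects of the variables in A on the variables in Y. No effect measures will be calculated for these variables. May contain numeric (integer or double) or factor values. Must be left as NULL if not required. If not NULL, must have unique names.

times: the number of bootstrap replicates of Y, A and W to generate and pass to multiPIM.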
id: a vector which identifies clusters. If observation i and observation j are members of the same cluster, then id[i] should be equal to id[j]. Bootstrapping will be carried out by sampling with replacement from the clusters. Keeping the default value will result in sampling with replacement from the observations (i.e. no clustering).

multicore: logical value indicating whether to run the bootstrap replicates in parallel on multiple cores (see Details).

mc.num.jobs: when multicore is TRUE, the number of jobs to run in parallel, e.g. mc.num.jobs = 8. This must be specified whenever multicore is TRUE. Automatic detection of the number of cores is no longer available.

mc.seed: integer seed used for parallel processing (internally, RNGkind will be called to set the RNG to "L'Ecuyer-CMRG"). Ignored if multicore is FALSE. If multicore is FALSE, one should (depending on the candidates used) be able to get reproducible results by setting the seed normally (with set.seed) prior to running multiPIMboot.

estimator: the estimator to use. The default is "TMLE", for the targeted maximum likelihood estimator. Alternatively, one may specify "DR-IPCW", for the double-robust inverse probability of censoring-weighted estimator, or "IPCW", for the regular IPCW estimator. If the regular IPCW estimator is selected, all arguments which begin with the letter Q are ignored, since only g (the regression of each exposure on possible confounders) needs to be modeled in this case.

g.method: the regression method used to model g. The default, "main.terms.logistic", is meant to be used with the default TMLE estimator. If a different estimator is used, it is recommended to use super learning by specifying "sl". In this case, the arguments g.sl.cands, g.num.folds and g.num.splits must also be specified. Other possible values for the g.method argument are one of the elements of the vector all.bin.cands or, if extra.cands is supplied, one of the names of the extra.cands list of functions. Ignored if estimator is "G-COMP".
g.sl.cands: character vector of candidate algorithms for the super learner fit for g. The possible values may be taken from the elements of all.bin.cands, or from the names of the extra.cands list of functions, if it is supplied. Ignored if estimator is "G-COMP" or if g.method is not "sl". NOTE: the TMLE estimator is recommended. However, if one is using either of the IPCW estimators, a reasonable choice is to specify g.method = "sl" and g.sl.cands = default.bin.cands.
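g.num.folds: the number of folds to use in cross-validating the super learner fit for g. Ignored if estimator is "G-COMP", or if g.method is not "sl".

g.num.splits: the number of times to randomly split the data into g.num.folds folds in cross-validating the super learner fit for g. Cross-validation results will be averaged over all splits. Ignored if estimator is "G-COMP", or if g.method is not "sl".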
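The interplay of g.num.folds and g.num.splits can be illustrated in base R. The technique is V-fold cross-validation, repeated over several random splits, with results averaged across splits. This is a sketch under assumptions, not multiPIM's super learner. Two hypothetical candidates (intercept-only and main-terms logistic regression, both fit with glm) are compared by held-out log likelihood. As the note for g.cv.risk.array explains, the winner is the candidate with the largest value, not the smallest.

```r
# Sketch: V-fold cross-validation repeated over several random splits,
# comparing candidate logistic models by held-out log likelihood.
cv_log_lik <- function(data, formulas, num_folds = 5, num_splits = 2) {
  n <- nrow(data)
  # ll[s, k]: held-out log likelihood of candidate k on split s
  ll <- matrix(NA_real_, num_splits, length(formulas),
               dimnames = list(NULL, names(formulas)))
  for (s in seq_len(num_splits)) {
    # A fresh random assignment of observations to folds for each split
    folds <- sample(rep_len(seq_len(num_folds), n))
    for (k in seq_along(formulas)) {
      outcome <- all.vars(formulas[[k]])[1]
      total <- 0
      for (v in seq_len(num_folds)) {
        train <- data[folds != v, , drop = FALSE]
        test <- data[folds == v, , drop = FALSE]
        fit <- glm(formulas[[k]], family = binomial, data = train)
        p <- predict(fit, newdata = test, type = "response")
        y <- test[[outcome]]
        total <- total + sum(y * log(p) + (1 - y) * log(1 - p))
      }
      ll[s, k] <- total
    }
  }
  # Average over splits; the winning candidate has the largest value
  colMeans(ll)
}

set.seed(2)
n <- 200
w <- rnorm(n)
d <- data.frame(w = w, a = rbinom(n, 1, plogis(1.5 * w)))
cands <- list(intercept.only = a ~ 1, main.terms = a ~ w)
cv <- cv_log_lik(d, cands, num_folds = 5, num_splits = 3)
cv
names(which.max(cv))
```

All candidates share the same fold assignment within a split, so their cross-validated values are directly comparable. Averaging over splits reduces the dependence on any one random partition.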
Q.method: the regression method used to model Q. The default, "sl", indicates that super learning should be used for modelling Q. Ignored if estimator is "IPCW".

Q.sl.cands: either "default" or "all", or a character vector of length at least 2 containing elements of either all.bin.cands or all.cont.cands, or names of the extra.cands list of functions, if it is supplied. See details. Ignored if estimator is "IPCW" or if Q.method is not "sl".

Q.num.folds: the number of folds to use in cross-validating the super learner fit for Q. Ignored if estimator is "IPCW" or if Q.method is not "sl".

Q.num.splits: the number of times to randomly split the data into Q.num.folds folds in cross-validating the super learner fit for Q. Ignored if estimator is "IPCW" or if Q.method is not "sl".

Q.type: either NULL or a length 1 character vector, which must be either "binary.outcome" or "continuous.outcome". This provides a way to override the default mechanism for deciding which candidates will be allowed for modeling Q (see details). Ignored if estimator is "IPCW".
adjust.for.other.As: single logical value indicating whether the other columns of A should be included (TRUE) or not (FALSE) in the g and Q models used to calculate the effect of each column of A on each column of Y. See details. Ignored if A has only one column.

truncate: either FALSE, or a single number greater than 0 and less than 0.5 at which the values of g(0, W) should be truncated in order to avoid instability of the estimator. Ignored if estimator is "G-COMP".
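Here is a minimal sketch of truncating g(0, W) at a bound. It assumes, beyond what is stated above, that only small values are bounded: any estimated probability below truncate is raised to truncate. IPCW-type estimators typically weight by 1/g(0, W), so small values are the source of instability. The helper truncate_g is hypothetical, and multiPIM's internal rule may differ.

```r
# Sketch: bound estimated probabilities g(0, W) away from 0.
# truncate_g is hypothetical; multiPIM's internal rule may differ.
truncate_g <- function(g0W, truncate = 0.05) {
  # truncate = FALSE disables truncation, mirroring the argument above
  if (isFALSE(truncate)) {
    return(list(g0W = g0W, truncation.occurred = FALSE))
  }
  stopifnot(is.numeric(truncate), length(truncate) == 1,
            truncate > 0, truncate < 0.5)
  too_small <- g0W < truncate
  g0W[too_small] <- truncate
  list(g0W = g0W, truncation.occurred = any(too_small))
}

truncate_g(c(0.01, 0.2, 0.9))         # 0.01 is raised to 0.05
truncate_g(c(0.01, 0.2, 0.9), FALSE)  # returned unchanged
```

The returned flag plays a role similar to the returned object's record of whether truncation took place. It is always FALSE when truncation is disabled.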
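return.final.models: single logical value indicating whether to return the final fitted models for g and Q (i.e. g.final.models and Q.final.models). Default is TRUE. If memory is a concern, you will probably want to set this to FALSE. Note that only the g and Q models for the main multiPIM run will be returned, not those for each of the bootstrap runs.

na.action: if any of Y, A or (a non-null) W has missing values, multiPIMboot will throw an error.

verbose: logical value (see multiPIM). … verbose is set to FALSE.

extra.cands, standardize: see multiPIM.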
Value

An object of class "multiPIM" which is identical to the object resulting from running the multiPIM function on the original data, except for two slots which are slightly different. The call slot contains a copy of the original call to multiPIMboot. The boot.param.array slot now contains the bootstrap distribution of the parameter estimates obtained by running multiPIM on the bootstrap replicates of the original data. Thus the object returned has the following slots:
param.estimates: a matrix of dimension ncol(A) by ncol(Y) with rownames equal to names(A) and colnames equal to names(Y). Each element is the estimated causal attributable risk for the exposure given by its row name vs. the outcome given by its column name.

plug.in.stand.errs: a matrix of the same dimensions as param.estimates containing the corresponding plug-in standard errors of the parameter estimates. These are obtained from the influence curve. Note: plug-in standard errors are not available for estimator = "G-COMP"; this field will be set to NA in this case.

call: the call to multiPIMboot which generated this object.

num.exposures: the number of exposures, i.e. ncol(A).

num.outcomes: the number of outcomes, i.e. ncol(Y).

W.names: a character vector containing the names of the W data frame, if one was supplied. If no W was supplied, this will be NA.

g.sl.cands: the g.sl.cands argument, or NA if g.method was not "sl".

g.winning.cands: a character vector with ncol(A) elements. The ith element will be the name of the candidate which "won" the cross-validation in the g model for the ith column of A.

g.cv.risk.array: a numeric array of dimension c(ncol(A), g.num.splits, length(g.sl.cands)) containing cross-validated risks from super learner modeling for g for each exposure-split-candidate triple. Has an informative dimnames attribute. Note: the values are technically not risks, but log likelihoods (i.e. the winning candidate is the one for which this is a maximum, not a minimum).

g.final.models: a list of length ncol(A) containing the objects returned by the candidate functions used in the final g models (see Candidates).

g.num.folds: the g.num.folds argument, or NA if g.method was not "sl".
g.num.splits: the g.num.splits argument, or NA if g.method was not "sl".

Q.method: the Q.method argument, or NA if estimator was "IPCW".

Q.sl.cands: the Q.sl.cands argument, or NA if estimator was "IPCW" or if Q.method was not "sl".

Q.winning.cands: a character vector with ncol(Y) elements. The ith element is the name of the candidate which "won" the cross-validation in the super learner for the Q model for the ith column of Y.

Q.cv.risk.array: a numeric array of dimension c(ncol(A), ncol(Y), Q.num.splits, length(Q.sl.cands)) containing cross-validated risks from super learner modeling for Q. Has an informative dimnames attribute. Note: the values will be log likelihoods when Q.type is "binary.outcome" (see the note above for g.cv.risk.array), and mean squared errors when Q.type is "continuous.outcome".

Q.final.models: a list of length ncol(A), each element of which is another list of length ncol(Y) containing the objects returned by the candidate functions used for the Q models. That is, Q.final.models[[i]][[j]] contains the Q model information for exposure i and outcome j.

Q.num.folds: the Q.num.folds argument, or NA if estimator was "IPCW" or if Q.method was not "sl".

Q.num.splits: the Q.num.splits argument, or NA if estimator was "IPCW" or if Q.method was not "sl".
Q.type: either "continuous.outcome" or "binary.outcome", depending on the contents of Y or on the value of the Q.type argument, if supplied.

adjust.for.other.As: a logical value indicating whether the other columns of A were included in the models used to calculate the effect of each column of A on each column of Y. Will be set to NA when A has only one column.

truncate: the truncate argument. Will be set to NA if estimator was "G-COMP".

truncation.occured: a logical value indicating whether truncation of g(0, W) took place. Always FALSE when truncate is FALSE. Will be set to NA if estimator was "G-COMP".

standardize: the standardize argument.

boot.param.array: a numeric array with dim attribute equal to c(times, ncol(A), ncol(Y)) containing the corresponding parameter estimate for each bootstrap replicate-exposure-outcome trio. Also has an informative dimnames attribute for easy printing.

The results can be printed by calling the summary function on the multiPIMboot result (see summary.multiPIM).
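The layout of boot.param.array places replicates in the first dimension. Given that, bootstrap standard errors and percentile intervals for every exposure-outcome pair can be computed with apply over dimensions 2 and 3. The sketch below uses a small simulated array purely for illustration. With a real result, one would presumably extract the boot.param.array slot instead (e.g. result@boot.param.array, assuming the object is an S4 object, as the word "slot" suggests).

```r
# Sketch: summarize a bootstrap array laid out as c(times, ncol(A), ncol(Y)).
# The array here is simulated for illustration only.
set.seed(3)
times <- 1000
boot_arr <- array(rnorm(times * 2 * 3, mean = 0.1, sd = 0.02),
                  dim = c(times, 2, 3),
                  dimnames = list(NULL, c("A1", "A2"), c("Y1", "Y2", "Y3")))

# Bootstrap standard error for each exposure-outcome pair (2 x 3 matrix)
boot_se <- apply(boot_arr, c(2, 3), sd)

# 95% percentile interval for each pair; result has dim c(2, 2, 3)
boot_ci <- apply(boot_arr, c(2, 3), quantile, probs = c(0.025, 0.975))

boot_se
boot_ci[, "A1", "Y2"]
```

These bootstrap standard errors can be compared with the influence-curve-based plug-in standard errors, which are not available for estimator = "G-COMP".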
Details

As of multiPIM version 1.3-1, support for multicore processing is through R's parallel package (distributed with R as of version 2.14.0).

For more details on how to use the arguments, see the details section for multiPIM.
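The mc.seed argument relies on the "L'Ecuyer-CMRG" generator, which supports independent random-number streams for parallel work. The sketch below uses base R's parallel package (nextRNGStream) to give each bootstrap replicate its own stream. As a result, replicate results do not depend on how replicates are distributed among workers. It runs the replicates with vapply for portability. How multiPIMboot assigns streams internally is not described on this page, so treat this only as an illustration of the technique. The helpers make_streams and run_replicate are hypothetical.

```r
library(parallel)

# Generator that supports independent streams (as used for mc.seed)
RNGkind("L'Ecuyer-CMRG")

# One stream seed per bootstrap replicate, derived from a single seed
make_streams <- function(times, seed) {
  set.seed(seed)
  s <- get(".Random.seed", envir = globalenv())
  streams <- vector("list", times)
  for (b in seq_len(times)) {
    streams[[b]] <- s
    s <- nextRNGStream(s)
  }
  streams
}

# A replicate draws only from its own stream, so its result does not
# depend on which worker runs it or in what order
run_replicate <- function(stream, dat) {
  assign(".Random.seed", stream, envir = globalenv())
  idx <- sample.int(nrow(dat), replace = TRUE)
  mean(dat$y[idx])
}

dat <- data.frame(y = rnorm(100))
streams <- make_streams(times = 20, seed = 123)
r1 <- vapply(streams, run_replicate, numeric(1), dat = dat)
r2 <- vapply(rev(streams), run_replicate, numeric(1), dat = dat)
identical(r1, rev(r2))  # same per-replicate results in either order
```

On Unix-alikes, mclapply(streams, run_replicate, dat = dat, mc.cores = 2) should yield the same per-replicate values as r1. This is because each replicate resets the generator to its own stream before drawing.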
See Also

multiPIM for the main function which is called by multiPIMboot.

summary.multiPIM for printing summaries of the results.

Candidates to see which candidates are currently available, and for information on writing user-defined super learner candidates and regression methods.
Examples

## Warning: This would take a very long time to run!
## Not run:
# ## load example from multiPIM help file
#
# example(multiPIM)
#
# ## this would run 5000 bootstrap replicates:
#
# boot.result <- multiPIMboot(Y, A)
#
# summary(boot.result)
## End(Not run)