bootstrap_inference: Bootstrap inference for prespecified models

Description

Runs B bootstrap samples using a prespecified model, then computes the two I estimates via cross-validation. For each of the two I estimates, a p-value is computed for a given \(H_0: \mu_{I_0} = \mu_0\) and a confidence interval is provided.

Usage

bootstrap_inference(X, y, 
		model_string,
		predict_string = "predict(mod, obs_left_out)",
		cleanup_mod_function = NA,
		y_higher_is_better = TRUE,
		verbose = TRUE,
		full_verbose = FALSE,
		H_0_mu_equals = 0,
		pct_leave_out = 0.10,
		B = 3000,
		alpha = 0.05,
		plot = TRUE,
		num_cores = 1,
		...)

Arguments

X

An \(n \times p\) data frame of covariates.

y

An \(n\)-length numeric vector containing the response.

model_string

A string of R code that is evaluated to construct the leave-one-out model. The covariate data must be referred to as Xyleft.

predict_string

A string of R code that is evaluated on the left-out data after the model has been built with the training data. The forecast data (the left-out data) must be referred to as obs_left_out and the model as mod.
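
As a rough illustration of how the two strings fit together, the sketch below evaluates a model_string and a predict_string by hand with eval(parse(text = ...)). The toy data and the assumption that Xyleft holds the response y alongside the covariates are hypothetical; the function's internal evaluation may differ in detail.

```r
# Hypothetical stand-ins for one cross-validation split
set.seed(1)
Xyleft <- data.frame(treatment = rep(c(0, 1), 10), x = rnorm(20))
Xyleft$y <- 1 - Xyleft$x + Xyleft$treatment * 2 * Xyleft$x + rnorm(20)
obs_left_out <- data.frame(treatment = c(0, 1), x = c(0.5, 0.5))

model_string <- "lm(y ~ . + treatment * ., data = Xyleft)"
predict_string <- "predict(mod, obs_left_out)"

# The model string must refer to the training data as Xyleft
mod <- eval(parse(text = model_string))

# The predict string must refer to the fitted model as mod and
# to the left-out observations as obs_left_out
preds <- eval(parse(text = predict_string))
```

Because the strings are resolved by name, a model_string that refers to the data under any other name would fail when evaluated.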

cleanup_mod_function

A function that is called at the end of each cross-validation iteration to clean up the model in some way. Defaults to NA.

y_higher_is_better

TRUE if a higher response value is clinically "better" than a lower one (e.g. cognitive ability in a drug trial for the mentally ill). FALSE if a lower response value is clinically "better" than a higher one (e.g. body weight at the end of a weight-loss trial). Default is TRUE.

verbose

Prints out a dot for each bootstrap sample. This only works on some platforms.

full_verbose

Prints out full information for each cross validation model for each bootstrap sample. This only works on some platforms.

H_0_mu_equals

The \(\mu_{I_0}\) value in \(H_0\). Default is 0, which answers the question: does my allocation procedure do better than a naive allocation procedure?

pct_leave_out

In the cross-validation, the proportion of the original dataset left out to estimate out-of-sample metrics. The default is 0.1, which corresponds to 10-fold cross-validation.
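
The correspondence between the left-out proportion and the number of folds can be sketched as follows. The variable names and the contiguous fold assignment are illustrative assumptions; the package may partition the data differently.

```r
n <- 50
pct_leave_out <- 0.10

# Number of observations held out in each cross-validation iteration
n_left_out <- round(n * pct_leave_out)

# Number of iterations needed so every observation is left out once
num_folds <- ceiling(n / n_left_out)

# Hypothetical fold assignment: contiguous blocks of n_left_out rows
fold_id <- rep(seq_len(num_folds), each = n_left_out, length.out = n)
```

With n = 50 and pct_leave_out = 0.10, each iteration holds out 5 observations and 10 iterations cover the whole dataset.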

B

The number of bootstrap samples to take. We recommend making this as high as you can tolerate given speed considerations. The default is 3000.

alpha

Defines the confidence interval coverage as 1 - alpha. Defaults to 0.05, which gives 95% confidence intervals.
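
To show how alpha translates into interval endpoints, here is a minimal sketch of a percentile bootstrap interval. The vector boot_estimates is simulated, hypothetical data, and the percentile method is only one possible construction; the intervals returned by the function may be built differently.

```r
set.seed(1)

# Hypothetical bootstrap estimates standing in for B = 3000 real draws
boot_estimates <- rnorm(3000, mean = 0.2, sd = 0.1)

alpha <- 0.05

# Percentile interval: cut alpha / 2 of the bootstrap mass from each tail
ci <- quantile(boot_estimates, probs = c(alpha / 2, 1 - alpha / 2))
```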

plot

Illustrates the estimate, the bootstrap samples and the confidence intervals on a histogram plot. Defaults to TRUE.

num_cores

The number of cores to use in parallel to run the bootstrap samples more rapidly. Defaults to serial by using 1 core.

...

Additional parameters to be sent to the model constructor. Note that if you wish to pass these parameters, "..." must be specified in model_string.
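
The sketch below illustrates why "..." has to appear inside model_string: arguments passed through the dots only reach the model constructor if the evaluated string forwards them. The helper fit_with_dots and the toy data are hypothetical and merely mimic the evaluation of a model string inside a function.

```r
# Hypothetical helper that evaluates a model string inside a function
# which itself receives "..."
fit_with_dots <- function(model_string, Xyleft, ...) {
  eval(parse(text = model_string))
}

set.seed(1)
Xyleft <- data.frame(x = rnorm(20))
Xyleft$y <- 1 + 2 * Xyleft$x + rnorm(20)

# "..." inside the string forwards extra arguments to lm()
model_string <- "lm(y ~ x, data = Xyleft, ...)"

mod_default <- fit_with_dots(model_string, Xyleft)
# model = FALSE is a standard lm() argument; here it travels via "..."
mod_small <- fit_with_dots(model_string, Xyleft, model = FALSE)
```

In this sketch, mod_small does not store the model frame because model = FALSE reached lm() through the dots; without "..." in the string the extra argument would be silently ignored.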

Value

Returns a list object containing results of the procedure.

References

Kapelner, A, Bleich, J, Cohen, ZD, DeRubeis, RJ and Berk, R (2014) Inference for Treatment Regime Models in Personalized Medicine, arXiv

Examples

	beta0 = 1
	beta1 = -1
	gamma0 = 0
	gamma1 = sqrt(2 * pi)
	mu_x = 0
	sigsq_x = 1
	sigsq_e = 1
	num_boot = 20 #for speed only
	n = 50 #for speed only
	
	# rnorm() expects a standard deviation, so convert the variances
	x = sort(rnorm(n, mu_x, sqrt(sigsq_x)))
	noise = rnorm(n, 0, sqrt(sigsq_e))
	
	treatment = sample(c(rep(1, n / 2), rep(0, n / 2)))
	y = beta0 + beta1 * x + treatment * (gamma0 + gamma1 * x) + noise
	
	X = data.frame(treatment, x)
	
	res = bootstrap_inference(X, y,
			"lm(y ~ . + treatment * ., data = Xyleft)",
			num_cores = 1,
			B = num_boot)