barps: Bayesian Additive Regression Trees with Post-Stratification (BARP)

Description

This function uses Bayesian Additive Regression Trees (BART) to extrapolate survey data to a level of geographic aggregation for which the original survey was not designed to be representative. It is a modified version of the barp function from the BARP package (https://github.com/jbisbee1/BARP) that allows the random seed to be fixed.

Usage

barps(
  y,
  x,
  dat,
  census,
  geo.unit,
  algorithm = "BARP",
  setSeed = NULL,
  proportion = "None",
  cred_int = c(0.025, 0.975),
  BSSD = FALSE,
  nsims = 200,
  ...
)
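
As a hedged illustration of the signature above, the sketch below builds a tiny synthetic survey and census and calls barps with a fixed seed. The data frames svy and cens, their columns, and the seed value are made up for illustration. Treating "raw census data" as one row per person and listing the geographic identifier among the covariates are also assumptions of this sketch. The call is guarded so that the data preparation still runs where barps (and bartMachine, which needs Java) is not available.

```r
# Hypothetical toy data: names and values are illustrative only.
set.seed(1)
lv_age <- c("18-39", "40-64", "65+")
lv_edu <- c("low", "high")
lv_st  <- c("A", "B", "C")

# Survey: outcome plus covariates, with all covariates stored as factors.
svy <- data.frame(
  support = rbinom(300, 1, 0.5),
  age     = factor(sample(lv_age, 300, replace = TRUE), levels = lv_age),
  educ    = factor(sample(lv_edu, 300, replace = TRUE), levels = lv_edu),
  state   = factor(sample(lv_st,  300, replace = TRUE), levels = lv_st)
)

# Raw census (assumed: one row per person), same covariates and levels.
cens <- data.frame(
  age   = factor(sample(lv_age, 2000, replace = TRUE), levels = lv_age),
  educ  = factor(sample(lv_edu, 2000, replace = TRUE), levels = lv_edu),
  state = factor(sample(lv_st,  2000, replace = TRUE), levels = lv_st)
)

covs <- c("age", "educ", "state")

# Run the model only if barps is available in the current session.
if (exists("barps", mode = "function")) {
  fit <- barps(
    y = "support", x = covs, dat = svy, census = cens,
    geo.unit = "state", algorithm = "BARP",
    setSeed = 123,        # fixed seed so repeated runs match
    proportion = "None"   # raw census: bin proportions are computed
  )
  head(fit$pred.opn)      # estimates by geographic unit
}
```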

Value

Returns an object of class BARP, containing a list of the following components:

pred.opn: A data.frame in which each row corresponds to one geographic unit of interest and the columns give the predicted outcome and the lower and upper bounds of the requested credible interval (cred_int).
trees: A bartMachine object.
risk: A data.frame containing the cross-validation risk for each algorithm and the associated weight used in the ensemble predictions. Only useful when multiple algorithms are used.
barp.dat: Data containing the estimates and credible intervals for each observation in the input census dataset.
setSeed: The random seed value employed during model estimation using bartMachine.
proportion: The number of observations in each combination of features.
x: The names of the explanatory variables included in the model.

Arguments

y: Outcome of interest. Should be a character string giving the name of the column that contains the outcome variable.
x: Prognostic covariates. Should be a vector of column names corresponding to the covariates used to predict the outcome variable of interest.
dat: Survey data containing the x and y columns. The explanatory variables x must be converted to factors before being passed in.
census: Census data containing the x columns, structured in the same way as the x columns in dat. If the user provides raw census data, BARP calculates the proportion for each unique bin of x covariates. Otherwise, the researcher must calculate the bin proportions beforehand and use proportion to name the column that contains them, either as percentages or as raw counts.
geo.unit: The column name corresponding to the unit at which outcomes should be aggregated.
algorithm: Algorithm for predicting opinions. Can be any algorithm(s) included in the SuperLearner package. If multiple algorithms are listed, predicted opinions are provided for each separately, as well as for the weighted ensemble. Defaults to "BARP", which implements Bayesian Additive Regression Trees via bartMachine.
setSeed: Seed to control random number generation.
proportion: The name of the column in the census data that holds the proportions for the covariate bins. If left at the default value "None", BARP assumes raw census data and estimates bin proportions automatically.
cred_int: A vector giving the lower and upper bounds on the credible interval for the predictions.
BSSD: Whether to calculate a bootstrapped standard deviation. Defaults to FALSE, in which case the standard deviation is the one BART produces by default.
nsims: The number of bootstrap simulations.
...: Additional arguments to be passed to bartMachine or SuperLearner.
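
The proportion argument separates raw census data from census data with pre-computed bin proportions. As a rough base-R sketch of what that pre-computation and the subsequent post-stratification step amount to (not the package's internal code), the example below counts people per covariate bin within each geographic unit, converts counts to within-unit shares, and averages bin-level predictions using those shares. The data frame cens, the column names n, prop, pred and w_pred, and the prediction values are all invented for illustration.

```r
# Hypothetical raw census: one row per person (illustrative values).
cens <- data.frame(
  state = c("A", "A", "A", "A", "B", "B", "B", "B", "B", "B"),
  age   = c("young", "young", "old", "old",
            "young", "old", "old", "old", "old", "old")
)

# Count people in each (state, age) bin.
bins <- aggregate(n ~ state + age, data = transform(cens, n = 1), FUN = sum)

# Convert counts to shares within each geographic unit.
bins$prop <- bins$n / ave(bins$n, bins$state, FUN = sum)

# Hypothetical bin-level predictions standing in for model output.
bins$pred <- ifelse(bins$age == "young", 0.8, 0.4)

# Post-stratify: share-weighted mean prediction per geographic unit.
bins$w_pred <- bins$prop * bins$pred
est <- aggregate(w_pred ~ state, data = bins, FUN = sum)
names(est)[2] <- "estimate"
est
```

A census table aggregated in this way would presumably be supplied with proportion set to the name of its count or share column (here n or prop), rather than left at "None".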
