ebma: Bayesian Ensemble Model Averaging (EBMA)

Description

ebma tunes EBMA and generates weights for classifier averaging.

Usage

ebma(
  ebma.fold,
  y,
  L1.x,
  L2.x,
  L2.unit,
  L2.reg,
  pc.names,
  post.strat,
  n.draws,
  tol,
  best.subset.opt,
  pca.opt,
  lasso.opt,
  gb.opt,
  svm.opt,
  deep.mrp,
  verbose,
  cores
)

Arguments

ebma.fold: New data for EBMA tuning. A list containing the data that must not have been used in classifier training.
y: Outcome variable. A character scalar containing the column name of the outcome variable in survey.
L1.x: Individual-level covariates. A character vector containing the column names of the individual-level variables in survey and census used to predict outcome y. Note that the geographic unit is specified in argument L2.unit.
L2.x: Context-level covariates. A character vector containing the column names of the context-level variables in survey and census used to predict outcome y. To exclude context-level variables, set L2.x = NULL.
L2.unit: Geographic unit. A character scalar containing the column name of the geographic unit in survey and census at which outcomes should be aggregated.
L2.reg: Geographic region. A character scalar containing the column name of the geographic region in survey and census by which geographic units are grouped (L2.unit must be nested within L2.reg). Default is NULL.
pc.names: Principal Component Variable names. A character vector containing the names of the context-level principal components variables.
post.strat: Post-stratification results. A list containing the best models for each of the tuned classifiers, the individual-level predictions on the classifier training data, and the post-stratified context-level predictions.
n.draws: EBMA number of samples. An integer-valued scalar specifying the number of bootstrapped samples to be drawn from the EBMA fold and used for tuning EBMA. Default is 100. Passed on from ebma.n.draws.
tol: EBMA tolerance. A numeric vector containing the tolerance values for improvements in the log-likelihood before the EM algorithm stops optimization. Values should range at least from 0.01 to 0.001. Default is c(0.01, 0.005, 0.001, 0.0005, 0.0001, 0.00005, 0.00001). Passed on from ebma.tol.
best.subset.opt: Tuned best subset parameters. A list returned from run_best_subset().
pca.opt: Tuned best subset with principal components parameters. A list returned from run_pca().
lasso.opt: Tuned lasso parameters. A list returned from run_lasso().
gb.opt: Tuned gradient tree boosting parameters. A list returned from run_gb().
svm.opt: Tuned support vector machine parameters. A list returned from run_svm().
deep.mrp: Deep MRP classifier. A logical argument indicating whether the deep MRP classifier should be used for predicting outcome y. Default is FALSE.
verbose: Verbose output. A logical argument indicating whether or not verbose output should be printed. Default is FALSE.
cores: The number of cores to be used. An integer indicating the number of processor cores used for parallel computing. Default is 1.
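
The roles of n.draws and tol can be illustrated with a minimal base-R sketch. It is not the package's internal implementation: the helper em_weights(), the simulated data, and the two toy classifiers named good and noisy are all assumptions made for illustration. The sketch fits ensemble weights for a binary outcome with a plain EM loop that stops once the log-likelihood improves by less than tol, repeats this on bootstrapped resamples of a stand-in EBMA fold, and averages the resulting weights.

```r
# Hypothetical helper (not part of autoMrP): EM for ensemble weights.
# preds: n x K matrix of predicted probabilities, one column per classifier
# y:     binary outcome vector (0/1) of length n
em_weights <- function(preds, y, tol = 0.001, max_iter = 1000) {
  k <- ncol(preds)
  w <- rep(1 / k, k)  # start from equal weights
  # Bernoulli likelihood of each observation under each classifier
  lik <- preds^y * (1 - preds)^(1 - y)
  ll_old <- sum(log(lik %*% w))
  for (iter in seq_len(max_iter)) {
    # E-step: share of each observation attributed to each classifier
    num <- sweep(lik, 2, w, `*`)
    z <- num / rowSums(num)
    # M-step: the new weight is the average share
    w <- colMeans(z)
    ll_new <- sum(log(lik %*% w))
    # Stop when the log-likelihood improvement falls below the tolerance
    if (ll_new - ll_old < tol) break
    ll_old <- ll_new
  }
  list(weights = w, loglik = ll_new, iterations = iter)
}

# Simulated stand-in for the EBMA fold (assumption, for illustration only)
set.seed(1)
n <- 500
truth <- runif(n)
y <- rbinom(n, 1, truth)
preds <- cbind(
  good  = pmin(pmax(truth + rnorm(n, sd = 0.05), 0.01), 0.99),
  noisy = runif(n, 0.01, 0.99)
)

# Draw bootstrapped samples (the role of n.draws) and average the weights
n_draws <- 20
draw_weights <- t(sapply(seq_len(n_draws), function(d) {
  idx <- sample.int(n, replace = TRUE)
  em_weights(preds[idx, , drop = FALSE], y[idx], tol = 0.001)$weights
}))
avg_weights <- colMeans(draw_weights)
print(round(avg_weights, 3))
```

A smaller tol lets the EM loop run longer before it stops, which is why supplying several tolerance values allows them to be compared during tuning.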