ebma tunes ensemble Bayesian model averaging (EBMA) and generates weights for classifier averaging.
ebma(
ebma.fold,
y,
L1.x,
L2.x,
L2.unit,
L2.reg,
pc.names,
post.strat,
n.draws,
tol,
best.subset.opt,
pca.opt,
lasso.opt,
gb.opt,
svm.opt,
deep.mrp,
verbose,
cores
)
New data for EBMA tuning. A list containing the data that must not have been used in classifier training.
Outcome variable. A character scalar containing the column name of
the outcome variable in survey.
Individual-level covariates. A character vector containing the
column names of the individual-level variables in survey and
census used to predict outcome y. Note that the geographic unit
is specified in argument L2.unit.
Context-level covariates. A character vector containing the
column names of the context-level variables in survey and
census used to predict outcome y. To exclude context-level
variables, set L2.x = NULL.
Geographic unit. A character scalar containing the column
name of the geographic unit in survey and census at which
outcomes should be aggregated.
Geographic region. A character scalar containing the column
name of the geographic region in survey and census by which
geographic units are grouped (L2.unit must be nested within
L2.reg). Default is NULL.
Principal component variable names. A character vector containing the names of the context-level principal component variables.
Post-stratification results. A list containing the best models for each of the tuned classifiers, the individual-level predictions on the classifier training data, and the post-stratified context-level predictions.
EBMA number of samples. An integer-valued scalar specifying
the number of bootstrapped samples to be drawn from the EBMA fold and used
for tuning EBMA. Default is 100. Passed on from ebma.n.draws.
EBMA tolerance. A numeric vector containing the tolerance values
for improvements in the log-likelihood before the EM algorithm stops
optimization. Values should range at least from 0.01 to 0.001.
Default is c(0.01, 0.005, 0.001, 0.0005, 0.0001, 0.00005, 0.00001).
Passed on from ebma.tol.
Tuned best subset parameters. A list returned from
run_best_subset().
Tuned best subset with principal components parameters. A list
returned from run_pca().
Tuned lasso parameters. A list returned from
run_lasso().
Tuned gradient tree boosting parameters. A list returned from
run_gb().
Tuned support vector machine parameters. A list returned from
run_svm().
Deep MRP classifier. A logical argument indicating whether
the deep MRP classifier should be used for predicting outcome y.
Default is FALSE.
Verbose output. A logical argument indicating whether or not
verbose output should be printed. Default is FALSE.
The number of cores to be used. An integer indicating the number of processor cores used for parallel computing. Default is 1.
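
As a rough illustration of the two ideas behind the arguments above, the following base R sketch draws bootstrapped samples from a toy EBMA fold (as n.draws describes) and combines classifier predictions with a set of weights. It does not call ebma() and is not the package's implementation. All object names (preds, w, ebma_fold, n_draws, draws, ebma_pred) and all numbers are hypothetical.

```r
# Hypothetical toy data: predictions from three classifiers for five units.
# Names and numbers are illustrative assumptions, not autoMrP output.
preds <- cbind(
  best_subset = c(0.40, 0.55, 0.62, 0.30, 0.48),
  lasso       = c(0.42, 0.50, 0.60, 0.35, 0.50),
  gb          = c(0.38, 0.58, 0.66, 0.28, 0.46)
)

# Hypothetical classifier weights: non-negative and summing to one.
w <- c(best_subset = 0.5, lasso = 0.3, gb = 0.2)
stopifnot(all(w >= 0), isTRUE(all.equal(sum(w), 1)))

# Weighted average of the classifier predictions (one value per unit).
ebma_pred <- as.vector(preds %*% w)

# Bootstrapped samples from a toy EBMA fold: rows drawn with replacement,
# each sample the same size as the fold.
set.seed(1)
ebma_fold <- data.frame(id = 1:20, y = rbinom(20, 1, 0.5))
n_draws <- 3L
draws <- lapply(seq_len(n_draws), function(i) {
  ebma_fold[sample(nrow(ebma_fold), replace = TRUE), ]
})
```

In this sketch the weights are fixed by hand. In ebma() they are the result of tuning over the tolerance values in tol and the n.draws bootstrapped samples.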