ebma
Tunes ensemble Bayesian model averaging (EBMA) and generates weights for classifier averaging.
ebma(
  ebma.fold,
  y,
  L1.x,
  L2.x,
  L2.unit,
  L2.reg,
  pc.names,
  post.strat,
  n.draws,
  tol,
  best.subset.opt,
  pca.opt,
  lasso.opt,
  gb.opt,
  svm.opt,
  deep.mrp,
  verbose,
  cores
)
ebma.fold: New data for EBMA tuning. A list containing data that must not have been used in classifier training.
y: Outcome variable. A character scalar containing the column name of the outcome variable in survey.
L1.x: Individual-level covariates. A character vector containing the column names of the individual-level variables in survey and census used to predict outcome y. Note that the geographic unit is specified in argument L2.unit.
L2.x: Context-level covariates. A character vector containing the column names of the context-level variables in survey and census used to predict outcome y. To exclude context-level variables, set L2.x = NULL.
L2.unit: Geographic unit. A character scalar containing the column name of the geographic unit in survey and census at which outcomes should be aggregated.
L2.reg: Geographic region. A character scalar containing the column name of the geographic region in survey and census by which geographic units are grouped (L2.unit must be nested within L2.reg). Default is NULL.
pc.names: Principal component variable names. A character vector containing the names of the context-level principal component variables.
post.strat: Post-stratification results. A list containing the best model for each of the tuned classifiers, the individual-level predictions on the classifier training data, and the post-stratified context-level predictions.
n.draws: EBMA number of samples. An integer-valued scalar specifying the number of bootstrapped samples to be drawn from the EBMA fold and used for tuning EBMA. Default is 100. Passed on from ebma.n.draws.
tol: EBMA tolerance. A numeric vector of tolerance values for the EM algorithm. Optimization stops once the improvement in the log-likelihood falls below the tolerance. Values should range at least from 0.01 to 0.001. Default is c(0.01, 0.005, 0.001, 0.0005, 0.0001, 0.00005, 0.00001). Passed on from ebma.tol.
best.subset.opt: Tuned best subset parameters. A list returned from run_best_subset().
pca.opt: Tuned best subset with principal components parameters. A list returned from run_pca().
lasso.opt: Tuned lasso parameters. A list returned from run_lasso().
gb.opt: Tuned gradient tree boosting parameters. A list returned from run_gb().
svm.opt: Tuned support vector machine parameters. A list returned from run_svm().
deep.mrp: Deep MRP classifier. A logical argument indicating whether the deep MRP classifier should be used for predicting outcome y. Default is FALSE.
verbose: Verbose output. A logical argument indicating whether or not verbose output should be printed. Default is FALSE.
cores: The number of cores to be used. An integer indicating the number of processor cores used for parallel computing. Default is 1.
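The n.draws and tol arguments control how EBMA weights are estimated. A minimal base-R sketch of that idea follows, under stated assumptions. It is not the code used by ebma(), and the functions em_weights() and boot_weights() and the simulated data are hypothetical.

The sketch does three things:

- It treats each classifier's predicted probabilities for a binary outcome as a mixture component and estimates mixture weights by EM. It omits the calibration step of full EBMA.
- It stops EM once the log-likelihood improvement falls below tol.
- It averages the weights over n.draws bootstrap samples of the tuning fold.

Choosing the tolerance by the Brier score of the weighted prediction is also an illustrative assumption.

```r
# Hypothetical sketch of EBMA-style weighting; not the autoMrP implementation.
# Assumption: column k of `pred` holds classifier k's predicted probabilities
# for a binary outcome `y`. Full EBMA also calibrates predictions; omitted here.
em_weights <- function(y, pred, tol = 0.001, max_iter = 10000) {
  k <- ncol(pred)
  w <- rep(1 / k, k)                     # start from equal weights
  lik <- pred^y * (1 - pred)^(1 - y)     # n x k Bernoulli likelihoods
  ll_old <- -Inf
  for (iter in seq_len(max_iter)) {
    mix <- as.vector(lik %*% w)          # mixture likelihood per observation
    ll_new <- sum(log(mix))
    if (ll_new - ll_old < tol) break     # stop once the gain is below tol
    ll_old <- ll_new
    z <- sweep(lik, 2, w, "*") / mix     # E-step: responsibilities (rows sum to 1)
    w <- colMeans(z)                     # M-step: average responsibilities
  }
  setNames(w, colnames(pred))
}

# Average EM weights over bootstrap samples drawn from the tuning fold.
boot_weights <- function(y, pred, tol, n_draws = 100) {
  draws <- replicate(n_draws, {
    idx <- sample(length(y), replace = TRUE)
    em_weights(y[idx], pred[idx, , drop = FALSE], tol = tol)
  })
  setNames(rowMeans(matrix(draws, nrow = ncol(pred))), colnames(pred))
}

# Simulated tuning fold with three hypothetical classifiers.
set.seed(42)
n <- 500
y <- rbinom(n, 1, 0.4)
pred <- cbind(
  good  = plogis(qlogis(0.4) + 2 * (y - 0.5) + rnorm(n)),
  ok    = plogis(qlogis(0.4) + (y - 0.5) + rnorm(n)),
  noise = plogis(rnorm(n))
)

# Score each candidate tolerance by the Brier score of the weighted average
# (an illustrative criterion) and keep the best one.
tols <- c(0.01, 0.005, 0.001)
scores <- sapply(tols, function(t) {
  w <- boot_weights(y, pred, tol = t, n_draws = 20)
  mean((pred %*% w - y)^2)
})
best_tol <- tols[which.min(scores)]
weights <- boot_weights(y, pred, tol = best_tol, n_draws = 20)
print(round(weights, 3))
```

In each M-step, the new weights are averages of responsibilities that sum to one for every observation. As a result, the weights stay non-negative and sum to one, both within a single EM run and after averaging across bootstrap draws.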