boostBackend: Boost an estimation procedure with a reweighter and aggregator.

Description

Perform the Boost algorithm on proc with reweighter and aggregator and monitor estimator performance with analyzePerformance.

Usage

boostBackend(B, reweighter, aggregator, proc, data, initialWeights, .procArgs, analyzePerformance = defaultOOBPerformanceAnalysis, .reweighterArgs = NULL, .aggregatorArgs = NULL, .analyzePerformanceArgs = NULL, .subsetFormula = findFormulaIn(.procArgs), .formatData = !is.null(.subsetFormula), .storeData = FALSE, .calcBoostrPerformance = TRUE)

Arguments

B

the number of iterations to run.

reweighter

a boostr compatible reweighter function.

aggregator

a boostr compatible aggregator function.

proc

a boostr compatible estimation procedure.

data

the learning set to pass to proc. data is assumed to hold the response variable in its first column.

initialWeights

a vector of weights used for the first iteration of the ensemble building phase of Boost.

.procArgs

a named list of arguments to pass to proc in addition to data.

.reweighterArgs

a named list of arguments to pass to reweighter in addition to proc, data and weights. These are generally initialization values for other parameters that govern the behaviour of reweighter.

.aggregatorArgs

a named list of arguments to pass to aggregator in addition to the output from reweighter.

.storeData

a boolean indicating whether the data should be stored in the returned boostr object under the attribute "data".

.calcBoostrPerformance

a boolean indicating whether analyzePerformance should be used to monitor the performance of the returned boostr object on the learning set. A value of seq.int(nrow(data)) will be passed to analyzePerformance as the oobObs argument.

.subsetFormula

a formula object indicating how data is to be subsetted. A formula like "Type ~ ." will rearrange the columns of data such that data[,1] == data$Type. By default, this is the value of the formula entry in .procArgs. If multiple entries have the substring "formula" in their names, the search throws an error, and you should set .subsetFormula manually.

.formatData

a boolean indicating whether the data needs to be reformatted via .subsetFormula such that the response variable is in the first column and the remaining columns are all predictor variables. Defaults to !is.null(.subsetFormula).

analyzePerformance

a boostr compatible performance analyzer.

.analyzePerformanceArgs

a named list of arguments to pass to analyzePerformance in addition to prediction, response, and oobObs.
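
As a rough illustration of the conventions described for data, .subsetFormula, initialWeights and .calcBoostrPerformance, the base-R sketch below moves the response into the first column and builds a uniform weight vector. It calls no boostr functions. Whether boostBackend uses model.frame internally for the rearrangement is an assumption, and uniform weights are only one common choice, not a documented default.

```r
# Sketch only: base R and the built-in iris data, no boostr functions.
form <- Species ~ .   # same shape as the "Type ~ ." formula described above

# model.frame() returns the response in the first column, then the predictors,
# which is the layout the `data` argument is assumed to have.
formatted <- model.frame(formula = form, data = iris)

# Uniform starting weights: one per observation, summing to 1 (an assumed choice).
initialWeights <- rep(1 / nrow(formatted), nrow(formatted))

# The index vector that is passed as oobObs when .calcBoostrPerformance = TRUE.
oobObs <- seq.int(nrow(formatted))
```

After this, names(formatted)[1] is the response name, so formatted[, 1] and formatted$Species refer to the same column.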

Value

ensembleEstimators: An ordered list whose components are the trained estimators.
reweighterOutput: An ordered list whose components are the output of reweighter at each iteration.
performanceOnLearningSet: The performance of the returned boostr object on the learning set, as measured by analyzePerformance. This is only calculated if .calcBoostrPerformance = TRUE.
estimatorPerformance: An ordered list whose components are the output of analyzePerformance at each iteration.
oobVec: A row-major matrix whose $ij$-th entry indicates if observation $j$ was used to train estimator $i$.
reweighter: The reweighter function used.
reweighterArgs: Any additional arguments passed to boostBackend for reweighter.
aggregator: The aggregator function used.
aggregatorArgs: Any additional arguments passed to boostBackend for aggregator.
estimationProcedure: The estimation procedure used.
estimationProcedureArgs: Any additional arguments passed to boostBackend for proc.
data: The learning set. Only stored if .storeData = TRUE.
analyzePerformance: The performance analyzer used.
analyzePerformanceArgs: Any additional arguments passed to boostBackend for analyzePerformance.
subsetFormula: The value of .subsetFormula.
formatData: The value of .formatData.
storeData: The value of .storeData.
calcBoostrPerformance: The value of .calcBoostrPerformance.
initialWeights: The initial weights used.

Details

For the details behind this algorithm, check out the paper at http://pollackphoto.net/misc/masters_thesis.pdf
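
The interplay between proc, reweighter and aggregator can be sketched as a plain loop. This is a minimal sketch under stated assumptions, not the package's implementation: boostSketch, toyStump, toyReweighter and toyAggregator are hypothetical names, their signatures are simplified and are not the boostr-compatible signatures, and the AdaBoost-style weight update is one illustrative choice of reweighter.

```r
# Hypothetical sketch of a Boost-style loop; every name here is illustrative.
boostSketch <- function(B, proc, reweighter, aggregator, data, initialWeights) {
  weights <- initialWeights
  estimators <- vector("list", B)
  reweighterOutput <- vector("list", B)
  for (b in seq_len(B)) {
    # 1. Train an estimator on the learning set under the current weights.
    estimators[[b]] <- proc(data, weights)
    # 2. Let the reweighter score the estimator and produce new weights.
    reweighterOutput[[b]] <- reweighter(estimators[[b]], data, weights)
    weights <- reweighterOutput[[b]]$weights
  }
  # 3. Combine the trained estimators into a single ensemble estimator.
  aggregator(estimators, reweighterOutput)
}

# Toy estimation procedure: a decision stump on the second column.
# The response (coded -1/1) is assumed to be in the first column.
toyStump <- function(data, weights) {
  y <- data[[1]]
  x <- data[[2]]
  best <- list(err = Inf)
  for (thr in sort(unique(x))) {
    for (sgn in c(1, -1)) {
      pred <- ifelse(x > thr, sgn, -sgn)
      err <- sum(weights[pred != y])   # weighted misclassification
      if (err < best$err) best <- list(err = err, thr = thr, sgn = sgn)
    }
  }
  function(newdata) ifelse(newdata[[2]] > best$thr, best$sgn, -best$sgn)
}

# Toy reweighter: AdaBoost-style update, with the error clamped away
# from 0 and 1 so the logarithm stays finite.
toyReweighter <- function(estimator, data, weights) {
  y <- data[[1]]
  pred <- estimator(data)
  err <- sum(weights[pred != y]) / sum(weights)
  err <- min(max(err, 1e-10), 1 - 1e-10)
  alpha <- 0.5 * log((1 - err) / err)
  newWeights <- weights * exp(-alpha * y * pred)
  list(weights = newWeights / sum(newWeights), alpha = alpha)
}

# Toy aggregator: sign of the alpha-weighted vote of all estimators.
toyAggregator <- function(estimators, reweighterOutput) {
  alphas <- vapply(reweighterOutput, function(out) out$alpha, numeric(1))
  function(newdata) {
    votes <- Map(function(h, a) a * h(newdata), estimators, alphas)
    sign(Reduce(`+`, votes))
  }
}

toy <- data.frame(y = c(-1, -1, -1, 1, 1, 1), x = 1:6)
ensemble <- boostSketch(B = 3, proc = toyStump, reweighter = toyReweighter,
                        aggregator = toyAggregator, data = toy,
                        initialWeights = rep(1 / nrow(toy), nrow(toy)))
predictions <- ensemble(toy)
```

The sketch keeps the per-iteration reweighter output in a list, mirroring the reweighterOutput component described under Value, so that the aggregator can use quantities such as alpha when combining estimators.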

References

Steven Pollack. (2014). Boost: a practical generalization of AdaBoost (Master's Thesis). http://pollackphoto.net/misc/masters_thesis.pdf

Examples

## Not run: 
# df <- within(iris, {
#               Setosa <- factor(2*as.numeric(Species == "setosa") - 1)
#               Species <- NULL
#              })
# 
# form <- formula(Setosa ~ . )
# df <- model.frame(formula=form, data=df)
# 
# # demonstrate arc-fs algorithm using boostr convenience functions
# 
# glmArgs <- list(.trainArgs=list(formula=form, family="binomial"))
# 
# # format prediction to yield response in {-1,1} instead of {0,1}
# glm_predict <- function(object, newdata) {
#   2*round(predict(object, newdata, type='response')) - 1
# }
# 
# Phi_glm <- buildEstimationProcedure(train=glm, predict=glm_predict)
# 
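# # initialWeights has no default in the Usage signature; uniform weights
# # are supplied here as an assumed starting point
# phi <- boostBackend(B=3, data=df,
#                      reweighter=adaboostReweighter,
#                      aggregator=adaboostAggregator,
#                      proc=Phi_glm,
#                      initialWeights=rep.int(1, nrow(df))/nrow(df),
#                      .procArgs=glmArgs)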
# ## End(Not run)
