boostr (version 1.0.0)

boostBackend: Boost an estimation procedure with a reweighter and aggregator.

Description

Perform the Boost algorithm on proc with reweighter and aggregator and monitor estimator performance with analyzePerformance.
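To make the roles of proc, reweighter and aggregator concrete, here is a minimal base R sketch of classic AdaBoost with decision stumps. It does not call boostr and is not the package's implementation; the names fit_stump, estimators, alphas and ensemble are hypothetical and only mirror the three roles.

```r
# Toy AdaBoost on a 1-D problem (base R only; illustrative, not boostr code).
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(1, 1, -1, -1, 1, 1, -1, -1)   # labels in {-1, 1}

# Plays the role of "proc": fit a weighted decision stump and
# return a prediction function (a closure over the best split).
fit_stump <- function(x, y, w) {
  best <- list(err = Inf)
  for (thr in x) for (sgn in c(-1, 1)) {
    pred <- ifelse(x <= thr, sgn, -sgn)
    err <- sum(w[pred != y])           # weighted misclassification error
    if (err < best$err) best <- list(err = err, thr = thr, sgn = sgn)
  }
  function(newx) ifelse(newx <= best$thr, best$sgn, -best$sgn)
}

B <- 3                                  # number of iterations
w <- rep(1 / length(x), length(x))      # initial weights (uniform)
estimators <- vector("list", B)
alphas <- numeric(B)

for (b in seq_len(B)) {
  h <- fit_stump(x, y, w)
  miss <- h(x) != y
  # Plays the role of "reweighter": up-weight misclassified observations.
  err <- min(max(sum(w[miss]), 1e-10), 1 - 1e-10)  # guard against log(0)
  alpha <- 0.5 * log((1 - err) / err)
  w <- w * exp(ifelse(miss, alpha, -alpha))
  w <- w / sum(w)
  estimators[[b]] <- h
  alphas[b] <- alpha
}

# Plays the role of "aggregator": a weighted vote that is itself an estimator.
ensemble <- function(newx) {
  votes <- sapply(seq_len(B), function(b) alphas[b] * estimators[[b]](newx))
  sign(rowSums(matrix(votes, ncol = B)))
}

ensemble(x)
```

The loop keeps the per-iteration estimators and vote weights, which is roughly the information a boostr object exposes through its ensembleEstimators and reweighterOutput attributes.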

Usage

boostBackend(B, reweighter, aggregator, proc, data, initialWeights,
  .procArgs, analyzePerformance = defaultOOBPerformanceAnalysis,
  .reweighterArgs = NULL, .aggregatorArgs = NULL,
  .analyzePerformanceArgs = NULL,
  .subsetFormula = findFormulaIn(.procArgs),
  .formatData = !is.null(.subsetFormula), .storeData = FALSE,
  .calcBoostrPerformance = TRUE)

Arguments

B
the number of iterations to run.
reweighter
a boostr compatible reweighter function.
aggregator
a boostr compatible aggregator function.
proc
a boostr compatible estimation procedure.
data
the learning set to pass to proc. data is assumed to hold the response variable in its first column.
initialWeights
a vector of weights used for the first iteration of the ensemble building phase of Boost.
.procArgs
a named list of arguments to pass to proc in addition to data.
.reweighterArgs
a named list of arguments to pass to reweighter in addition to proc, data and weights. These are generally initialization values for other parameters that govern the behaviour of reweighter.
.aggregatorArgs
a named list of arguments to pass to aggregator in addition to the output from reweighter.
.storeData
a boolean indicating whether the data should be stored in the returned boostr object under the attribute "data".
.calcBoostrPerformance
a boolean indicating whether analyzePerformance should be used to monitor the performance of the returned boostr object on the learning set. A value of seq.int(nrow(data)) will be passed to analyzePerformance as the oobObs argument.
.subsetFormula
a formula object indicating how data is to be subsetted. A formula like "Type ~ ." will rearrange the columns of data such that data[,1] == data$Type. By default, this is the value of the formula entry in .procArgs. If multiple entries have the substring "formula" in their names, the search throws an error, and you should set .subsetFormula manually.
.formatData
a boolean indicating whether the data needs to be reformatted via .subsetFormula so that the response variable is in the first column and the remaining columns are all predictor variables. Defaults to !is.null(.subsetFormula).
analyzePerformance
a boostr compatible performance analyzer.
.analyzePerformanceArgs
a named list of arguments to pass to analyzePerformance in addition to prediction, response, and oobObs.
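The column rearrangement described for .subsetFormula can be illustrated with base R's model.frame, which the example on this page also uses to prepare its data. This is a sketch on a made-up data frame (d and reordered are hypothetical names), and it does not claim to show how boostBackend performs the reformatting internally.

```r
# Toy data with the response ("Type") in the last column.
d <- data.frame(
  x1 = c(0.5, 1.2, 3.1),
  x2 = c(10, 20, 30),
  Type = factor(c("a", "b", "a"))
)

form <- Type ~ .

# model.frame() evaluates the formula against d; the response comes first,
# followed by the predictor columns.
reordered <- model.frame(formula = form, data = d)

names(reordered)
all(reordered[, 1] == d$Type)
```

After this step the data satisfies the assumption stated for the data argument: the response variable sits in the first column.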

Value

a "boostr" object. The returned closure is the output of aggregator on the collection of estimators built during the iterative phase of Boost. It is intended to be a new estimator, and hence accepts the argument newdata. The estimator also carries the following attributes:
ensembleEstimators
An ordered list whose components are the trained estimators.
reweighterOutput
An ordered list whose components are the output of reweighter at each iteration.
performanceOnLearningSet
The performance of the returned boostr object on the learning set, as measured by analyzePerformance. This is only calculated if .calcBoostrPerformance = TRUE.
estimatorPerformance
An ordered list whose components are the output of analyzePerformance at each iteration.
oobVec
A row-major matrix whose $ij$-th entry indicates if observation $j$ was used to train estimator $i$.
reweighter
The reweighter function used.
reweighterArgs
Any additional arguments passed to boostBackend for reweighter.
aggregator
The aggregator function used.
aggregatorArgs
Any additional arguments passed to boostBackend for aggregator.
estimationProcedure
The estimation procedure used.
estimationProcedureArgs
Any additional arguments passed to boostBackend for proc.
data
The learning set. Only stored if .storeData = TRUE.
analyzePerformance
The performance analyzer used.
analyzePerformanceArgs
Any additional arguments passed to boostBackend for analyzePerformance.
subsetFormula
The value of .subsetFormula.
formatData
The value of .formatData.
storeData
The value of .storeData.
calcBoostrPerformance
The value of .calcBoostrPerformance.
initialWeights
The initial weights used.
The attributes can be accessed through the appropriate extraction function.
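Because the returned value is a closure that carries attributes, it behaves like any other R function with metadata attached. The following base R sketch builds a hypothetical stand-in (toy is not a real boostr object, and its constant predictions are made up) and reads its attributes with attr; boostr's own extraction functions are not shown here.

```r
# Hypothetical stand-in for a boostr object: a function with attributes.
toy <- structure(
  function(newdata) rep(1, nrow(newdata)),  # always predicts 1
  initialWeights = rep(0.25, 4),
  storeData = FALSE
)

# The object is still callable as an estimator ...
toy(data.frame(x = 1:4))

# ... and its metadata can be read back by name.
attr(toy, "initialWeights")
attr(toy, "storeData")
```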

Details

For the details behind this algorithm, check out the paper at http://pollackphoto.net/misc/masters_thesis.pdf

References

Steven Pollack. (2014). Boost: a practical generalization of AdaBoost (Master's Thesis). http://pollackphoto.net/misc/masters_thesis.pdf

Examples

## Not run: 
library(boostr)

# recode the response as a factor with levels -1 and 1
df <- within(iris, {
  Setosa <- factor(2 * as.numeric(Species == "setosa") - 1)
  Species <- NULL
})

form <- formula(Setosa ~ .)
df <- model.frame(formula = form, data = df)

# demonstrate arc-fs algorithm using boostr convenience functions

glmArgs <- list(.trainArgs = list(formula = form, family = "binomial"))

# format prediction to yield response in {-1,1} instead of {0,1}
glm_predict <- function(object, newdata) {
  2 * round(predict(object, newdata, type = "response")) - 1
}

Phi_glm <- buildEstimationProcedure(train = glm, predict = glm_predict)

phi <- boostBackend(B = 3, data = df,
                    reweighter = adaboostReweighter,
                    aggregator = adaboostAggregator,
                    proc = Phi_glm,
                    .procArgs = glmArgs)
## End(Not run)