# bag

##### A General Framework For Bagging

`bag` provides a framework for bagging classification or regression models. The user can provide their own functions for model building, prediction and aggregation of predictions (see Details below).

##### Usage

bag(x, ...)

bagControl(fit = NULL, predict = NULL, aggregate = NULL,
downSample = FALSE, oob = TRUE, allowParallel = TRUE)

# S3 method for default
bag(x, y, B = 10, vars = ncol(x), bagControl = NULL,
...)

# S3 method for bag
predict(object, newdata = NULL, ...)

# S3 method for bag
print(x, ...)

# S3 method for bag
summary(object, ...)

# S3 method for summary.bag
print(x, digits = max(3, getOption("digits") - 3), ...)

ldaBag

plsBag

nbBag

ctreeBag

svmBag

nnetBag

##### Arguments

- x
a matrix or data frame of predictors

- …
arguments to pass to the model function

- fit
a function that has arguments `x`, `y` and `...` and produces a model object that can later be used for prediction. Example functions are found in `ldaBag`, `plsBag`, `nbBag`, `svmBag` and `nnetBag`.

- predict
a function that generates predictions for each sub-model. The function should have arguments `object` and `x`. The output of the function can be any type of object (see the example below where posterior probabilities are generated). Example functions are found in `ldaBag`, `plsBag`, `nbBag`, `svmBag` and `nnetBag`.

- aggregate
a function with arguments `x` and `type`. The function takes the output of the `predict` function and reduces the bagged predictions to a single prediction per sample. The `type` argument can be used to switch between predicting classes or class probabilities for classification models. Example functions are found in `ldaBag`, `plsBag`, `nbBag`, `svmBag` and `nnetBag`.

- downSample
logical: for classification, should the data set be randomly sampled so that each class has the same number of samples as the smallest class?

- oob
logical: should out-of-bag statistics be computed and the predictions retained?

- allowParallel
logical: if a parallel backend is loaded and available, should the function use it?

- y
a vector of outcomes

- B
the number of bootstrap samples to train over.

- vars
an integer. If this argument is not `NULL`, a random sample of size `vars` is taken of the predictors in each bagging iteration. If `NULL`, all predictors are used.

- bagControl
a list of options.

- object
an object of class `bag`.

- newdata
a matrix or data frame of samples for prediction. Note that this argument must have a non-null value.

- digits
minimal number of *significant digits*.
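
The `fit`, `predict` and `aggregate` arguments described above can be illustrated with a small set of user-written functions. The sketch below is only an illustration under stated assumptions: `lmBag` is a hypothetical name (it is not shipped with caret), ordinary linear regression on the built-in `mtcars` data is an arbitrary choice, and the `aggregate` function assumes it receives a list with one numeric prediction vector per sub-model. The call to `bag` is guarded so the block also runs when caret is not installed.

```r
# Hypothetical plug-in functions for bagging ordinary linear regressions.
lmBag <- list(
  # fit: receives predictors x, outcome y and ...; returns a model object
  fit = function(x, y, ...) {
    dat <- as.data.frame(x)
    dat$.outcome <- y
    lm(.outcome ~ ., data = dat)
  },
  # pred: receives one fitted sub-model (object) and new predictors (x)
  pred = function(object, x) {
    unname(predict(object, newdata = as.data.frame(x)))
  },
  # aggregate: assumed to receive a list with one prediction vector per
  # sub-model; `type` is unused for regression
  aggregate = function(x, type = NULL) {
    rowMeans(do.call(cbind, x))
  }
)

x <- mtcars[, c("wt", "hp", "disp")]
y <- mtcars$mpg

# Exercise the three functions by hand on two bootstrap samples
set.seed(1)
idx <- replicate(2, sample(nrow(x), replace = TRUE), simplify = FALSE)
fits <- lapply(idx, function(i) lmBag$fit(x[i, , drop = FALSE], y[i]))
preds <- lapply(fits, lmBag$pred, x = x)
manual <- lmBag$aggregate(preds)

# The same functions handed to bag(), if caret is installed
if (requireNamespace("caret", quietly = TRUE)) {
  ctrl <- caret::bagControl(fit = lmBag$fit, predict = lmBag$pred,
                            aggregate = lmBag$aggregate,
                            oob = FALSE, allowParallel = FALSE)
  lmBagged <- caret::bag(x, y, B = 10, bagControl = ctrl)
  bagged <- predict(lmBagged, newdata = x)
}
```

Running the functions by hand first is a cheap way to check that their inputs and outputs line up before passing them to `bagControl`.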

##### Details

The function is basically a framework where users can plug in any model to assess the effect of bagging. Example functions can be found in `ldaBag`, `plsBag`, `nbBag`, `svmBag` and `nnetBag`. Each has elements `fit`, `pred` and `aggregate`.

One note: when `vars` is not `NULL`, the sub-setting occurs before the `fit` and `predict` functions are called. In this way, the user probably does not need to account for the change in predictors in their functions.
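
The predictor sub-setting described in the note above can be sketched in base R. This is a conceptual illustration only, not caret's actual source: the loop, the use of `lm` and the `mtcars` data are all assumptions made for the example. Each iteration draws a bootstrap sample of rows and a random subset of `vars` columns, fits on that subset, and stores the column indices so that the same columns can be selected again at prediction time.

```r
x <- mtcars[, c("wt", "hp", "disp", "qsec", "drat")]
y <- mtcars$mpg
vars <- 3   # number of predictors sampled per iteration
B <- 5      # number of bagging iterations

set.seed(2)
fits <- lapply(seq_len(B), function(b) {
  rows <- sample(nrow(x), replace = TRUE)   # bootstrap sample of rows
  cols <- sample(ncol(x), vars)             # random subset of predictors
  dat <- x[rows, cols, drop = FALSE]
  dat$.outcome <- y[rows]
  # The model function only ever sees the sampled columns
  list(fit = lm(.outcome ~ ., data = dat), vars = cols)
})

# At prediction time the same columns are selected before predicting
preds <- sapply(fits, function(m) {
  predict(m$fit, newdata = x[, m$vars, drop = FALSE])
})
bagged <- rowMeans(preds)   # one aggregated prediction per sample
```

Because the sub-setting happens outside the model function, the fitting code itself never refers to `vars`.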

When using `bag` with `train`, classification models should use `type = "prob"` inside of the `predict` function so that `predict.train(object, newdata, type = "prob")` will work.

If a parallel backend is registered, the foreach package is used to train the models in parallel.

##### Value

`bag` produces an object of class `bag` with elements

- fits
a list with two sub-objects: the `fit` object has the actual model fit for that bagged sample and the `vars` object is either `NULL` or a vector of integers corresponding to which predictors were sampled for that model

- control
a mirror of the arguments passed into `bagControl`

- call
the call

- B
the number of bagging iterations

- dims
the dimensions of the training set

##### Format

An object of class `list` of length 3.

##### Examples

```
# NOT RUN {
library(caret)

## A simple example of bagging conditional inference regression trees:
data(BloodBrain)
## treebag <- bag(bbbDescr, logBBB, B = 10,
##                bagControl = bagControl(fit = ctreeBag$fit,
##                                        predict = ctreeBag$pred,
##                                        aggregate = ctreeBag$aggregate))

## An example of pooling posterior probabilities to generate class predictions
data(mdrr)

## remove some zero variance predictors and linear dependencies
mdrrDescr <- mdrrDescr[, -nearZeroVar(mdrrDescr)]
mdrrDescr <- mdrrDescr[, -findCorrelation(cor(mdrrDescr), .95)]

## basicLDA <- train(mdrrDescr, mdrrClass, "lda")
## bagLDA2 <- train(mdrrDescr, mdrrClass,
##                  "bag",
##                  B = 10,
##                  bagControl = bagControl(fit = ldaBag$fit,
##                                          predict = ldaBag$pred,
##                                          aggregate = ldaBag$aggregate),
##                  tuneGrid = data.frame(vars = c((1:10)*10 , ncol(mdrrDescr))))
# }
```

*Documentation reproduced from package caret, version 6.0-80, License: GPL (>= 2)*