# bag.default

##### A General Framework For Bagging

`bag` provides a framework for bagging classification or regression models. The user can provide their own functions for model building, prediction, and aggregation of predictions (see Details below).

Keywords: models

##### Usage

`bag(x, ...)`

S3 method for class `'default'`:

`bag(x, y, B = 10, vars = NULL, bagControl = bagControl(), ...)`

`bagControl(fit = NULL, predict = NULL, aggregate = NULL)`

S3 method for class `'bag'`:

`predict(object, newdata = NULL, ...)`
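To show how the three user-supplied functions plug into the bagging loop, here is a minimal base-R sketch of what a framework like `bag` does. It is not caret's implementation. `simpleBag` and `predictSimpleBag` are hypothetical names, and the choices of drawing bootstrap rows with `sample(..., replace = TRUE)` and sampling `vars` column indices without replacement are assumptions made for illustration.

```r
## Hypothetical sketch of a bagging framework; NOT caret's implementation.
simpleBag <- function(x, y, B = 10, vars = NULL, fit, predict, aggregate) {
  x <- as.data.frame(x)
  fits <- vector("list", B)
  for (b in seq_len(B)) {
    rows <- sample(nrow(x), replace = TRUE)                     # bootstrap sample of rows
    cols <- if (is.null(vars)) NULL else sample(ncol(x), vars)  # optional predictor subset
    xb <- if (is.null(cols)) x[rows, , drop = FALSE] else x[rows, cols, drop = FALSE]
    fits[[b]] <- list(fit = fit(xb, y[rows]), vars = cols)
  }
  structure(list(fits = fits,
                 control = list(fit = fit, predict = predict, aggregate = aggregate),
                 B = B, dims = dim(x)),
            class = "simpleBag")
}

predictSimpleBag <- function(object, newdata, type = NULL) {
  newdata <- as.data.frame(newdata)
  ## One prediction object per sub-model, each using that model's columns
  preds <- lapply(object$fits, function(f) {
    nd <- if (is.null(f$vars)) newdata else newdata[, f$vars, drop = FALSE]
    object$control$predict(f$fit, nd)
  })
  ## Reduce the list of sub-model predictions to one prediction per sample
  object$control$aggregate(preds, type = type)
}

## Usage with linear regression sub-models and mean aggregation
set.seed(1)
x <- data.frame(a = rnorm(50), b = rnorm(50), c = rnorm(50))
y <- 2 * x$a - x$b + rnorm(50, sd = 0.1)
lmFit  <- function(x, y, ...) lm(y ~ ., data = data.frame(x, y = y))
lmPred <- function(object, x) predict(object, newdata = x)
meanAg <- function(x, type = NULL) rowMeans(do.call(cbind, x))

bagged <- simpleBag(x, y, B = 5, vars = 2,
                    fit = lmFit, predict = lmPred, aggregate = meanAg)
p <- predictSimpleBag(bagged, x[1:3, ])
```

Keeping the sampled column indices next to each fit is what lets prediction reuse exactly the predictors that sub-model was trained on.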

##### Arguments

- `x`: a matrix or data frame of predictors.
- `y`: a vector of outcomes.
- `B`: the number of bootstrap samples to train over.
- `bagControl`: a list of options, as returned by `bagControl()`.
- `...`: arguments to pass to the model function.
- `fit`: a function that has arguments `x`, `y` and `...` and produces a model object that can later be used for prediction.
- `predict`: a function that generates predictions for each sub-model. The function should have arguments `object` and `x`. The output of the function can be any type of object (see the example below, where posterior probabilities are generated).
- `aggregate`: a function with arguments `x` and `type` that takes the output of the `predict` function and reduces the bagged predictions to a single prediction per sample. The `type` argument can be used to switch between predicting classes or class probabilities for classification models.
- `vars`: an integer. If this argument is not `NULL`, a random sample of size `vars` is taken of the predictors in each bagging iteration. If `NULL`, all predictors are used.
- `object`: an object of class `bag`.
- `newdata`: a matrix or data frame of samples for prediction. Note that this argument must have a non-null value.
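The argument lists above form a small contract between `bag` and the user's functions. The sketch below checks that contract with `formals()`; `hasArgs` and `checkBagFunctions` are hypothetical helpers written for this illustration, and caret is not assumed to perform such a check itself.

```r
## Hypothetical helpers: verify the argument names documented for
## the fit, predict and aggregate functions.
hasArgs <- function(f, required) all(required %in% names(formals(f)))

checkBagFunctions <- function(fit, predict, aggregate) {
  c(fit       = hasArgs(fit, c("x", "y", "...")),
    predict   = hasArgs(predict, c("object", "x")),
    aggregate = hasArgs(aggregate, c("x", "type")))
}

## Regression-style functions that follow the contract
lmFit    <- function(x, y, ...) lm(y ~ ., data = data.frame(x, y = y), ...)
lmPred   <- function(object, x) predict(object, newdata = as.data.frame(x))
medianAg <- function(x, type = NULL) apply(do.call(cbind, x), 1, median)

ok <- checkBagFunctions(lmFit, lmPred, medianAg)

## A predict function with the wrong argument names fails the check
bad <- checkBagFunctions(lmFit, function(model, newdata) NULL, medianAg)
```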

##### Details

The function is a framework into which users can plug any model to assess the effect of bagging.

Note that when `vars` is not `NULL`, the predictors are subset before the `fit` and `predict` functions are called. As a result, the user's functions generally do not need to account for the change in predictors.
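A minimal sketch of that subsetting, assuming the column indices are drawn once per iteration and applied to both the training data and the new data before the user's functions see them (an illustration, not caret's internals). `colFit` and `colPred` are hypothetical; they simply use whatever columns they receive.

```r
## Hypothetical illustration of predictor subsetting; not caret's internals.
set.seed(42)
x <- matrix(rnorm(30 * 6), ncol = 6, dimnames = list(NULL, paste0("V", 1:6)))
y <- rnorm(30)
vars <- 3

## The user's functions never refer to specific predictors
colFit  <- function(x, y, ...) lm(y ~ ., data = data.frame(x, y = y))
colPred <- function(object, x) predict(object, newdata = as.data.frame(x))

## The framework draws the column indices for this iteration ...
cols <- sample(ncol(x), vars)
## ... and subsets both the training data and the new data before calling them
model <- colFit(x[, cols, drop = FALSE], y)
newx  <- x[1:4, ]
pred  <- colPred(model, newx[, cols, drop = FALSE])

## The model only knows about the sampled predictors
usedVars <- names(coef(model))[-1]
```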

When using `bag` with `train`, classification models should use `type = "prob"` inside the `predict` function so that `predict.train(object, newdata, type = "prob")` will work.
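For classification, this means the sub-model predictions should be class probabilities, and the `aggregate` function should be able to return either pooled probabilities or classes depending on `type`. The sketch below is a hypothetical `probAg` that averages the probability matrices (a swap from the median used in the example below); following the example's `ldaAg`, any `type` other than `"class"` returns probabilities.

```r
## Hypothetical aggregate function for sub-models whose predict function
## returns a matrix of class probabilities (one column per class).
probAg <- function(x, type = "class") {
  ## Element-wise mean of the list of probability matrices
  pooled <- Reduce(`+`, x) / length(x)
  if (type != "class") return(pooled)   # e.g. type = "prob"
  classes <- colnames(pooled)
  ## Pick the class with the largest pooled probability for each sample
  factor(classes[max.col(pooled, ties.method = "first")], levels = classes)
}

## Posterior probabilities from two sub-models for three samples
p1 <- matrix(c(0.9, 0.2, 0.6,
               0.1, 0.8, 0.4), ncol = 2, dimnames = list(NULL, c("A", "B")))
p2 <- matrix(c(0.7, 0.4, 0.3,
               0.3, 0.6, 0.7), ncol = 2, dimnames = list(NULL, c("A", "B")))

probAg(list(p1, p2), type = "prob")   # pooled probability matrix
probAg(list(p1, p2), type = "class")  # factor with values A, B, B
```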

##### Value

`bag` produces an object of class `bag` with elements:

- `fits`: a list with two sub-objects: the `fit` object has the actual model fit for that bagged sample, and the `vars` object is either `NULL` or a vector of integers corresponding to which predictors were sampled for that model
- `control`: a mirror of the arguments passed into `bagControl`
- `call`: the call
- `B`: the number of bagging iterations
- `dims`: the dimensions of the training set
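Because each entry of `fits` records the sampled predictor indices, those indices can be tabulated across iterations. In this sketch `fakeBag` is a hand-built stand-in with the documented elements (it is not produced by caret), `varCounts` is a hypothetical helper, and `dims` is assumed to hold the rows and columns of the training set.

```r
## Hand-built stand-in with the documented elements (not produced by caret)
fakeBag <- list(
  fits = list(list(fit = "model 1", vars = c(2L, 5L)),
              list(fit = "model 2", vars = c(1L, 5L)),
              list(fit = "model 3", vars = c(5L, 3L))),
  control = list(fit = NULL, predict = NULL, aggregate = NULL),
  call = quote(bag(x = x, y = y, B = 3, vars = 2)),
  B = 3,
  dims = c(100L, 6L)   # assumed: c(rows, columns) of the training set
)

## Hypothetical helper: how often was each predictor column sampled?
varCounts <- function(object) {
  used <- unlist(lapply(object$fits, `[[`, "vars"))
  if (is.null(used)) return(NULL)   # vars = NULL: every fit used all predictors
  table(factor(used, levels = seq_len(object$dims[2])))
}

varCounts(fakeBag)
```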

##### Examples

```
## A simple example of bagging conditional inference regression trees:
library(caret)
data(BloodBrain)

## Fit a model with the default values
ctreeFit <- function(x, y, ...)
{
  library(party)
  data <- as.data.frame(x)
  data$y <- y
  ctree(y ~ ., data = data)
}

## Generate simple predictions of the outcome
ctreePred <- function(object, x)
{
  ## the ctree predict method expects a data frame of new samples
  predict(object, as.data.frame(x))[, 1]
}

## Take the median of the bagged predictions
ctreeAg <- function(x, type = NULL)
{
  ## x is a list of vectors, so we convert them to a matrix
  preds <- do.call("cbind", x)
  apply(preds, 1, median)
}

treebag <- bag(bbbDescr, logBBB, B = 10,
               bagControl = bagControl(fit = ctreeFit,
                                       predict = ctreePred,
                                       aggregate = ctreeAg))

## An example of pooling posterior probabilities to generate class predictions
data(mdrr)

## remove near-zero variance predictors and highly correlated predictors
mdrrDescr <- mdrrDescr[, -nearZeroVar(mdrrDescr)]
mdrrDescr <- mdrrDescr[, -findCorrelation(cor(mdrrDescr), .95)]

## The fit and predict functions are straightforward:
ldaFit <- function(x, y, ...)
{
  library(MASS)
  lda(x, y, ...)
}

ldaPred <- function(object, x)
{
  predict(object, x)$posterior
}

## For the aggregation function, we take the median of the bagged
## posterior probabilities and pick the largest as the class
ldaAg <- function(x, type = "class")
{
  ## The class probabilities come in as a list of matrices.
  ## For each class, we pool them and then take the median.
  ## Start from a matrix of NAs with the same dimensions and column names.
  pooled <- x[[1]] * NA
  classes <- colnames(pooled)
  for(i in seq_len(ncol(pooled)))
  {
    tmp <- lapply(x, function(y, col) y[, col], col = i)
    tmp <- do.call("rbind", tmp)
    pooled[, i] <- apply(tmp, 2, median)
  }
  if(type == "class")
  {
    out <- factor(classes[apply(pooled, 1, which.max)],
                  levels = classes)
  } else out <- pooled
  out
}

bagLDA <- bag(mdrrDescr, mdrrClass,
              B = 10,
              vars = 10,
              bagControl = bagControl(fit = ldaFit,
                                      predict = ldaPred,
                                      aggregate = ldaAg))

basicLDA <- train(mdrrDescr, mdrrClass, "lda")

## bagControl must be passed by name so that it reaches bag()
bagLDA2 <- train(mdrrDescr, mdrrClass,
                 "bag",
                 B = 10,
                 bagControl = bagControl(fit = ldaFit,
                                         predict = ldaPred,
                                         aggregate = ldaAg),
                 tuneGrid = data.frame(.vars = c((1:10)*10, ncol(mdrrDescr))))
```

*Documentation reproduced from package caret, version 4.69, License: GPL-2*