# bagging

##### Bagging Classification and Regression Trees

Bootstrap aggregated classification and regression trees.

- Keywords: tree

##### Usage

```
## S3 method for class 'default':
bagging(y, X=NULL, nbagg=25, method=c("standard","double"),
        coob=TRUE, control=rpart.control(minsize=2, cp=0), ...)
## S3 method for class 'formula':
bagging(formula, data, subset, na.action=na.rpart, ...)
```

##### Arguments

- `y`: vector of responses, either numeric (regression) or a factor (classification).
- `X`: data frame of predictors.
- `nbagg`: number of bootstrap replications.
- `method`: `"standard"` for bagging and `"double"` for Double-Bagging.
- `coob`: logical. Compute an out-of-bag estimate of the misclassification or mean squared error.
- `control`: options that control details of the `rpart` algorithm, see `rpart.control`.
- `formula`: formula describing the model: `y ~ x + w + z`, where `y` is the response and `x`, `w`, `z` are predictors, see `lm` for details.
- `data`: optional data frame containing the variables in the model formula.
- `subset`: optional vector specifying a subset of observations to be used.
- `na.action`: function which indicates what should happen when the data contain `NA`s. Defaults to `na.rpart`.
- `...`: additional parameters passed to methods (e.g. `rpart`).
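The `control` argument is handed to `rpart`. To make the effect of such settings concrete, the sketch below fits one tree with rpart's default stopping rules and one grown with `minsplit = 2` and `cp = 0`, on synthetic data like that in the Examples. It uses `rpart.control`'s `minsplit` argument (the minimum number of observations in a node for a split to be attempted); `xval = 0` only switches off rpart's internal cross-validation. The seed and data are illustrative assumptions, not part of this page.

```r
# Sketch assuming only the rpart package (a recommended package shipped with R).
library(rpart)

set.seed(1)
X <- as.data.frame(matrix(rnorm(1000), ncol = 10))
y <- factor(ifelse(rowMeans(X) > 0, 1, 0))
learn <- cbind(y, X)

# Tree with rpart's default stopping rules (minsplit = 20, cp = 0.01).
default_tree <- rpart(y ~ ., data = learn)

# Fully grown tree: any node with at least 2 observations may be split,
# and cp = 0 disables the complexity threshold for keeping a split.
full_tree <- rpart(y ~ ., data = learn,
                   control = rpart.control(minsplit = 2, cp = 0, xval = 0))

# Number of terminal nodes (rows of the tree frame marked "<leaf>").
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

# Resubstitution (training) misclassification error.
train_err <- function(fit) mean(predict(fit, learn, type = "class") != learn$y)

cat("leaves (default):", n_leaves(default_tree),
    " training error:", train_err(default_tree), "\n")
cat("leaves (full):   ", n_leaves(full_tree),
    " training error:", train_err(full_tree), "\n")
```

The fully grown tree typically has many more leaves and a much lower training error; that low bias and high variance is what bootstrap aggregation is meant to average out.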

##### Details

Bootstrap aggregated classification and regression trees were suggested by
Breiman (1996, 1998) in order to stabilise trees. This function is based on
trees computed by `rpart`. If `y` is a factor, classification trees are
constructed, regression trees otherwise. `nbagg` bootstrap samples are drawn
and a tree is constructed for each of them. If `coob` is `TRUE`, the
out-of-bag sample (the observations not drawn into a given bootstrap sample)
is used to estimate the prediction error.

Double-Bagging (Hothorn and Lausen, 2002) computes a linear discriminant
analysis (LDA) on the out-of-bag sample and uses the discriminant variables
as additional predictors for the classification trees. Because the out-of-bag
sample is already used to fit the LDA, an out-of-bag estimate of the
misclassification error is not available for `method="double"`.
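The procedure described above can be sketched by hand for classification: draw `nbagg` bootstrap samples, grow a tree on each, aggregate by majority vote, and score each observation only with the trees for which it was out-of-bag. Setting `double = TRUE` adds the Double-Bagging step, fitting `lda` from the MASS package on the out-of-bag sample and appending its discriminant variables (`LD1`, ...) to the predictors. The helpers `bag_sketch` and `predict_bag_sketch` are hypothetical and for illustration only; this is not ipred's code, which may differ in details such as tie-breaking.

```r
# Hypothetical sketch of bagging and Double-Bagging for classification.
# Assumes the recommended packages rpart and MASS; this is not ipred's code.
library(rpart)
library(MASS)

bag_sketch <- function(X, y, nbagg = 25, double = FALSE) {
  n <- nrow(X)
  fits <- vector("list", nbagg)
  # votes[i, k]: number of out-of-bag trees predicting class k for observation i
  votes <- matrix(0, n, nlevels(y))
  for (b in seq_len(nbagg)) {
    boot <- sample.int(n, n, replace = TRUE)  # bootstrap sample
    oob <- setdiff(seq_len(n), boot)          # observations left out
    Xb <- X[boot, , drop = FALSE]
    lda_fit <- NULL
    if (double) {
      # Double-Bagging: LDA on the out-of-bag sample; its discriminant
      # variables are appended as extra predictors for the tree.
      lda_fit <- lda(X[oob, , drop = FALSE], grouping = y[oob])
      Xb <- cbind(Xb, predict(lda_fit, Xb)$x)
    }
    tree <- rpart(y ~ ., data = cbind(y = y[boot], Xb),
                  control = rpart.control(minsplit = 2, cp = 0, xval = 0))
    fits[[b]] <- list(tree = tree, lda = lda_fit)
    if (!double && length(oob) > 0) {
      pred <- predict(tree, X[oob, , drop = FALSE], type = "class")
      idx <- cbind(oob, as.integer(pred))
      votes[idx] <- votes[idx] + 1
    }
  }
  err <- NA  # no out-of-bag estimate for Double-Bagging
  if (!double) {
    seen <- rowSums(votes) > 0  # observations out-of-bag at least once
    oob_cls <- levels(y)[max.col(votes[seen, , drop = FALSE], ties.method = "first")]
    err <- mean(oob_cls != as.character(y[seen]))
  }
  list(fits = fits, levels = levels(y), err = err)
}

# Majority vote over all trees for new observations.
predict_bag_sketch <- function(object, newX) {
  votes <- matrix(0, nrow(newX), length(object$levels))
  for (f in object$fits) {
    Xn <- newX
    if (!is.null(f$lda)) Xn <- cbind(Xn, predict(f$lda, newX)$x)
    pred <- predict(f$tree, Xn, type = "class")
    idx <- cbind(seq_len(nrow(newX)), as.integer(pred))
    votes[idx] <- votes[idx] + 1
  }
  factor(object$levels[max.col(votes, ties.method = "first")],
         levels = object$levels)
}

set.seed(290875)
X <- as.data.frame(matrix(rnorm(1000), ncol = 10))
y <- factor(ifelse(rowMeans(X) > 0, 1, 0))
Xtest <- as.data.frame(matrix(rnorm(1000), ncol = 10))
ytest <- factor(ifelse(rowMeans(Xtest) > 0, 1, 0))

bag <- bag_sketch(X, y)
dbag <- bag_sketch(X, y, double = TRUE)
cat("OOB error (bagging):        ", bag$err, "\n")
cat("Test error (bagging):       ", mean(predict_bag_sketch(bag, Xtest) != ytest), "\n")
cat("Test error (Double-Bagging):", mean(predict_bag_sketch(dbag, Xtest) != ytest), "\n")
```

In this sketch the LDA is refitted for every bootstrap replicate, so prediction needs the stored LDA fits as well as the trees.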

`print.bagging` and `summary.bagging` are available for inspecting the
results, and `predict.bagging` for prediction. Additionally, `prune.bagging`
can be used to prune each of the `nbagg` trees. By default, the trees are not
pruned and tree growing does not stop until the nodes are pure.
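This page only states that `prune.bagging` prunes each of the `nbagg` trees. As a sketch of that idea (not of ipred's code), the following grows a hypothetical list of unpruned bootstrap trees with `rpart` and prunes each one with rpart's `prune` at a common complexity parameter. The value `cp = 0.01` and the synthetic data are arbitrary illustrative assumptions.

```r
# Hypothetical setup: a hand-made list of unpruned bootstrap trees stands in
# for the trees of a fitted bagging object. Assumes only the rpart package.
library(rpart)

set.seed(42)
X <- as.data.frame(matrix(rnorm(1000), ncol = 10))
y <- factor(ifelse(rowMeans(X) > 0, 1, 0))
learn <- cbind(y, X)

# Grow nbagg fully grown trees on bootstrap samples.
nbagg <- 10
trees <- lapply(seq_len(nbagg), function(b) {
  boot <- sample.int(nrow(learn), replace = TRUE)
  rpart(y ~ ., data = learn[boot, ],
        control = rpart.control(minsplit = 2, cp = 0, xval = 0))
})

# Prune every tree back to the same complexity parameter.
pruned <- lapply(trees, prune, cp = 0.01)

# Compare the number of terminal nodes before and after pruning.
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")
print(cbind(unpruned = sapply(trees, n_leaves),
            pruned = sapply(pruned, n_leaves)))
```

Pruning can only remove splits, so each pruned tree has at most as many leaves as the tree it came from.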

##### Value

An object of class `bagging`: a list containing the following components:

- `mt`: list of length `nbagg` containing `rpart` trees.
- `oob`: out-of-bag predictions for each observation.
- `err`: out-of-bag error estimate.
- `nbagg`: number of bootstrap samples and trees used.
- `method`: method used.
- `ldasc`: discriminant functions of LDA (for Double-Bagging only).

##### References

Leo Breiman (1996), Bagging Predictors. *Machine Learning* **24**(2), 123–140.

Leo Breiman (1998), Arcing Classifiers. *The Annals of Statistics* **26**(3), 801–824.

Torsten Hothorn and Berthold Lausen (2002), Double-Bagging: Combining
classifiers by bootstrap aggregation. *submitted*,
preprint available under

##### Examples

```
library("ipred")
set.seed(290875)  # for reproducibility

# Classification: binary response determined by the mean of 10 predictors
X <- as.data.frame(matrix(rnorm(1000), ncol=10))
y <- factor(ifelse(apply(X, 1, mean) > 0, 1, 0))
learn <- cbind(y, X)
mt <- bagging(y ~ ., data=learn, coob=TRUE)
mt
# Independent test set drawn from the same model
X <- as.data.frame(matrix(rnorm(1000), ncol=10))
y <- factor(ifelse(apply(X, 1, mean) > 0, 1, 0))
cls <- predict(mt, newdata=X)
cat("Misclass error est: ", mean(y != cls), "\n")
cat("Misclass error oob: ", mt$err, "\n")

# Regression: response is the predictor mean plus standard normal noise
X <- as.data.frame(matrix(rnorm(1000), ncol=10))
y <- apply(X, 1, mean) + rnorm(nrow(X))
learn <- cbind(y, X)
mt <- bagging(y ~ ., data=learn, coob=TRUE)
mt
X <- as.data.frame(matrix(rnorm(1000), ncol=10))
y <- apply(X, 1, mean) + rnorm(nrow(X))
haty <- predict(mt, newdata=X)
cat("MSE: ", mean((haty - y)^2), "\n")

# Wisconsin breast cancer data from the mlbench package
data("BreastCancer", package = "mlbench")
BreastCancer$Id <- NULL
# Test set error bagging (nbagg = 50): 3.7% (Breiman, 1998, Table 5)
bagging(Class ~ Cl.thickness + Cell.size
        + Cell.shape + Marg.adhesion
        + Epith.c.size + Bare.nuclei
        + Bl.cromatin + Normal.nucleoli
        + Mitoses, data=BreastCancer, coob=TRUE)
```

