inbagg: Indirect Bagging

Description

Performs indirect bagging and subagging.

Usage

inbagg(formula, data, pFUN = NULL, cFUN = list(model = NULL, predict = NULL, training.set = NULL), nbagg = 25, ns = 0.5, replace = FALSE, ...)

Arguments

formula

formula. A formula specified as y~w1+w2+w3~x1+x2+x3 describes how to model the intermediate variables w1, w2, w3 and the response variable y. It is used whenever no other formula is specified by the elements of pFUN or in cFUN.

data

data frame of explanatory, intermediate and response variables.

pFUN

list of lists describing the models for the intermediate variables. Details are given below.

cFUN

either a fixed function with argument newdata and returning the class membership by default, or a list specifying a classifying model, similar to one element of pFUN. Details are given below.

nbagg

number of bootstrap samples.

ns

proportion of the learning sample to be drawn. By default, subagging with 50% is performed, i.e. draw 0.5*n out of n without replacement.

replace

logical. Draw with or without replacement.

...

additional arguments (e.g. subset).
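
The interplay of nbagg, ns and replace can be illustrated with a minimal base R sketch. This is not the ipred source code: the object names (bag_size, bags) are hypothetical, and rounding ns * n down is an assumption made for the illustration.

```r
# Illustrative only: how nbagg, ns and replace could translate into index draws.
set.seed(1)                  # make the draws reproducible
n       <- 200               # size of the learning sample
nbagg   <- 25                # number of samples to draw
ns      <- 0.5               # proportion of the learning sample per draw
replace <- FALSE             # FALSE: subagging, TRUE: draw with replacement

bag_size <- floor(ns * n)    # assumption: round down to a whole number of rows

# one vector of row indices per bag
bags <- lapply(seq_len(nbagg), function(b) {
  sample(n, size = bag_size, replace = replace)
})

lengths(bags)[1:3]           # size of the first three bags
anyDuplicated(bags[[1]])     # 0 means no row was drawn twice
```

With replace = FALSE no observation can appear twice in a bag, which is what distinguishes subagging from drawing with replacement.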

Value

mtrees: a list of length nbagg, describing the prediction models corresponding to each bootstrap sample. Each element of mtrees is a list with elements bindx (observations of bag sample), btree (classifying function of bag sample) and bfct (predictive models for intermediates of bag sample).
y: vector of response values.
W: data frame of intermediate variables.
X: data frame of explanatory variables.

Details

A given data set is subdivided into three types of variables: explanatory, intermediate and response variables.
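
As a rough illustration of this subdivision, the two-stage formula can be taken apart with base R alone. The sketch below only shows how such a formula is parsed by R; it is not how inbagg necessarily processes it internally, and the object names are hypothetical.

```r
# Illustrative only: splitting a two-stage formula into its three variable groups.
f <- y ~ w1 + w2 + w3 ~ x1 + x2 + x3

# `~` is left-associative, so f is parsed as (y ~ w1 + w2 + w3) ~ (x1 + x2 + x3)
first_stage  <- f[[2]]                        # y ~ w1 + w2 + w3
response     <- all.vars(first_stage[[2]])    # left-hand side of the first stage
intermediate <- all.vars(first_stage[[3]])    # right-hand side of the first stage
explanatory  <- all.vars(f[[3]])              # right-hand side of the whole formula

response
intermediate
explanatory
```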

Here, each specified intermediate variable is modelled separately following pFUN, a list of lists with elements specifying an arbitrary number of models for the intermediate variables and an optional element training.set = c("oob", "bag", "all"). The element training.set determines whether the predictive models for the intermediate variables are calculated based on the out-of-bag sample ("oob", the default), on the bag sample ("bag") or on all available observations ("all"). The elements of pFUN that specify the models for the intermediate variables are lists as described in inclass. Note that if no formula is given in these elements, the functional relationship of formula is used.
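
The three training.set options can be pictured as three ways of choosing rows relative to one bag sample. The helper training_rows below is hypothetical and written only for this illustration; it is not part of ipred.

```r
# Illustrative only: which rows each training.set option would select.
set.seed(2)
n     <- 10
bindx <- sample(n, size = 5, replace = FALSE)   # rows in the bag sample

# hypothetical helper, not an ipred function
training_rows <- function(training.set = c("oob", "bag", "all"), bindx, n) {
  training.set <- match.arg(training.set)
  switch(training.set,
         oob = setdiff(seq_len(n), bindx),      # rows not drawn into the bag
         bag = bindx,                           # rows drawn into the bag
         all = seq_len(n))                      # every available row
}

oob <- training_rows("oob", bindx, n)
bag <- training_rows("bag", bindx, n)
all_rows <- training_rows("all", bindx, n)
```

When sampling without replacement, the "oob" and "bag" rows are disjoint and together cover all observations.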

The response variable is modelled following cFUN. This can either be a fixed classifying function as described in Peters et al. (2003) or a list that specifies the modelling technique to be applied. The list contains the arguments model (which model to be fitted), predict (optional, how to predict), formula (optional, of type y~w1+w2+w3+x1+x2, determines the variables the classifying function is based on) and the optional argument training.set = c("fitted.bag", "original", "fitted.subset"), specifying whether the classifying function is trained on the predicted observations of the bag sample ("fitted.bag"), on the original observations ("original") or on the predicted observations not included in a defined subset ("fitted.subset"). By default, the formula specified in formula determines the variables the classifying function is based on.
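
A fixed classifying function is simply a function of newdata that returns class memberships. The sketch below uses a made-up decision rule (the sum of the intermediate variables compared with zero) purely for illustration; the name fixed_cFUN and the rule itself are assumptions, not taken from Peters et al. (2003).

```r
# Illustrative only: a fixed classifying function with a hypothetical rule.
fixed_cFUN <- function(newdata) {
  # made-up score built from the intermediate variables
  score <- newdata$w1 + newdata$w2 + newdata$w3
  # class "1" for a positive score, class "2" otherwise
  factor(ifelse(score > 0, "1", "2"), levels = c("1", "2"))
}

# two made-up observations of the intermediate variables
newdata <- data.frame(w1 = c(1, -1), w2 = c(0.5, -2), w3 = c(0, 0.3))
fixed_cFUN(newdata)
```

Such a function would be passed as cFUN = fixed_cFUN, in which case nothing is fitted for the response and only the intermediate variables are modelled.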

Note that the default of cFUN = list(model = NULL, training.set = "fitted.bag") uses the function rpart and the predict function predict(object, newdata, type = "class").
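
The default classifier can be approximated outside inbagg with the following sketch, which pairs rpart with the predict call quoted above. The wrapper names default_model and default_predict are hypothetical, the iris data is used only as a convenient example, and any control settings that inbagg may pass to rpart internally are not reproduced here.

```r
# Illustrative only: an rpart model combined with predict(..., type = "class").
library("rpart")

# hypothetical wrappers mirroring the model/predict pair described above
default_model   <- function(formula, data) rpart(formula, data = data)
default_predict <- function(object, newdata) predict(object, newdata, type = "class")

fit  <- default_model(Species ~ ., iris)   # classification tree on a built-in data set
pred <- default_predict(fit, iris)         # factor of predicted class labels

table(predicted = pred, observed = iris$Species)
```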

References

David J. Hand, Hua Gui Li, Niall M. Adams (2001), Supervised classification with structured class definitions. Computational Statistics & Data Analysis 36, 209-225.

Andrea Peters, Berthold Lausen, Georg Michelson and Olaf Gefeller (2003), Diagnosis of glaucoma by indirect classifiers. Methods of Information in Medicine 1, 99-103.

Examples


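library("MASS")   # mvrnorm
library("rpart")  # tree model for the intermediate variables
library("ipred")  # inbagg, mypredict.lm

# simulate a two-class response with three intermediate and three explanatory variables
n <- 200
y <- as.factor(sample(1:2, n, replace = TRUE))
W <- mvrnorm(n = n, mu = rep(0, 3), Sigma = diag(3))
X <- mvrnorm(n = n, mu = rep(2, 3), Sigma = diag(3))
colnames(W) <- c("w1", "w2", "w3")
colnames(X) <- c("x1", "x2", "x3")
DATA <- data.frame(y, W, X)

# first element: linear model for w1 with its own formula;
# second element: rpart without a formula, so the relationship given in formula is used
pFUN <- list(list(formula = w1~x1+x2, model = lm, predict = mypredict.lm),
             list(model = rpart))

inbagg(y~w1+w2+w3~x1+x2+x3, data = DATA, pFUN = pFUN)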
