inclass: Indirect Classification

Description

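A framework for indirect classification: intermediate variables are modelled from explanatory variables, and the class is then derived from the intermediate variables.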

Usage

# S3 method for data.frame
inclass(formula, data, pFUN = NULL, cFUN = NULL, ...)

Value

An object of class inclass, a list with the components

model.intermediate: a list of fitted models, one for each intermediate variable.
model.response: the predictive model for the response variable.
para.intermediate: a list of lists, each specifying the model for one intermediate variable.
para.response: a list specifying the model for the response variable.

Arguments

formula: formula. A formula specified as y~w1+w2+w3~x1+x2+x3 models each intermediate variable w1, w2, w3 by wi~x1+x2+x3 and the response by y~w1+w2+w3 if no other formulas are given in pFUN or cFUN.
data: data frame of explanatory, intermediate and response variables.
pFUN: list of lists, which describe models for the intermediate variables, see below for details.
cFUN: either a function or a list which describes the model for the response variable. The function has the argument newdata only.
...: additional arguments passed to the model fitting of the response variable.
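The double formula y~w1+w2+w3~x1+x2+x3 can be taken apart with base R alone, because `~` is left-associative: the outer call's left side is `y ~ w1 + w2 + w3` and its right side is `x1 + x2 + x3`. The helper `split_double_formula()` below is hypothetical (it is not ipred's internal code); it is a sketch of how the default formulas wi~x1+x2+x3 and y~w1+w2+w3 follow from one double formula.

```r
# Hypothetical helper (not part of ipred): split a double formula
# y ~ w1 + w2 + w3 ~ x1 + x2 + x3 into its three groups of variables.
split_double_formula <- function(f) {
  # `~` is left-associative, so f[[2]] is the call y ~ w1 + w2 + w3
  # and f[[3]] holds the explanatory terms x1 + x2 + x3.
  inner <- f[[2]]
  response <- all.vars(inner[[2]])
  intermediate <- all.vars(inner[[3]])
  explanatory <- all.vars(f[[3]])
  list(
    response = response,
    intermediate = intermediate,
    explanatory = explanatory,
    # default model for each intermediate variable: wi ~ x1 + x2 + x3
    intermediate.formulas = lapply(intermediate, function(w)
      reformulate(explanatory, response = w)),
    # default model for the response: y ~ w1 + w2 + w3
    response.formula = reformulate(intermediate, response = response)
  )
}

parts <- split_double_formula(y ~ w1 + w2 + w3 ~ x1 + x2 + x3)
parts$intermediate.formulas[[2]]  # the formula w2 ~ x1 + x2 + x3
```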

Details

A given data set is subdivided into three types of variables: those used to predict the class (explanatory variables), those used to define the class (intermediate variables), and the class membership variable itself (response variable). The intermediate variables are modelled from the explanatory variables, and the class membership variable is defined by the intermediate variables.
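The two-stage idea can be sketched in base R on simulated data. All variable names (x1, x2, w1, w2, y), the coefficients and the rule in `define_class()` are made up for illustration, and `lm` stands in for whatever model is chosen; this is not how inclass is implemented internally, only the principle: predict each intermediate variable from the explanatory variables, then apply the known class definition to the predictions.

```r
# Minimal sketch of indirect classification on simulated data
# (all names, coefficients and thresholds are hypothetical).
set.seed(1)
n <- 200
train <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
# Intermediate variables depend on the explanatory variables ...
train$w1 <- 2 * train$x1 + rnorm(n, sd = 0.5)
train$w2 <- train$x1 - train$x2 + rnorm(n, sd = 0.5)
# ... and the class is *defined* by a known rule on the intermediates.
define_class <- function(d) {
  factor(as.integer(d$w1 > 0 & d$w2 > 0), levels = c(0, 1))
}
train$y <- define_class(train)

# Step 1: model each intermediate variable from the explanatory variables.
fits <- list(w1 = lm(w1 ~ x1 + x2, data = train),
             w2 = lm(w2 ~ x1 + x2, data = train))

# Step 2: predict the intermediates for new data and apply the rule.
indirect_predict <- function(fits, newdata) {
  w.hat <- as.data.frame(lapply(fits, predict, newdata = newdata))
  define_class(w.hat)
}

new.obs <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
pred <- indirect_predict(fits, new.obs)
table(pred)
```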

Each intermediate variable is modelled separately, following pFUN and the formula given in formula. pFUN is a list of lists whose length is at most the number of intermediate variables. Each element of pFUN is a list with the elements:
model - a function with arguments formula and data;
predict - an optional function with the arguments object and newdata only; if predict is not specified, the predict method of model is used;
formula - an optional formula for the corresponding model; if none is given, the formula implied by y~w1+w2+w3~x1+x2+x3 (here wi~x1+x2+x3) is used.
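The fallback rules for a single pFUN element (own formula or the default one, own predict function or the generic `predict()`) can be sketched as follows. `fit_intermediate()` is a hypothetical helper, not ipred's internal code, and the simulated data are made up; the first element relies on both defaults, the second supplies its own formula and predict function.

```r
# Hypothetical helper (not ipred's internal code) that applies one
# pFUN-style element: list(model = , predict = , formula = ).
fit_intermediate <- function(spec, default.formula, data) {
  # use the element's own formula if given, else the default one
  f <- if (is.null(spec$formula)) default.formula else spec$formula
  fit <- spec$model(formula = f, data = data)
  # use the supplied predict function if given, else generic predict()
  pfun <- if (is.null(spec$predict)) {
    function(object, newdata) predict(object, newdata = newdata)
  } else {
    spec$predict
  }
  # return a prediction function for new observations
  function(newdata) pfun(object = fit, newdata = newdata)
}

set.seed(2)
d <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
d$w1 <- d$x1 + rnorm(30, sd = 0.1)
d$w2 <- d$x2 + rnorm(30, sd = 0.1)

pFUN <- list(
  list(model = lm),                       # default formula and predict
  list(model = lm, formula = w2 ~ x2,     # own formula and predict
       predict = function(object, newdata) predict(object, newdata = newdata))
)
defaults <- list(w1 ~ x1 + x2, w2 ~ x1 + x2)

predictors <- Map(fit_intermediate, pFUN, defaults, MoreArgs = list(data = d))
w.hat <- sapply(predictors, function(p) p(d[1:3, ]))  # 3 x 2 matrix
```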

The response is classified following cFUN, which is either a fixed function or a list as described below. A fixed function cFUN assigns the intermediate (and explanatory) variables to a class membership. A list cFUN has the elements formula, model, predict and training.set; formula, model and predict are structured as described for pFUN. The described model is trained on the original (observed) intermediate variables if training.set = "original" or training.set = NULL, on the fitted values of the intermediate variables if training.set = "fitted", or on the observations not included in a specified subset if training.set = "subset".
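A sketch of how the training.set options change the data a list-valued cFUN is trained on. `response_training_data()` and its arguments are hypothetical; the "subset" option is left out because how the subset is specified is not described here.

```r
# Hypothetical helper: choose the training data for the response model.
# "subset" is deliberately not sketched.
response_training_data <- function(data, intermediate, w.fitted,
                                   training.set = NULL) {
  if (is.null(training.set) || identical(training.set, "original")) {
    # train on the observed intermediate variables
    data
  } else if (identical(training.set, "fitted")) {
    # replace each intermediate variable by its fitted values
    data[intermediate] <- w.fitted[intermediate]
    data
  } else {
    stop("only 'original' (or NULL) and 'fitted' are sketched here")
  }
}

d <- data.frame(w1 = c(1, 2, 3), y = factor(c(0, 1, 1)))
w.fitted <- data.frame(w1 = c(1.1, 1.9, 3.2))
response_training_data(d, "w1", w.fitted)$w1            # 1 2 3
response_training_data(d, "w1", w.fitted, "fitted")$w1  # 1.1 1.9 3.2
```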

The function returns a list of prediction models, one for each intermediate variable, a predictive function for the response, and lists of specifications for the intermediate variables and for the response.
For a detailed description on indirect classification see Hand et al. (2001).

References

David J. Hand, Hua Gui Li, Niall M. Adams (2001), Supervised classification with structured class definitions. Computational Statistics & Data Analysis 36, 209--225.

Andrea Peters, Berthold Lausen, Georg Michelson and Olaf Gefeller (2003), Diagnosis of glaucoma by indirect classifiers. Methods of Information in Medicine 1, 99--103.

Examples


library("ipred")
data("Smoking", package = "ipred")
# Set three groups of variables:
# 1) explanatory variables are: TarY, NicY, COY, Sex, Age
# 2) intermediate variables are: TVPS, BPNL, COHB
# 3) response is defined by the function classify():

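classify <- function(data){
  data <- data[,c("TVPS", "BPNL", "COHB")]
  # compare each column with its own threshold (TVPS, BPNL, COHB)
  res <- t(t(data) > c(4438, 232.5, 58))
  # class 1 only if all three thresholds are exceeded, else 0
  res <- as.factor(ifelse(apply(res, 1, sum) > 2, 1, 0))
  res
}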

response <- classify(Smoking[ ,c("TVPS", "BPNL", "COHB")])
smoking <- data.frame(Smoking, response)

formula <- response~TVPS+BPNL+COHB~TarY+NicY+COY+Sex+Age

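# mypredict.lm (from ipred) returns NA for observations with missing
# explanatory values instead of dropping them
inclass(formula, data = smoking,
        pFUN = list(list(model = lm, predict = mypredict.lm)),
        cFUN = classify)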
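The rule in classify labels an observation 1 only when all three intermediate variables exceed their thresholds (a row sum of 3 out of 3). A hypothetical, more explicit equivalent for complete observations, checked on three made-up observations; with missing values the two versions can differ, since FALSE & NA is FALSE while a sum involving NA is NA.

```r
# Hypothetical, more explicit equivalent of classify(): class 1 only if
# TVPS > 4438, BPNL > 232.5 and COHB > 58 all hold.
classify_all <- function(data) {
  hit <- data$TVPS > 4438 & data$BPNL > 232.5 & data$COHB > 58
  factor(as.integer(hit), levels = c(0, 1))
}

# Three made-up observations: all above, COHB below, TVPS below.
toy <- data.frame(TVPS = c(5000, 5000, 100),
                  BPNL = c(300, 300, 300),
                  COHB = c(60, 10, 60))
classify_all(toy)  # 1 0 0, levels 0 1
```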
