woe

Weights of evidence

Computes the weight of evidence (WOE) transform of factor variables for binary classification.

Keywords
multivariate, classif
Usage
woe(x, ...)
## S3 method for class 'default':
woe(x, grouping, zeroadj = 0, ids = NULL, 
                      appont = TRUE, ...)
## S3 method for class 'formula':
woe(formula, data = NULL, ...)
Arguments
x
A matrix or data frame containing the explanatory variables.
grouping
A factor specifying the binary class for each observation.
formula
A formula of the form grouping ~ x1 + x2 + ... That is, the response is the grouping factor and the right hand side specifies the discriminators.
data
Data frame from which variables specified in formula are to be taken.
zeroadj
Additive constant used for a level with 0 observations in a class.
ids
Vector of either indices or variable names that specifies the variables to be transformed.
appont
Application on training data: logical indicating whether the transformed values for the training data should be returned by a recursive call to predict.woe.
...
For woe.formula: further arguments passed to woe.default, such as ids. For woe.default: replace = FALSE can be passed to the recursive call of predict.woe if appont = TRUE.
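To see why zeroadj matters, consider a level that never occurs in one of the two classes: its relative frequency in that class is 0, so the log ratio of class-wise frequencies used for the WOE is infinite. The base-R sketch below shows this on made-up counts, together with one plausible way an additive constant can repair it. It assumes the constant is placed in the empty cells before the frequencies are computed; the helper woe_from_counts is hypothetical, not part of klaR, and the package's internal handling may differ in detail.

```r
## hypothetical counts: level "c" has no observations in class 2
counts <- matrix(c(30, 10,
                   15, 20,
                    5,  0),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(level = c("a", "b", "c"),
                                 class = c("1", "2")))

## illustrative helper (not part of klaR): log ratio of class-wise frequencies
woe_from_counts <- function(tab, zeroadj = 0) {
  tab[tab == 0] <- zeroadj                   # assumption: adjust empty cells only
  freq <- sweep(tab, 2, colSums(tab), "/")   # relative frequencies within each class
  log(freq[, "1"] / freq[, "2"])
}

woe_from_counts(counts)                  # level "c" is Inf without adjustment
woe_from_counts(counts, zeroadj = 0.5)   # all values are finite
```

A small positive constant such as 0.5 keeps every level usable in a subsequent model, at the price of a slightly distorted frequency estimate for the rare level.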
Details

To each factor level $x$ a numeric value $WOE(x) = \ln\left(f(x|1)/f(x|2)\right)$ is assigned, where 1 and 2 denote the class labels and $f(x|c)$ is the relative frequency of level $x$ within class $c$. The WOE transform is motivated by subsequent modelling with logistic regression. Note that the class frequencies should be examined beforehand. Information values heuristically quantify the discriminatory power of a variable by $IV = \sum_x (f(x|1)-f(x|2)) \ln\left(f(x|1)/f(x|2)\right)$, where the sum runs over the levels $x$ of the variable.
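As a rough illustration of these definitions, the following base-R sketch computes the WOE of each level of a single toy factor by hand, and then one information value for the variable by summing the per-level terms. The data and all object names (x, grouping, freq, woe_by_hand, iv_by_hand) are made up for this example. It does not call klaR and is not meant to reproduce the package's internal computations, for instance its handling of zeroadj.

```r
## toy data: one factor with three levels and a binary class (hypothetical)
x        <- factor(c("a", "a", "a", "b", "b", "c", "a", "b", "c", "c"))
grouping <- factor(c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2))

## relative frequencies f(x|1) and f(x|2) of each level within each class
counts <- table(x, grouping)
freq   <- prop.table(counts, margin = 2)   # each column sums to 1

## WOE(x) = ln(f(x|1) / f(x|2)) for every level
woe_by_hand <- log(freq[, "1"] / freq[, "2"])

## IV: sum over levels of (f(x|1) - f(x|2)) * WOE(x)
iv_by_hand <- sum((freq[, "1"] - freq[, "2"]) * woe_by_hand)

print(round(woe_by_hand, 3))
print(round(iv_by_hand, 3))
```

Levels that are relatively more frequent in class 1 receive a positive WOE, levels more frequent in class 2 a negative one. Every per-level term of the IV is non-negative, so the IV grows with the separation between the two class-wise distributions.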

Value

  • Returns an object of class woe that can be applied to new data, with components:
  • woe: WOE coefficients for factor2numeric transformation of each (specified) variable.
  • IV: Vector of information values of all transformed variables.
  • newx: Data frame of transformed data if appont.

References

Good, I. J. (1950): Probability and the Weighing of Evidence. Charles Griffin, London.

Kullback, S. (1959): Information Theory and Statistics. Wiley, New York.

See Also

predict.woe, plot.woe

Aliases
  • woe
  • woe.default
  • woe.formula
  • print.woe
Examples
## load the klaR package and the German credit data
library(klaR)
data("GermanCredit")

## training/validation split (seed fixed for reproducibility)
set.seed(1)
train <- sample(nrow(GermanCredit), round(0.6 * nrow(GermanCredit)))

woemodel <- woe(credit_risk ~ ., data = GermanCredit[train, ], zeroadj = 0.5, appont = TRUE)
woemodel

## plot variable information values and woes
plot(woemodel)
plot(woemodel, type = "woes")

## apply woes 
traindata <- predict(woemodel, GermanCredit[train,], replace = TRUE)
str(traindata)

## fit logistic regression model
glmodel <- glm(credit_risk ~ ., data = traindata, family = binomial)
summary(glmodel)
pred.trn <- predict(glmodel, traindata, type = "response")

## predict validation data
validata <- predict(woemodel, GermanCredit[-train,], replace = TRUE)
pred.val <- predict(glmodel, validata, type = "response")
Documentation reproduced from package klaR, version 0.6-11, License: GPL-2
