arules (version 1.5-4)

discretize: Convert a Continuous Variable into a Categorical Variable

Description

This function implements several basic unsupervised methods to convert a continuous variable into a categorical variable (factor) suitable for association rule mining.

Usage

discretize(x, method="interval", categories = 3, labels = NULL,     
  ordered=FALSE, onlycuts=FALSE, ...)

Arguments

x

a numeric vector (continuous variable).

method

discretization method. Available methods are "interval" (equal interval width), "frequency" (equal frequency), "cluster" (k-means clustering) and "fixed" (categories specifies the interval boundaries).

categories

number of categories or, for method "fixed", a vector of interval boundaries (values outside the boundaries are set to NA).

labels

character vector; names for categories.

ordered

logical; return a factor with ordered levels?

onlycuts

logical; return only computed interval boundaries?

...

for method "cluster", further arguments are passed on to kmeans.
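The three data-driven methods can be approximated in base R. The following is a minimal sketch under stated assumptions, not the code used by discretize: equal width uses seq() over the data range, equal frequency uses sample quantiles (R's default quantile() type), and the cluster method is assumed to place each cut point halfway between adjacent sorted k-means centers. The toy vector x is made up for illustration; nstart shows the kind of extra argument that "..." hands on to kmeans.

```r
# Base-R sketch (not arules' own code) of three unsupervised break rules.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 30)   # made-up, right-skewed toy data
k <- 3                               # number of categories

# "interval": k intervals of equal width spanning range(x)
width_breaks <- seq(min(x), max(x), length.out = k + 1)

# "frequency": breaks at the 0, 1/k, ..., 1 sample quantiles
freq_breaks <- quantile(x, probs = seq(0, 1, length.out = k + 1), names = FALSE)

# "cluster" (assumed rule): 1-D k-means, cut halfway between adjacent
# sorted centers; nstart is passed straight to kmeans()
set.seed(1)
centers <- sort(as.numeric(kmeans(x, centers = k, nstart = 10)$centers[, 1]))
clus_breaks <- c(min(x), head(centers, -1) + diff(centers) / 2, max(x))

# include.lowest = TRUE keeps min(x) inside the first interval
width_cat <- cut(x, breaks = width_breaks, include.lowest = TRUE)
freq_cat  <- cut(x, breaks = freq_breaks,  include.lowest = TRUE)
clus_cat  <- cut(x, breaks = clus_breaks,  include.lowest = TRUE)

print(table(width_cat))
print(table(freq_cat))
print(table(clus_cat))
```

On this right-skewed vector, equal width puts 8 of the 9 values into the first interval and leaves the middle one empty, whereas equal frequency yields three groups of three.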

Value

A factor representing the categorized continuous variable or, if onlycuts=TRUE, a vector with the interval boundaries.
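To illustrate how the two return values relate, the sketch below uses base R only: a vector of boundaries (chosen by hand here, as for method "fixed") is turned back into a factor with cut(). The labels and the ordered factor mirror the labels and ordered arguments; that discretize builds its factor in exactly this way is an assumption, and the data are made up.

```r
# Boundaries such as those returned with onlycuts = TRUE
# (here chosen by hand, as for method "fixed")
cuts <- c(0, 0.8, 2)

x <- c(0.2, 0.5, 1.3, 1.9, 2.4)   # made-up values

# Rebuild a categorical variable from the boundaries; values outside
# [0, 2] cannot be placed in any interval and become NA
size <- cut(x, breaks = cuts, labels = c("small", "large"),
            include.lowest = TRUE, ordered_result = TRUE)

print(size)            # 2.4 lies above the last boundary, so it is NA
print(size < "large")  # ordered levels support comparisons
```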

Details

discretize implements only unsupervised discretization. See the packages discretization or RWeka for supervised discretization.

Examples

# NOT RUN {
library(arules)
data(iris)
x <- iris[,4]
hist(x, breaks=20, main="Data")

def.par <- par(no.readonly = TRUE) # save default
layout(mat=rbind(1:2,3:4))

### convert continuous variables into categories (there are 3 types of flowers)
### default is equal interval width
table(discretize(x, categories=3))
hist(x, breaks=20, main="Equal Interval Width")
abline(v=discretize(x, categories=3, onlycuts=TRUE), col="red")

### equal frequency
table(discretize(x, "frequency", categories=3))

hist(x, breaks=20, main="Equal Frequency")
abline(v=discretize(x, method="frequency", categories=3, onlycuts=TRUE),
    col="red")

### k-means clustering 
table(discretize(x, "cluster", categories=3))
hist(x, breaks=20, main="K-Means")
abline(v=discretize(x, method="cluster", categories=3, onlycuts=TRUE),
    col="red")


### user-specified
table(discretize(x, "fixed", categories = c(-Inf,.8,Inf)))
table(discretize(x, "fixed", categories = c(-Inf,.8, Inf), 
    labels=c("small", "large")))
hist(x, breaks=20, main="Fixed")
abline(v=discretize(x, method="fixed", categories = c(-Inf,.8,Inf), 
    onlycuts=TRUE), col="red")

par(def.par)  # reset to default

### prepare the iris data set for association rule mining
for(i in 1:4) iris[,i] <- discretize(iris[,i], "frequency", categories=3)

trans <- as(iris, "transactions")
inspect(head(trans, 1))

as(head(trans, 3),"matrix")
# }
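The conversion to transactions above treats every factor level as a binary item. As a rough base-R illustration, assuming the "variable=level" item naming that arules uses for data frames, the following builds such a logical item matrix for a made-up two-column data frame. It is not how arules implements as(), which stores transactions as a sparse itemMatrix.

```r
# Base-R sketch (assumed encoding, not arules code):
# one logical column per "variable=level" pair.
df <- data.frame(
  size  = factor(c("small", "large", "small")),
  color = factor(c("red", "red", "blue"))
)

item_columns <- lapply(names(df), function(v) {
  lev <- levels(df[[v]])
  m <- sapply(lev, function(l) df[[v]] == l)  # one column per level
  m <- matrix(m, nrow = nrow(df))             # keep matrix shape even for 1 row
  colnames(m) <- paste0(v, "=", lev)
  m
})
items <- do.call(cbind, item_columns)

print(items)
```

Each row contains exactly one TRUE per original variable, which is why every transaction built from iris has one item for each of its five columns.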
