# discretize

##### Convert a Continuous Variable into a Categorical Variable

This function implements several basic unsupervised methods to convert a continuous variable into a categorical variable (factor) suitable for association rule mining.

Keywords
manip
##### Usage
discretize(x, method="interval", categories = 3, labels = NULL,
ordered=FALSE, onlycuts=FALSE, ...)
##### Arguments
x

a numeric vector (continuous variable).

method

discretization method. Available are: "interval" (equal interval width), "frequency" (equal frequency), "cluster" (k-means clustering) and "fixed" (categories specifies interval boundaries).

categories

number of categories or, for method "fixed", a vector with the interval boundaries (all values outside the boundaries will be set to NA).

labels

character vector; names for categories.

ordered

logical; return a factor with ordered levels?

onlycuts

logical; return only computed interval boundaries?

...

for method "cluster", further arguments are passed on to kmeans.
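To make the three automatic methods more concrete, the sketch below approximates the cut points each one would choose, using base R only. The helper `sketch_cuts` is hypothetical and is not part of arules; in particular, placing the k-means boundaries halfway between neighbouring cluster centers is an assumption about a reasonable construction, not a statement of how discretize computes them.

```r
# Hypothetical helper (not part of arules): approximate cut points for the
# three automatic methods using base R only.
sketch_cuts <- function(x, method = c("interval", "frequency", "cluster"),
                        categories = 3) {
  method <- match.arg(method)
  x <- x[!is.na(x)]
  switch(method,
    # equal interval width: evenly spaced boundaries across the range of x
    interval = seq(min(x), max(x), length.out = categories + 1),
    # equal frequency: boundaries at evenly spaced quantiles
    frequency = as.numeric(
      quantile(x, probs = seq(0, 1, length.out = categories + 1))
    ),
    # k-means: boundaries halfway between neighbouring cluster centers
    # (an assumed construction, see the text above)
    cluster = {
      centers <- sort(as.numeric(kmeans(x, centers = categories)$centers))
      c(min(x), head(centers, -1) + diff(centers) / 2, max(x))
    }
  )
}

x <- iris[, 4]
sketch_cuts(x, "interval")
sketch_cuts(x, "frequency")
set.seed(1)  # k-means starts from randomly chosen centers
sketch_cuts(x, "cluster")
```

Each call returns `categories + 1` boundaries, which is the shape one would expect when only the interval boundaries are requested.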

##### Details

discretize only implements unsupervised discretization. See packages discretization or RWeka for supervised discretization.

##### Value

A factor representing the categorized continuous variable or, if onlycuts=TRUE, a vector with the interval boundaries.
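As a rough illustration of the returned factor and of how values outside fixed boundaries become NA, the sketch below uses base R's `cut()` as a stand-in. Whether discretize closes its intervals on the left or on the right is not stated here, so the interval notation in the levels is an assumption that comes from the defaults of `cut()`; the data and boundaries are made up for the example.

```r
x <- c(0.2, 0.5, 1.0, 1.8, 3.0)
bounds <- c(0, 1, 2)  # illustrative boundaries; 3.0 lies outside them

# Unordered factor with automatically generated interval labels
f <- cut(x, breaks = bounds, include.lowest = TRUE)
is.factor(f)
is.na(f)                   # only the value outside the boundaries is NA
table(f, useNA = "ifany")

# Named categories returned as an ordered factor
f_named <- cut(x, breaks = bounds, labels = c("small", "large"),
               include.lowest = TRUE, ordered_result = TRUE)
is.ordered(f_named)
```

Supplying names for the categories replaces the generated interval labels, and requesting ordered levels lets the categories be compared, for example `f_named < "large"`.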

##### Examples
# NOT RUN {
library(arules)  # provides discretize() and the "transactions" class
data(iris)
x <- iris[,4]
hist(x, breaks=20, main="Data")

def.par <- par(no.readonly = TRUE) # save default
layout(mat=rbind(1:2,3:4))

### convert continuous variables into categories (there are 3 types of flowers)
### default is equal interval width
table(discretize(x, categories=3))
hist(x, breaks=20, main="Equal Interval length")
abline(v=discretize(x, categories=3, onlycuts=TRUE),
       col="red")

### equal frequency
table(discretize(x, "frequency", categories=3))

hist(x, breaks=20, main="Equal Frequency")
abline(v=discretize(x, method="frequency", categories=3, onlycuts=TRUE),
       col="red")

### k-means clustering
table(discretize(x, "cluster", categories=3))
hist(x, breaks=20, main="K-Means")
abline(v=discretize(x, method="cluster", categories=3, onlycuts=TRUE),
       col="red")

### user-specified
table(discretize(x, "fixed", categories = c(-Inf, .8, Inf)))
table(discretize(x, "fixed", categories = c(-Inf, .8, Inf),
                 labels=c("small", "large")))
hist(x, breaks=20, main="Fixed")
abline(v=discretize(x, method="fixed", categories = c(-Inf, .8, Inf),
                    onlycuts=TRUE), col="red")

par(def.par)  # reset to default

### prepare the iris data set for association rule mining
for(i in 1:4) iris[,i] <- discretize(iris[,i],  "frequency", categories=3)

trans <- as(iris, "transactions")
inspect(head(trans, 1))

as(head(trans, 3),"matrix")
# }

Documentation reproduced from package arules, version 1.5-4, License: GPL-3
