chi2: Discretization using the Chi2 algorithm

Description

This function performs discretization using the Chi2 algorithm. The Chi2 algorithm automatically determines a proper chi-square (\(\chi^2\)) threshold that preserves the fidelity of the original numeric dataset.

Usage

chi2(data, alp = 0.5, del = 0.05)

Arguments

data

the dataset to be discretized

alp

significance level; \(\alpha\)

del

inconsistency threshold \(\delta\); the discretized data must satisfy \(Inconsistency(data) < \delta\) (Liu and Setiono, 1995)
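
To make the inconsistency condition concrete, the following is a minimal sketch of one way to compute an inconsistency rate in the sense of Liu and Setiono (1995): rows that share all attribute values but disagree on the class are inconsistent, and each such group contributes its size minus the count of its majority class. The helper `inconsistency_rate` is hypothetical and is not the package's own implementation; it also assumes the class label is stored in the last column.

```r
# Hypothetical helper, not part of the package: inconsistency rate of a
# data frame whose last column is assumed to hold the class label.
inconsistency_rate <- function(df) {
  p <- ncol(df)
  # Build one key per row from all attribute columns (class excluded).
  key <- do.call(paste, c(df[, -p, drop = FALSE], sep = "\r"))
  # For each group of identical attribute rows:
  # group size minus the count of its most frequent class.
  per_group <- tapply(as.character(df[[p]]), key, function(cls) {
    length(cls) - max(table(cls))
  })
  sum(per_group) / nrow(df)
}

# Toy data: the first three rows match on (a, b) but one has another class.
toy <- data.frame(
  a = c(1, 1, 1, 2, 2),
  b = c(1, 1, 1, 3, 3),
  class = c("x", "x", "y", "x", "x")
)
rate <- inconsistency_rate(toy)
rate < 0.05  # would this toy dataset pass a threshold of del = 0.05?
```

In this toy dataset one of five rows is inconsistent, so the rate is above a threshold of 0.05.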

Value

cutp

list of cut-points for each variable

Disc.data

discretized data matrix

Details

The Chi2 algorithm is based on the \(\chi^2\) statistic and consists of two phases. The first phase begins with a high significance level (sigLevel) for all numeric attributes to be discretized. Each attribute is sorted according to its values, and then two steps are performed: (1) calculate the \(\chi^2\) value for every pair of adjacent intervals (at the beginning, each pattern is put into its own interval, which contains only one value of the attribute); (2) merge the pair of adjacent intervals with the lowest \(\chi^2\) value. Merging continues until all pairs of adjacent intervals have \(\chi^2\) values exceeding the threshold determined by sigLevel. This process is repeated with a decreased sigLevel until the inconsistency rate of the discretized data, incon(), exceeds \(\delta\) (Liu and Setiono, 1995).
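
The merge test for one pair of adjacent intervals can be sketched as follows. This is an illustration only, not the package's internal code: `adjacent_chisq` is a hypothetical helper, the input is assumed to be a 2 x k table of class counts (rows are the two intervals, columns are the classes), and replacing zero expected counts with 0.1 is an assumption made here to avoid division by zero.

```r
# Hypothetical helper: chi-square statistic for two adjacent intervals,
# given a 2 x k matrix of class counts.
adjacent_chisq <- function(counts, eps = 0.1) {
  # Expected counts under independence of interval and class.
  expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
  # Assumption: substitute a small constant for zero expected counts.
  expected[expected == 0] <- eps
  sum((counts - expected)^2 / expected)
}

# Two intervals with identical class distributions: candidates for merging.
same <- matrix(c(5, 5, 5, 5), nrow = 2, byrow = TRUE)
# Two intervals with opposite class distributions: should stay separate.
differ <- matrix(c(10, 0, 0, 10), nrow = 2, byrow = TRUE)

# Threshold for a given significance level, with (classes - 1) degrees
# of freedom.
sig_level <- 0.5
threshold <- qchisq(1 - sig_level, df = ncol(same) - 1)

adjacent_chisq(same) < threshold    # TRUE: merge
adjacent_chisq(differ) < threshold  # FALSE: keep the cut-point
```

Lowering the significance level raises the threshold, so more adjacent pairs fall below it and are merged, which is why repeating the process with a decreasing level yields coarser intervals.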

References

Liu, H. and Setiono, R. (1995). Chi2: Feature selection and discretization of numeric attributes, Tools with Artificial Intelligence, 388--391.

Liu, H. and Setiono, R. (1997). Feature selection and discretization, IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 4, 642--645.

Examples

# NOT RUN {
library(discretization)
data(iris)
# Run the discretization once and reuse the result
res <- chi2(iris, alp = 0.5, del = 0.05)

#---cut-points
res$cutp

#--discretized dataset using Chi2 algorithm
res$Disc.data
# }