discretization (version 1.0-1)

chi2: Discretization using the Chi2 algorithm

Description

This function performs discretization using the Chi2 algorithm. The Chi2 algorithm automatically determines a proper chi-square ($\chi^2$) threshold that keeps the fidelity of the original numeric dataset.
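A minimal sketch of how a significance level maps to a $\chi^2$ threshold. It assumes, as in ChiMerge, that the degrees of freedom equal the number of classes minus one. The helper name chi2_threshold is hypothetical and not part of the package. A lower significance level gives a larger threshold, so more pairs of adjacent intervals fall below it and get merged.

```r
# Hypothetical helper (not part of the discretization package):
# critical chi-square value for significance level alpha, assuming
# degrees of freedom = number of classes - 1.
chi2_threshold <- function(alpha, n_classes) {
  qchisq(1 - alpha, df = n_classes - 1)
}

# For 3 classes (as in iris), df = 2
chi2_threshold(0.5, 3)   # about 1.386
chi2_threshold(0.05, 3)  # about 5.991
```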

Usage

chi2(data, alp = 0.5, del = 0.05)

Arguments

data
the dataset to be discretized
alp
initial significance level, $\alpha$
del
allowed inconsistency rate $\delta$ of the discretized data, i.e. $Inconsistency(data) < \delta$ (Liu and Setiono (1995))
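To make the role of del concrete, here is a minimal sketch of an inconsistency rate in the sense of Liu and Setiono. Instances that agree on all discretized attributes but carry different class labels are inconsistent, and each group of matching instances contributes its size minus the count of its majority class. The function incon_rate and its two-argument interface are hypothetical; the package's own incon() may take its input differently.

```r
# Hypothetical helper (not the package's incon()):
# x = data frame of discretized attributes, y = class labels.
incon_rate <- function(x, y) {
  y <- factor(y)
  # one key per instance: its full pattern of attribute values
  key <- do.call(paste, c(as.data.frame(x), sep = "|"))
  groups <- split(y, key)
  # each group adds (group size - size of its majority class)
  incon_count <- sum(vapply(groups,
                            function(g) length(g) - max(table(g)),
                            numeric(1)))
  incon_count / length(y)
}

x <- data.frame(a = c(1, 1, 1, 2), b = c(1, 1, 1, 2))
y <- c("A", "A", "B", "B")
incon_rate(x, y)  # pattern (1,1) has classes A, A, B: 1 inconsistent of 4
```

Chi2 keeps lowering sigLevel only while a rate like this, computed on the discretized data, stays within $\delta$.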

Value

cutp
list of cut-points for each variable
Disc.data
discretized data matrix

Details

The Chi2 algorithm is based on the $\chi^2$ statistic and consists of two phases. The first phase begins with a high significance level (sigLevel) for all numeric attributes to be discretized. Each attribute is sorted according to its values, and then the following steps are performed: step 1, calculate the $\chi^2$ value for every pair of adjacent intervals (at the beginning, each distinct value of an attribute is put into its own interval); step 2, merge the pair of adjacent intervals with the lowest $\chi^2$ value. Merging continues until all pairs of adjacent intervals have $\chi^2$ values exceeding the critical value determined by sigLevel. The above process is repeated with a decreasing sigLevel until the inconsistency rate of the discretized data, computed by incon(), exceeds $\delta$ (Liu and Setiono (1995)).
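The merging steps of the first phase, for a single attribute at a fixed sigLevel, can be sketched as follows. This is a simplified illustration, not the package's implementation: pair_chi2 and merge_intervals are hypothetical names, zero expected counts are replaced by 0.1 following the usual ChiMerge convention, the degrees of freedom are taken as the number of classes minus one, and cut-points are reported as midpoints between neighbouring intervals. The outer loop over decreasing sigLevel and the inconsistency check are omitted.

```r
# Chi-square statistic for two adjacent intervals, given their 2 x k
# class-frequency matrix (rows = intervals, columns = classes).
pair_chi2 <- function(counts) {
  R <- rowSums(counts); C <- colSums(counts); N <- sum(counts)
  E <- outer(R, C) / N
  E[E == 0] <- 0.1  # ChiMerge convention for empty classes (assumption)
  sum((counts - E)^2 / E)
}

# Hypothetical sketch: merge adjacent intervals of one attribute x
# (class labels y) until every adjacent pair exceeds the threshold.
merge_intervals <- function(x, y, alpha) {
  y <- factor(y)
  vals <- sort(unique(x))
  # start with one interval per distinct value
  counts <- unclass(table(factor(x, levels = vals), y))
  lo <- vals; hi <- vals
  threshold <- qchisq(1 - alpha, df = nlevels(y) - 1)
  while (nrow(counts) > 1) {
    stats <- vapply(seq_len(nrow(counts) - 1),
                    function(i) pair_chi2(counts[i:(i + 1), , drop = FALSE]),
                    numeric(1))
    i <- which.min(stats)
    if (stats[i] > threshold) break  # all pairs exceed the threshold
    # merge interval i + 1 into interval i
    counts[i, ] <- counts[i, ] + counts[i + 1, ]
    counts <- counts[-(i + 1), , drop = FALSE]
    hi[i] <- hi[i + 1]
    lo <- lo[-(i + 1)]; hi <- hi[-(i + 1)]
  }
  if (length(lo) < 2) return(numeric(0))
  # cut-points as midpoints between neighbouring intervals (assumption)
  (hi[-length(hi)] + lo[-1]) / 2
}

merge_intervals(c(1, 2, 3, 10, 11, 12), c("A", "A", "A", "B", "B", "B"), 0.5)
```

With iris, merge_intervals(iris$Petal.Length, iris$Species, alpha = 0.5) returns the cut-points this sketch finds for one attribute. Because the greedy merge order does not depend on the threshold, rerunning with a smaller alpha only continues the merging, so it yields the same number of cut-points or fewer.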

References

Liu, H. and Setiono, R. (1995). Chi2: Feature selection and discretization of numeric attributes. Tools with Artificial Intelligence, 388--391.

Liu, H. and Setiono, R. (1997). Feature selection and discretization. IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 4, 642--645.

See Also

value, incon and chiM.

Examples

library(discretization)
data(iris)
#--- run Chi2 once and keep both results
res <- chi2(iris, alp = 0.5, del = 0.05)

#--- cut-points
res$cutp

#--- discretized dataset using the Chi2 algorithm
res$Disc.data