This function performs the Chi2 discretization algorithm. The Chi2 algorithm automatically determines a proper chi-square (\(\chi^2\)) threshold that keeps the fidelity of the original numeric dataset.
Usage
chi2(data, alp = 0.5, del = 0.05)
Arguments
data
the dataset to be discretized
alp
significance level; \(\alpha\)
del
inconsistency threshold \(\delta\); discretization proceeds while \(Inconsistency(data) < \delta\) (Liu and Setiono, 1995)
Value
cutp
list of cut-points for each variable
Disc.data
discretized data matrix
Details
The Chi2 algorithm is based on the \(\chi^2\) statistic, and consists of two phases.
In the first phase, it begins with a high significance level (sigLevel) for all numeric attributes to be discretized. Each attribute is sorted according to its values. Then the following is performed:
step 1. calculate the \(\chi^2\) value for every pair of adjacent intervals (at the beginning, each pattern is put into its own interval that contains only one value of an attribute);
step 2. merge the pair of adjacent intervals with the lowest \(\chi^2\) value. Merging continues until all pairs of intervals have \(\chi^2\) values exceeding the threshold determined by sigLevel.
The above process is repeated with a decreased sigLevel until the inconsistency rate of the discretized data, computed by incon(), exceeds \(\delta\) (Liu and Setiono, 1995).
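The merging procedure and the stopping criterion described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the helper names chisq_adjacent, merge_intervals, apply_cuts and inconsistency_rate are hypothetical, zero expected counts are replaced by 0.1 (an assumed convention to avoid division by zero), the degrees of freedom are taken as the number of classes minus one, and cut-points are reported as the lower bounds of the merged intervals, which may differ from what chi2() returns.

```r
# Chi-square statistic for two adjacent intervals.
# counts: 2 x k matrix; row i = interval i, column j = class j.
chisq_adjacent <- function(counts) {
  expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
  expected[expected == 0] <- 0.1  # assumption: avoid division by zero
  sum((counts - expected)^2 / expected)
}

# Merge adjacent intervals of one attribute at a fixed significance level.
# x: numeric attribute; y: class labels; alpha: significance level.
merge_intervals <- function(x, y, alpha) {
  y <- factor(y)
  vals <- sort(unique(x))
  # Start with one interval per distinct value, ordered by value.
  counts <- unclass(table(match(x, vals), y))
  lower <- vals  # lower bound of each current interval
  threshold <- qchisq(1 - alpha, df = nlevels(y) - 1)
  while (nrow(counts) > 1) {
    stats <- vapply(
      seq_len(nrow(counts) - 1),
      function(i) chisq_adjacent(counts[i:(i + 1), , drop = FALSE]),
      numeric(1)
    )
    i <- which.min(stats)
    if (stats[i] > threshold) break  # every pair now exceeds the threshold
    counts[i, ] <- counts[i, ] + counts[i + 1, ]
    counts <- counts[-(i + 1), , drop = FALSE]
    lower <- lower[-(i + 1)]
  }
  lower[-1]  # cut-points: lower bounds of all intervals but the first
}

# Interval index of each value, given a vector of cut-points.
apply_cuts <- function(x, cuts) {
  vapply(x, function(v) sum(v >= cuts), integer(1))
}

# Share of instances that do not belong to the majority class
# of their (discretized) attribute pattern.
inconsistency_rate <- function(attrs, y) {
  cols <- unname(as.list(as.data.frame(attrs)))
  key <- do.call(paste, c(cols, sep = "\r"))
  tab <- table(key, y)
  sum(rowSums(tab) - apply(tab, 1, max)) / length(y)
}

x <- c(1, 2, 3, 10, 11, 12)
y <- c("a", "a", "a", "b", "b", "b")
cuts_05 <- merge_intervals(x, y, alpha = 0.05)  # one cut-point, at 10
cuts_01 <- merge_intervals(x, y, alpha = 0.01)  # everything merged
inconsistency_rate(apply_cuts(x, cuts_05), y)   # 0
inconsistency_rate(apply_cuts(x, cuts_01), y)   # 0.5
```

In this toy example, lowering the significance level from 0.05 to 0.01 raises the \(\chi^2\) threshold enough to merge the two remaining intervals, and the inconsistency rate jumps from 0 to 0.5. With \(\delta\) = 0.05, the repetition would therefore stop before accepting the second discretization.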
References
Liu, H. and Setiono, R. (1995). Chi2: Feature selection and discretization of numeric attributes, Proceedings of the Seventh International Conference on Tools with Artificial Intelligence, 388--391.
Liu, H. and Setiono, R. (1997). Feature selection via discretization, IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 4, 642--645.
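Examples
A minimal usage sketch, assuming the discretization package is installed and assuming that the function expects the class variable in the last column of the data (Species in the built-in iris dataset). The call is guarded so that the snippet does nothing when the package is unavailable.

```r
# Assumption: the 'discretization' package provides chi2() as documented above.
if (requireNamespace("discretization", quietly = TRUE)) {
  library(discretization)
  data(iris)  # four numeric attributes, class (Species) in the last column

  res <- chi2(iris, alp = 0.5, del = 0.05)

  print(res$cutp)             # list of cut-points for each variable
  print(head(res$Disc.data))  # first rows of the discretized data
}
```

The two printed components correspond to the cutp and Disc.data entries listed under Value.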