Discretization using the Chi2 algorithm
This function performs the Chi2 discretization algorithm. The Chi2 algorithm automatically determines a proper chi-square ($\chi^2$) threshold that preserves the fidelity of the original numeric dataset.
chi2(data, alp = 0.5, del = 0.05)
data: the dataset to be discretized
alp: significance level; $\alpha$
del: inconsistency threshold $\delta$; the discretized data must satisfy $Inconsistency(data) < \delta$ (Liu and Setiono, 1995)
- list of cut-points for each variable
- discretized data matrix
The Chi2 algorithm is based on the $\chi^2$ statistic, and consists of two phases.
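For reference, the statistic computed for a pair of adjacent intervals can be written as below. This follows the usual formulation in Liu and Setiono (1995); the symbols are notation introduced here for illustration: $k$ is the number of classes, $A_{ij}$ the number of patterns in interval $i$ with class $j$, $R_i = \sum_j A_{ij}$, $C_j = \sum_i A_{ij}$, and $N = R_1 + R_2$.

```latex
\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{k} \frac{(A_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i \, C_j}{N}
```

The value is compared with the quantile of the $\chi^2$ distribution with $k - 1$ degrees of freedom at the current significance level.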
In the first phase, the algorithm begins with a high significance level (sigLevel) for all numeric attributes to be discretized. Each attribute is sorted according to its values. Then the following steps are performed:
step 1. calculate the $\chi^2$ value for every pair of adjacent intervals (at the beginning, each pattern is put into its own interval that contains only one value of an attribute);
step 2. merge the pair of adjacent intervals with the lowest $\chi^2$ value. Merging continues until all pairs of intervals have $\chi^2$ values exceeding the threshold determined by sigLevel.
The above process is repeated with a decreased sigLevel until the inconsistency rate of the discretized data, computed by incon(), exceeds $\delta$ (Liu and Setiono, 1995). In the second phase, each attribute is merged in turn, starting from the sigLevel reached in the first phase, and an attribute is excluded from further merging once merging it would push the inconsistency rate above $\delta$ (Liu and Setiono, 1995).
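The merging steps and the inconsistency check described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the code behind chi2(): the helper names chisq_pair, merge_intervals and inconsistency_rate are hypothetical, cut-points are reported as the lower bound of each interval, and zero expected counts are replaced by 0.1 as in Liu and Setiono (1995).

```r
# Chi-square statistic for two adjacent intervals.
# `counts` is a 2 x k matrix: rows = intervals, columns = classes.
chisq_pair <- function(counts) {
  row_tot <- rowSums(counts)
  col_tot <- colSums(counts)
  expected <- outer(row_tot, col_tot) / sum(counts)
  expected[expected == 0] <- 0.1   # avoids division by zero
  sum((counts - expected)^2 / expected)
}

# Merge adjacent intervals of one attribute `x` (class labels `y`)
# at a fixed significance level (hypothetical helper).
merge_intervals <- function(x, y, sig_level) {
  y <- factor(y)
  counts <- unclass(table(x, y))   # one row per distinct value, in sorted order
  lower <- sort(unique(x))         # lower bound of each interval
  threshold <- qchisq(1 - sig_level, df = nlevels(y) - 1)
  while (nrow(counts) > 1) {
    # step 1: chi-square value for every pair of adjacent intervals
    stats <- vapply(
      seq_len(nrow(counts) - 1),
      function(i) chisq_pair(counts[i:(i + 1), , drop = FALSE]),
      numeric(1)
    )
    # stop once every pair exceeds the threshold
    if (min(stats) > threshold) break
    # step 2: merge the pair with the lowest chi-square value
    i <- which.min(stats)
    counts[i, ] <- counts[i, ] + counts[i + 1, ]
    counts <- counts[-(i + 1), , drop = FALSE]
    lower <- lower[-(i + 1)]
  }
  lower[-1]   # cut-points: lower bounds of all intervals except the first
}

# Inconsistency rate: within each group of identical attribute patterns,
# count the rows outside the majority class, then divide by the row count.
inconsistency_rate <- function(attrs, y) {
  key <- do.call(paste, c(as.data.frame(attrs), sep = "\r"))
  tab <- table(key, y)
  sum(rowSums(tab) - apply(tab, 1, max)) / length(y)
}

x <- c(1, 2, 3, 4, 10, 11, 12, 13)
y <- rep(c("a", "b"), each = 4)
merge_intervals(x, y, sig_level = 0.05)   # 10
```

In the first phase, a loop around merge_intervals would lower sig_level step by step and stop as soon as inconsistency_rate on the discretized attributes exceeds $\delta$.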
Liu, H. and Setiono, R. (1995). Chi2: Feature selection and discretization of numeric attributes, Proceedings of the Seventh International Conference on Tools with Artificial Intelligence, 388--391.
Liu, H. and Setiono, R. (1997). Feature selection via discretization, IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 4, 642--645.
#--discretized dataset using Chi2 algorithm
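A call matching the usage line might look like the sketch below. It rests on assumptions not stated on this page: that chi2() is provided by the discretization package, that the last column of the data holds the class label (as Species does in iris), and that the returned list holds the cut-points first and the discretized data second, in the order listed above. The components are accessed by position because their names are not given here.

```r
# Guarded so the snippet is a no-op when the package is not installed.
if (requireNamespace("discretization", quietly = TRUE)) {
  data(iris)
  res <- discretization::chi2(iris, alp = 0.5, del = 0.05)
  str(res, max.level = 1)   # inspect the returned components
  res[[1]]                  # cut-points for each variable
  head(res[[2]])            # discretized data
}
```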