catnet (version 1.00.0)

cnDiscretize: Data Categorization

Description

Numerical data discretization using empirical quantiles.

Usage

cnDiscretize(data, numCategories)

Arguments

data
a numerical matrix or data.frame
numCategories
an integer, the number of categories per node

Value

  • A matrix or data.frame of indices.

Details

The numerical data is discretized into a given number of categories, numCategories, using the empirical node quantiles. As in all functions of the catnet package that accept data, if the data parameter is a matrix, it is assumed to be in row-node format (one row per node). If it is a data.frame, column-node format (one column per node) is assumed.
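The two data layouts can be illustrated with a short sketch. The object names m and df and the simulated values are assumptions made for this example only. The call to cnDiscretize is guarded so that the snippet still runs when catnet is not installed.

```r
set.seed(1)

# Row-node format: a matrix with one row per node (3 nodes, 20 samples).
m <- matrix(rnorm(60), nrow = 3,
            dimnames = list(c("A", "B", "C"), NULL))

# Column-node format: a data.frame with one column per node.
df <- as.data.frame(t(m))

# Discretize each node into 3 categories, if catnet is available.
if (requireNamespace("catnet", quietly = TRUE)) {
  dm  <- catnet::cnDiscretize(m, 3)   # matrix input, nodes in rows
  ddf <- catnet::cnDiscretize(df, 3)  # data.frame input, nodes in columns
}
```

Both inputs hold the same samples. Only the orientation differs, which is why the class of data determines how nodes are read.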

For a given set of numbers in some range, the quantiles break that range into sub-intervals, each containing an equal count of the numbers. A quantile-based discretization method is used as follows. For each node, the sample node distribution is constructed and then represented as a union of non-intersecting classes separated by the quantile points of the sample distribution. Each node value is assigned the index of the class into which it falls.
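The idea can be sketched in base R for a single node. This is a minimal illustration under stated assumptions, not the catnet implementation: discretize_node is a hypothetical helper, it uses the default quantile type of stats::quantile, and it assumes num_categories is at least 2. The exact cut points and tie handling in cnDiscretize may differ.

```r
# Hypothetical helper, not part of catnet: quantile-based binning of one node.
discretize_node <- function(x, num_categories) {
  # Interior quantile levels that split the sample into num_categories classes.
  probs <- seq_len(num_categories - 1) / num_categories
  cuts <- quantile(x, probs = probs, names = FALSE)
  # findInterval counts the cut points that are <= each value;
  # adding 1 turns that count into a class index in 1..num_categories.
  findInterval(x, cuts) + 1L
}

x <- c(0.1, 0.4, 0.5, 0.9, 1.3, 2.0, 2.2, 3.1, 4.0)
idx <- discretize_node(x, 3)
table(idx)  # number of values that fall in each class
```

With nine distinct values and three categories, each class receives three values, matching the equal-count property described above. Heavily tied data can produce unequal classes, because repeated values always fall on the same side of a cut point.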

Currently, the function assigns the same number of categories to each node of the data.

See Also

cnSamples