Dirac: Kernels for categorical variables

Description

Given a matrix or data.frame of dimension NxD, where N>1 and D>0, `Dirac()` computes the simplest kernel for categorical data. Samples should be in the rows and features in the columns. When there is a single feature, `Dirac()` returns 1 if two given samples share the same category (or class, or level), and 0 otherwise. When D>1, the results for the D features are combined by a sum, a mean, or a weighted mean.
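The single-feature case can be reproduced in a few lines of base R. The sketch below is only an illustration of the definition above, not the package's implementation; `dirac_single` is a hypothetical helper name.

```r
# Hypothetical helper (not part of the package): Dirac kernel for ONE
# categorical feature. Entry (i, j) is 1 if samples i and j share the
# same category, and 0 otherwise.
dirac_single <- function(x) {
  x <- as.character(x)        # compare categories by their labels
  outer(x, x, "==") * 1       # logical matrix converted to 0/1
}

x <- c("a", "b", "a", "c")
K1 <- dirac_single(x)
K1
```

Samples 1 and 3 share the category "a", so `K1[1, 3]` is 1, while every other off-diagonal entry is 0. The diagonal is always 1 because each sample matches itself.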

Usage

Dirac(X, comp = "mean", coeff = NULL, feat_space = FALSE)

Value

Kernel matrix (dimension: NxN) or, when `feat_space` is not FALSE, a list with the kernel matrix and the feature space.

Arguments

X

Matrix (class "character") or data.frame (class "character", or columns = "factor"). The elements in X are assumed to be categorical in nature.

comp

When D>1, this argument indicates how the variables of the dataset are combined. Options are: "mean", "sum" and "weighted". (Default: "mean")

"sum" gives the same importance to all variables, and returns an unnormalized kernel matrix.
"mean" gives the same importance to all variables, and returns a normalized kernel matrix (all its elements range between 0 and 1).
"weighted" weights each variable according to the `coeff` parameter, and returns a normalized kernel matrix.
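The three options can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's code: `dirac_combine` is a hypothetical helper, and rescaling the weights so that they sum to 1 is an assumption made here so that the "weighted" result stays between 0 and 1.

```r
# Hypothetical helper (not part of the package): combine the per-feature
# Dirac kernels of an N x D categorical dataset.
dirac_combine <- function(X, comp = c("mean", "sum", "weighted"), coeff = NULL) {
  comp <- match.arg(comp)
  X <- as.matrix(X)                       # factor/character columns -> character matrix
  D <- ncol(X)
  if (comp == "weighted") stopifnot(length(coeff) == D)
  # One N x N 0/1 matrix per feature
  Ks <- lapply(seq_len(D), function(j) outer(X[, j], X[, j], "==") * 1)
  w <- switch(comp,
              sum      = rep(1, D),           # unnormalized: entries range from 0 to D
              mean     = rep(1 / D, D),       # entries range from 0 to 1
              weighted = coeff / sum(coeff))  # assumption: weights rescaled to sum to 1
  Reduce(`+`, Map(`*`, Ks, w))
}

X <- data.frame(colour = c("red", "red", "blue"),
                shape  = c("round", "square", "square"))
K_sum  <- dirac_combine(X, "sum")
K_mean <- dirac_combine(X, "mean")
K_w    <- dirac_combine(X, "weighted", coeff = c(3, 1))
```

Samples 1 and 2 agree on `colour` but not on `shape`, so their entry is 1 under "sum", 0.5 under "mean", and 0.75 under "weighted" with weights 3 and 1.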

coeff

(optional) A vector of weights with length D, one per variable. It is used when `comp` is "weighted".

feat_space

If FALSE, only the kernel matrix is returned. Otherwise, the feature space is also returned. (Default: FALSE)
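For intuition about what a feature space means for this kernel: the Dirac kernel of a single variable equals the inner product of indicator (one-hot) encodings of its categories. The base R sketch below checks this identity. It is an illustration only, and the exact structure that `Dirac()` returns for the feature space is not described here, so treat the encoding shown as an assumption.

```r
x <- factor(c("a", "b", "a", "c"))

# Indicator (one-hot) matrix: one column per level, no intercept column
Phi <- model.matrix(~ x - 1)

# Kernel as an inner product in the indicator feature space
K_phi <- tcrossprod(Phi)                 # same as Phi %*% t(Phi)

# Kernel computed directly from category equality
K_eq <- outer(as.character(x), as.character(x), "==") * 1

all(K_phi == K_eq)
```

Two samples with the same category have identical indicator rows, so their inner product is 1. Different categories give orthogonal rows and an inner product of 0.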

References

Belanche, L. A., and Villegas, M. A. (2013). Kernel functions for categorical variables with application to problems in the life sciences. Artificial Intelligence Research and Development (pp. 171-180). IOS Press.

Examples

# Categorical data
summary(CO2)
Kdirac <- Dirac(CO2[,1:3])
## Display a subset of the kernel matrix:
Kdirac[c(1,15,50,65),c(1,15,50,65)]