
kmed (version 0.0.1)

distmix: A distance for mixed variables.

Description

This function computes and returns the distance matrix of mixed-variable data using the specified distance measure.

Usage

distmix(data, method = "gower", idnum = NULL, idbin = NULL,
  idcat = NULL)

Arguments

data

A data frame or a matrix object.

method

The distance measure for mixed variables: "gower" (the default), "wishart", "podani", "huang", or "harikumar".

idnum

A vector of column indices of the numerical variables.

idbin

A vector of column indices of the binary variables.

idcat

A vector of column indices of the categorical variables.
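The three index arguments are plain column positions. When the variable types are already encoded in the column classes of a data frame, the index vectors can be derived instead of typed by hand. The sketch below uses a hypothetical helper, `mixed_ids()`, which is not part of kmed. It assumes that numeric columns are numerical variables, logical columns and two-level factors are binary variables, and any other factor is a categorical variable; character columns are left unclassified.

```r
# Hypothetical helper (not part of kmed): derive idnum, idbin and idcat
# from column classes. Assumption: numeric = numerical, logical or
# two-level factor = binary, any other factor = categorical.
mixed_ids <- function(data) {
  data <- as.data.frame(data)
  # Binary: logical columns or factors with exactly two levels
  is_bin <- vapply(data, function(x) {
    is.logical(x) || (is.factor(x) && nlevels(x) == 2L)
  }, logical(1))
  # Categorical: any remaining factor column
  is_cat <- vapply(data, is.factor, logical(1)) & !is_bin
  # Numerical: double or integer columns (factors are not numeric)
  is_num <- vapply(data, is.numeric, logical(1))
  ids <- lapply(list(idnum = is_num, idbin = is_bin, idcat = is_cat),
                function(flag) unname(which(flag)))
  # A mixed-variable distance needs at least two different classes
  if (sum(lengths(ids) > 0) < 2) {
    stop("at least two different classes of variables are required")
  }
  # Use NULL (the distmix() default) for classes that are absent
  ids[lengths(ids) == 0] <- list(NULL)
  ids
}
```

The result can then be passed on, e.g. `ids <- mixed_ids(df)` followed by `distmix(df, method = "gower", idnum = ids$idnum, idbin = ids$idbin, idcat = ids$idcat)`. Note that in the Examples below the binary and categorical columns are stored as integers, so under these assumptions they would first need converting with `factor()`.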

Details

This function computes the distance between every pair of objects in mixed-variable data and returns them as a distance matrix. The available distances are Gower ("gower"), Wishart ("wishart"), Podani ("podani"), Huang ("huang"), and Harikumar-PV ("harikumar"). Because the distance is defined for mixed variables, indices for at least two different classes of variables must be supplied through idnum, idbin, or idcat, for example numerical and binary indices, or binary and categorical indices.
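To illustrate what a mixed-variable distance computes, the following base-R sketch builds a Gower-type dissimilarity. The function `gower_sketch()` is hypothetical and is not kmed's implementation. It assumes that each numerical variable contributes the absolute difference divided by that variable's range, each binary or categorical variable contributes 0 for a match and 1 for a mismatch (so binary variables are treated as symmetric), and the distance is the average contribution over all variables. kmed's "gower" option may weight or score binary variables differently.

```r
# Hypothetical sketch of a Gower-type distance (not kmed's code).
# Numerical: |x_i - x_j| / range(x); binary and categorical: 0 if equal, 1 if not.
gower_sketch <- function(data, idnum = NULL, idbin = NULL, idcat = NULL) {
  data <- as.data.frame(data)
  n <- nrow(data)
  total <- matrix(0, n, n)
  # Numerical variables: range-normalised absolute differences
  for (j in idnum) {
    x <- data[[j]]
    r <- diff(range(x))
    if (r > 0) total <- total + abs(outer(x, x, "-")) / r
  }
  # Binary and categorical variables: simple mismatch (1 if different)
  for (j in c(idbin, idcat)) {
    x <- as.character(data[[j]])
    total <- total + outer(x, x, "!=") * 1
  }
  # Average the per-variable contributions
  p <- length(idnum) + length(idbin) + length(idcat)
  d <- total / p
  dimnames(d) <- list(rownames(data), rownames(data))
  d
}
```

For instance, with one numerical variable taking values 1, 3 and 5 (range 4), objects 1 and 2 differ by 2/4 = 0.5 on it; if they also mismatch on one binary and one categorical variable, their distance is (0.5 + 1 + 1) / 3. Applying the sketch to the `mixdata` object from the Examples and comparing it with the `distmix()` output is one way to check whether these assumptions match kmed's version.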

References

Gower, J. C., 1971. A general coefficient of similarity and some of its properties. Biometrics 27, 857-871.

Harikumar, S., PV, S., 2015. K-medoid clustering for heterogeneous data sets. Procedia Computer Science 70, 226-237.

Huang, Z., 1997. Clustering large data sets with mixed numeric and categorical values, in: The First Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 21-34.

Podani, J., 1999. Extending Gower's general coefficient of similarity to ordinal characters. Taxon 48, 331-340.

Wishart, D., 2003. K-means clustering with outlier detection, mixed variables and missing values, in: Exploratory Data Analysis in Empirical Research: Proceedings of the 25th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Munich, March 14-16, 2001. Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 216-226.

Examples

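library(kmed)
set.seed(1)
# Simulated binary (values 1, 2) and categorical (values 1, 2, 3) variables
a <- matrix(sample(1:2, 7 * 3, replace = TRUE), 7, 3)
a1 <- matrix(sample(1:3, 7 * 3, replace = TRUE), 7, 3)
# Three numerical iris columns followed by the binary and categorical blocks
mixdata <- cbind(iris[1:7, 1:3], a, a1)
colnames(mixdata) <- c(paste0("num", 1:3),
                       paste0("bin", 1:3),
                       paste0("cat", 1:3))
# Gower distance, passing the column indices of each variable class
distmix(mixdata, method = "gower", idnum = 1:3, idbin = 4:6, idcat = 7:9)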
