mrfDepth (version 1.0.12)

distSpace: distSpace

Description

Calculation of distance space representation.

Usage

distSpace(trainingData, testData = NULL, type = "bagdistance", options = NULL)

Arguments

trainingData

A list of \(n\) by \(p\) matrices containing the multivariate data, or a list of \(t\) by \(n\) by \(p\) arrays containing the functional data.

testData

An \(m\) by \(p\) matrix containing all multivariate test data, or a \(t\) by \(m\) by \(p\) array for functional data.

type

The distance used in the computations. For multivariate data, one of "bagdistance", "outlyingness" or "adjOutl". For functional data, one of "fBD", "fSDO" or "fAO". Defaults to "bagdistance".

options

A list of options to pass to the function calculating the underlying distance. See "bagdistance", "outlyingness", "adjOutl" or fOutl for more information.

Value

A \(q\) by \((p+1)\) matrix composed of two blocks. In the first block, each row corresponds to an observation in the training set and each column holds its distance to one of the training sets. The last column contains a label indicating the original group membership of the observation. The second block contains the observations in the test set, if any, again with each column holding the distance to one of the training sets. For these rows the last column contains an indicator signalling that the observation was part of the test set.

Details

The distance space is a tool in supervised classification and was introduced in Hubert et al. (2017) as a generalisation of the depth-depth representation of a multivariate sample. Based on a distance transform, an observation (be it multivariate or functional) is mapped to its representation in distance space. The distance transformation maps the observation to a vector whose coordinate \(i\) is the distance to training group \(i\). After transformation, any multivariate classifier may be used to classify new observations in distance space. Typically the \(k\)-nearest neighbour algorithm is used.
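The transformation and the subsequent classification step can be sketched in base R. This is only an illustration under stated assumptions: the Mahalanobis distance is swapped in as a stand-in for the bagdistance and outlyingness measures that mrfDepth computes, and the helper names `dist_to_group`, `dist_transform` and `knn_predict` are hypothetical and not part of the package.

```r
# Stand-in distance (assumption): Mahalanobis distance of each row of x to a
# training group. mrfDepth would use bagdistance or an outlyingness instead.
dist_to_group <- function(x, group) {
  sqrt(mahalanobis(x, center = colMeans(group), cov = cov(group)))
}

# Map each observation to the vector of its distances to every training group.
dist_transform <- function(x, groups) {
  sapply(groups, function(g) dist_to_group(x, g))
}

# Two simulated training groups of 50 bivariate observations each.
set.seed(1)
groups <- list(
  set1 = matrix(rnorm(100, mean = 0), ncol = 2),
  set2 = matrix(rnorm(100, mean = 4), ncol = 2)
)

# Training observations in distance space, with their group labels.
train_x   <- do.call(rbind, groups)
train_ds  <- dist_transform(train_x, groups)
train_lab <- rep(seq_along(groups), times = sapply(groups, nrow))

# Two new observations, one near each group centre.
new_x  <- rbind(c(0, 0), c(4, 4))
new_ds <- dist_transform(new_x, groups)

# Majority vote among the k nearest neighbours in distance space.
knn_predict <- function(train, labels, test, k = 5) {
  apply(test, 1, function(z) {
    d <- sqrt(colSums((t(train) - z)^2))
    votes <- labels[order(d)[seq_len(k)]]
    as.integer(names(which.max(table(votes))))
  })
}

pred <- knn_predict(train_ds, train_lab, new_ds)
```

Each row of `train_ds` and `new_ds` has one coordinate per training group, so the classifier operates in a space whose dimension equals the number of groups rather than the dimension of the original data.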

Different options are available to calculate the distance to each of the training groups. For multivariate data, the user may choose between the bagdistance or any of the projection-type distances, including the Stahel-Donoho outlyingness, the adjusted outlyingness or the directional outlyingness. For functional data, the user may opt to employ the functional bagdistance (fBD), the functional Stahel-Donoho outlyingness (fSDO), the functional adjusted outlyingness (fAO) or the functional directional outlyingness (fDO). Options available in each of the underlying distance routines may be passed down using the options argument.
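The general pattern of selecting a distance routine by name and handing it a list of options can be sketched in base R as follows. This is not the package's internal code: `group_distance`, `dist_mahalanobis` and `dist_euclidean`, together with their `robust_center` and `scale` options, are hypothetical names invented for the illustration, and the two distances are simple stand-ins.

```r
# Hypothetical stand-in routines; each accepts its own extra option.
dist_mahalanobis <- function(x, group, robust_center = FALSE) {
  center <- if (robust_center) apply(group, 2, median) else colMeans(group)
  sqrt(mahalanobis(x, center = center, cov = cov(group)))
}
dist_euclidean <- function(x, group, scale = TRUE) {
  center <- colMeans(group)
  s <- if (scale) apply(group, 2, sd) else rep(1, ncol(group))
  sqrt(rowSums(sweep(sweep(x, 2, center), 2, s, "/")^2))
}

# Pick a routine by name and forward the user-supplied options list to it.
group_distance <- function(x, group, type = "mahalanobis", options = NULL) {
  routine <- switch(type,
    mahalanobis = dist_mahalanobis,
    euclidean   = dist_euclidean,
    stop("unknown type: ", type)
  )
  do.call(routine, c(list(x = x, group = group), options))
}

set.seed(2)
group <- matrix(rnorm(60), ncol = 3)
x <- group[1:5, , drop = FALSE]

# Default routine with default options.
d_default <- group_distance(x, group)
# Different routine, with one option overridden through the list.
d_custom <- group_distance(x, group, type = "euclidean",
                           options = list(scale = FALSE))
```

Because the options travel as a plain list, each routine only needs to understand its own arguments, which mirrors how the `options` argument described above is handed to the chosen distance function.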

References

Hubert M., Rousseeuw P.J., Segaert P. (2017). Multivariate and functional classification using depth and distance. Advances in Data Analysis and Classification, 11(3), 445-466.

Examples

# NOT RUN {
# We will use two multivariate toy data sets
data(cardata90)
data(bloodfat)

# Build the training data
trainingData <- list(set1 = cardata90,
                     set2 = bloodfat)
# Transform the data into distspace
Result <- distSpace(trainingData = trainingData)
# Plot the results
plotColors <- c(rep("orange", nrow(cardata90)),
                rep("blue", nrow(bloodfat)))
plot(Result[, 1:2],
     col = plotColors,
     xlab = "distance to cardata90", ylab = "distance to bloodfat",
     main = "distspace representation of cardata90 and the bloodfat data.")


# By default the bagdistance is used to transform the data.
# This can be changed by using the type argument. Additional options for
# the underlying function calculating the distance may be passed in
# the options argument.
options <- list(type = "Affine", ndir = 1000, seed = 3)
Result <- distSpace(trainingData = trainingData,
                    type = "adjOutl",
                    options = options)
# Plot the results
plotColors <- c(rep("orange", nrow(cardata90)),
                rep("blue", nrow(bloodfat)))
plot(Result[, 1:2],
     col = plotColors,
     xlab = "distance to cardata90", ylab = "distance to bloodfat",
     main = "distspace representation of cardata90 and the bloodfat data.")

data(octane)
data(glass)
trainingData <- list(set1 = glass[1:100, , , drop = FALSE],
                     set2 = octane[1:100, , , drop = FALSE])
# Transform the data into distspace
Result <- distSpace(trainingData = trainingData, type = "fAO")

# }
