PPCI (version 0.1.5)

mddr: Minimum Density Dimension Reduction

Description

Uses projection pursuit to find a linear projection of a data set. The columns of the projection matrix are the vector(s) orthogonal to minimum density hyperplanes.
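To illustrate what "minimum density hyperplane" means, here is a minimal base R sketch. It is not the PPCI implementation. It assumes a Gaussian kernel with a fixed bandwidth h, in which case the density integrated over the hyperplane {x : v'x = b} can be evaluated as a one-dimensional kernel density estimate of the projected data at b. The function name hyperplane_density and the toy data are hypothetical.

```r
## Density on the hyperplane {x : v'x = b}, evaluated as a 1-D Gaussian
## kernel density estimate of the projected data at the point b.
## (Illustrative helper only; not part of PPCI.)
hyperplane_density <- function(X, v, b, h) {
  v <- v / sqrt(sum(v^2))          # normalise the projection vector
  proj <- as.numeric(X %*% v)      # project the data onto v
  mean(dnorm((proj - b) / h)) / h  # KDE of the projections at b
}

## Two well separated clusters along the first coordinate
set.seed(1)
X <- rbind(
  cbind(rnorm(100, -3), rnorm(100)),
  cbind(rnorm(100,  3), rnorm(100))
)
h <- 0.5

## Hyperplane through the origin orthogonal to the first axis (between clusters)
d_sep <- hyperplane_density(X, c(1, 0), 0, h)

## Hyperplane through the origin orthogonal to the second axis (cuts both clusters)
d_along <- hyperplane_density(X, c(0, 1), 0, h)

d_sep < d_along
```

The hyperplane that passes between the clusters has a much lower density than the one that cuts through them, which is the property projection pursuit looks for when choosing v.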

Usage

mddr(X, p, minsize, v0, bandwidth,
      alphamin, alphamax, verb, labels, maxit, ftol)

Arguments

X

a numeric matrix (num_data x num_dimensions); the dataset.

p

an integer; the number of dimensions in the projection.

v0

(optional) initial projection direction(s). a function(X) of the data, which returns a matrix with ncol(X) rows. each column of the output of v0(X) is used as an initialisation for projection pursuit. the best of the resulting solutions under the minimum density objective is used within the final model. if omitted then a single initialisation is used for each column of the projection matrix: the first principal component within the null space of the other columns.
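As an illustration of the shape v0 is expected to have, the sketch below defines a function that returns the first two principal directions as candidate initialisations. The name v0_pca and the choice of two directions are assumptions made for this example, not package defaults.

```r
## Hypothetical initialiser: the first two principal directions of X.
## The result has ncol(X) rows and one column per initialisation.
v0_pca <- function(X) {
  eigen(cov(X))$vectors[, 1:2, drop = FALSE]
}

set.seed(1)
X <- matrix(rnorm(200 * 5), ncol = 5)
V <- v0_pca(X)
dim(V)   # 5 rows (= ncol(X)), 2 columns (two initialisations)

## It could then be supplied as: mddr(X, 1, v0 = v0_pca)
```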

bandwidth

(optional) used to compute the bandwidth parameter (h) for the kernel density estimator. a numeric valued function(X) of the data. if omitted then bandwidth(X) = 0.9 * eigen(cov(X))$values[1]^.5 * nrow(X)^(-0.2).
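The default rule can be written out as an ordinary R function, which makes it easy to build variations on it. In the sketch below, default_bandwidth restates the formula above, and wider_bandwidth is a hypothetical variant with an arbitrarily chosen factor of 1.5; neither name is exported by PPCI.

```r
## The documented default, written as a function of the data matrix
default_bandwidth <- function(X) {
  0.9 * eigen(cov(X))$values[1]^0.5 * nrow(X)^(-0.2)
}

## Hypothetical variant: a smoother density estimate (factor chosen arbitrarily)
wider_bandwidth <- function(X) 1.5 * default_bandwidth(X)

set.seed(1)
X <- matrix(rnorm(500 * 4), ncol = 4)
h <- default_bandwidth(X)
h_wide <- wider_bandwidth(X)

## It could then be supplied as: mddr(X, 2, bandwidth = wider_bandwidth)
```

A larger bandwidth gives a smoother density estimate, which may reduce the number of local minima encountered during optimisation at the cost of blurring narrow low density gaps.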

alphamin

(optional) initial (scaled) bound on the distance of the optimal hyperplane from the mean of the data. if omitted then alphamin = 0.

alphamax

(optional) maximum (scaled) distance of the optimal hyperplane from the mean of the data. if omitted then alphamax = 1.

verb

(optional) verbosity level of optimisation procedure. verb = 0 produces no output. verb = 1 produces plots of the projected data illustrating the progress of projection pursuit. verb = 2 adds to these plots additional information about the progress. verb = 3 creates a folder in the working directory and stores all plots for verb = 2. if omitted then verb = 0.

labels

(optional) vector of class labels. not used in the actual projection pursuit. only used for illustrative purposes for values of verb>0.

maxit

(optional) maximum number of iterations in optimisation for each value of alpha. if omitted then maxit=15.

ftol

(optional) tolerance level for convergence of optimisation, based on relative function value improvements. if omitted then ftol = 1e-5.

minsize

(optional) the minimum number of data points on each side of a hyperplane. if omitted then minsize = 1.

Value

a named list with class ppci_projection_solution with the following components:

$projection

the num_dimensions x p projection matrix.

$fitted

the num_data x p projected data set.

$data

the input data matrix.

$method

the character string "MDH".

$params

list of parameters used to find $projection.

References

Pavlidis N.G., Hofmeyr D.P., Tasoulis S.K. (2016) Minimum Density Hyperplanes. Journal of Machine Learning Research, 17(156), 1--33.

Examples

# NOT RUN {
### not run
run = FALSE
if(run){
  ## load optidigits dataset
  data(optidigits)

  ## find nine dimensional projection (one fewer than
  ## the number of clusters, as is common in clustering)
  sol <- mddr(optidigits$x, 9)

  ## visualise the solution via the first 3 pairs of dimensions
  plot(sol, pairs = 3, labels = optidigits$c)

  ## compare with PCA projection
  pairs(optidigits$x%*%eigen(cov(optidigits$x))$vectors[,1:3], col = optidigits$c)
  }
# }
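For a quicker check that does not need the optidigits data, the following sketch uses a small synthetic data set. The data, the choice p = 2 and the requireNamespace guard are assumptions made for this illustration; the call to mddr is skipped if PPCI is not installed.

```r
## Two synthetic Gaussian clusters in four dimensions (toy data)
set.seed(1)
X <- rbind(
  matrix(rnorm(150 * 4, mean = -2), ncol = 4),
  matrix(rnorm(150 * 4, mean =  2), ncol = 4)
)
labels <- rep(1:2, each = 150)

if (requireNamespace("PPCI", quietly = TRUE)) {
  ## two dimensional minimum density projection, all other arguments at defaults
  sol <- PPCI::mddr(X, 2)

  ## per the Value section: $projection is ncol(X) x p and $fitted is nrow(X) x p
  print(dim(sol$projection))
  print(dim(sol$fitted))

  ## base R scatter plot of the projected data, coloured by the known labels
  plot(sol$fitted, col = labels)
}
```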
