do.sne: Stochastic Neighbor Embedding

Description

Stochastic Neighbor Embedding (SNE) is a probabilistic approach that mimics, in a low-dimensional target space, the distributional description of data lying in a high-dimensional and possibly nonlinear subspace. do.sne follows the algorithmic details of the original paper by Hinton and Roweis (2002).
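To make the description concrete, here is a minimal base-R sketch of the quantities SNE compares, following the formulation in Hinton and Roweis (2002). Each input point \(x_i\) defines conditional probabilities \(p_{j|i} \propto \exp(-\beta_i \|x_i - x_j\|^2)\) over its neighbors, the embedding defines \(q_{j|i} \propto \exp(-\|y_i - y_j\|^2)\) (precision fixed at 1), and SNE minimizes \(\sum_i \mathrm{KL}(P_i \,\|\, Q_i)\). The helper names sne_cond_prob and sne_cost are hypothetical and are not functions of Rdimtools; the row-wise shift by the minimum distance is only a numerical-stability device.

```r
# Hypothetical helpers (not part of Rdimtools).
# p_{j|i} = exp(-beta_i d_ij^2) / sum_{k != i} exp(-beta_i d_ik^2)
sne_cond_prob <- function(Z, beta = rep(1, nrow(Z))) {
  D2 <- as.matrix(dist(Z))^2          # squared Euclidean distances
  diag(D2) <- Inf                     # exclude j = i, since exp(-Inf) = 0
  D2 <- D2 - apply(D2, 1, min)        # row-wise shift; cancels after normalizing
  W <- exp(-beta * D2)                # row i uses precision beta[i]
  W / rowSums(W)                      # each row sums to 1
}

# SNE cost: sum_i KL(P_i || Q_i), skipping zero entries of P (the diagonal)
sne_cost <- function(P, Q) {
  idx <- P > 0
  sum(P[idx] * log(P[idx] / Q[idx]))
}

set.seed(1)
X <- matrix(rnorm(20 * 5), nrow = 20)   # toy data: 20 points in 5 dimensions
Y <- matrix(rnorm(20 * 2), nrow = 20)   # a random 2-d embedding
P <- sne_cond_prob(X, beta = rep(0.1, 20))
Q <- sne_cond_prob(Y)                   # target space: precision fixed at 1
sne_cost(P, Q)                          # nonnegative; zero only when Q matches P
```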

Usage

do.sne(
  X,
  ndim = 2,
  perplexity = 30,
  eta = 0.05,
  maxiter = 2000,
  jitter = 0.3,
  jitterdecay = 0.99,
  momentum = 0.5,
  preprocess = c("null", "center", "scale", "cscale", "decorrelate", "whiten"),
  pca = TRUE,
  pcaratio = 0.9,
  pcascale = FALSE,
  symmetric = FALSE
)

Arguments

X

an \((n\times p)\) matrix or data frame whose rows are observations and whose columns represent independent variables.

ndim

an integer-valued target dimension.

perplexity

desired level of perplexity, typically in the range [5,50].

eta

learning rate (step size) of the gradient descent.

maxiter

maximum number of iterations.

jitter

level of white noise added to the embedding at the beginning of the optimization.

jitterdecay

decay parameter in \((0,1)\). The closer it is to 0, the faster the artificial noise decays.

momentum

momentum coefficient used to accelerate learning.

preprocess

an additional option for preprocessing the data. Default is "null". See also aux.preprocess for more details.

pca

whether to use PCA as a preliminary step; TRUE to use it, FALSE otherwise.

pcaratio

proportion of variance to be explained by the PCA preconditioning step. See also do.pca for more details.

pcascale

a logical; FALSE to use the covariance matrix, TRUE to use the correlation matrix. See also do.pca for more details.

symmetric

a logical; FALSE to solve the original (asymmetric) problem, TRUE to adopt a symmetrization scheme.
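The arguments eta, momentum, jitter and jitterdecay control a gradient descent on the SNE cost. Below is a minimal base-R sketch, assuming a plain momentum update and Gaussian noise whose standard deviation shrinks geometrically as jitter * jitterdecay^t; this is a reading of the arguments, not a transcription of the Rdimtools source. The gradient is the one given by Hinton and Roweis (2002), \(\partial C/\partial y_i = 2\sum_j (p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j})(y_i - y_j)\). The function names cond_q, sne_gradient and sne_descent are hypothetical, and the toy run uses far fewer iterations than the default maxiter.

```r
# Hypothetical sketch (not the Rdimtools source) of SNE gradient descent
# with momentum and decaying Gaussian jitter.

# q_{j|i} in the target space, precision fixed at 1
cond_q <- function(Y) {
  D2 <- as.matrix(dist(Y))^2
  diag(D2) <- Inf                       # exclude j = i
  W <- exp(-(D2 - apply(D2, 1, min)))   # row-wise shift for stability
  W / rowSums(W)
}

# Row i: 2 * sum_j (p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j}) (y_i - y_j)
sne_gradient <- function(Y, P) {
  M <- P - cond_q(Y)
  S <- M + t(M)
  2 * (rowSums(S) * Y - S %*% Y)
}

sne_descent <- function(P, ndim = 2, eta = 0.05, maxiter = 200,
                        jitter = 0.3, jitterdecay = 0.99, momentum = 0.5) {
  n <- nrow(P)
  Y <- matrix(rnorm(n * ndim, sd = 1e-2), n, ndim)  # small random start
  V <- matrix(0, n, ndim)                           # velocity (momentum term)
  for (iter in seq_len(maxiter)) {
    V <- momentum * V - eta * sne_gradient(Y, P)
    Y <- Y + V + matrix(rnorm(n * ndim, sd = jitter), n, ndim)
    jitter <- jitter * jitterdecay                  # noise shrinks every step
  }
  Y
}

# Usage with toy conditional probabilities built from a fixed precision of 0.5
set.seed(3)
X <- matrix(rnorm(30 * 4), nrow = 30)
W <- exp(-0.5 * as.matrix(dist(X))^2); diag(W) <- 0
P <- W / rowSums(W)
Y <- sne_descent(P, maxiter = 100)
dim(Y)   # 30 2
```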
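With pca = TRUE, the data are first projected onto leading principal components before SNE is run. The sketch below shows how pcaratio and pcascale are usually interpreted, using stats::prcomp rather than do.pca: keep the smallest number of components whose cumulative share of variance reaches pcaratio, and compute PCA from the correlation matrix when pcascale = TRUE (prcomp's scale. = TRUE). The helper pca_precondition is hypothetical.

```r
# Hypothetical helper (not Rdimtools' do.pca): PCA preconditioning that keeps
# the fewest leading components reaching a cumulative variance share of pcaratio.
pca_precondition <- function(X, pcaratio = 0.9, pcascale = FALSE) {
  pc <- prcomp(X, center = TRUE, scale. = pcascale)  # scale. = TRUE: correlation-based PCA
  share <- cumsum(pc$sdev^2) / sum(pc$sdev^2)         # cumulative proportion of variance
  k <- which(share >= pcaratio - 1e-12)[1]            # small slack for rounding
  pc$x[, seq_len(k), drop = FALSE]                    # scores on the first k components
}

set.seed(5)
X <- cbind(rnorm(200, sd = 10), rnorm(200, sd = 1), rnorm(200, sd = 0.1))
Z <- pca_precondition(X, pcaratio = 0.9)
ncol(Z)   # 1: the first component carries roughly 99% of the variance
```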
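A common symmetrization scheme, the one used by symmetric SNE and t-SNE, replaces the conditional probabilities with joint probabilities \(p_{ij} = (p_{j|i} + p_{i|j})/(2n)\), normalizes \(q_{ij}\) once over all pairs, and yields the simpler gradient \(4\sum_j (p_{ij} - q_{ij})(y_i - y_j)\). Whether symmetric = TRUE follows exactly this scheme is an assumption; the sketch is illustrative and its function names (symmetrize_p, joint_q, sym_sne_gradient) are hypothetical.

```r
# Hypothetical sketch of symmetric SNE (not the Rdimtools source).
# Joint input probabilities p_ij = (p_{j|i} + p_{i|j}) / (2n)
symmetrize_p <- function(Pcond) {
  n <- nrow(Pcond)
  (Pcond + t(Pcond)) / (2 * n)
}

# Joint target probabilities q_ij, normalized over all pairs i != j
joint_q <- function(Y) {
  W <- exp(-as.matrix(dist(Y))^2)
  diag(W) <- 0
  W / sum(W)
}

# Row i: 4 * sum_j (p_ij - q_ij) (y_i - y_j)
sym_sne_gradient <- function(Y, Pjoint) {
  S <- Pjoint - joint_q(Y)
  4 * (rowSums(S) * Y - S %*% Y)
}

set.seed(4)
Pcond <- matrix(runif(25), 5, 5); diag(Pcond) <- 0
Pcond <- Pcond / rowSums(Pcond)   # toy conditional probabilities
Pj <- symmetrize_p(Pcond)
isSymmetric(Pj)                   # TRUE
sum(Pj)                           # 1 (up to rounding)
g <- sym_sne_gradient(matrix(rnorm(10), 5, 2), Pj)
```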

Value

a named list containing

Y: an \((n\times ndim)\) matrix whose rows are embedded observations.
trfinfo: a list containing information for out-of-sample prediction.
vars: a vector containing betas used in perplexity matching.
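Assuming the betas reported in vars are per-point precisions \(\beta_i = 1/(2\sigma_i^2)\), each one is chosen so that the row distribution \(P_i\) has the requested perplexity \(\exp(H(P_i))\), with \(H\) in nats (equivalently \(2^{H}\) with \(H\) in bits). A common way to find them is a bisection search on \(\beta_i\), sketched below in base R. The function match_beta, its tolerance and its iteration cap are illustrative assumptions, not the Rdimtools implementation.

```r
# Hypothetical helper (not the Rdimtools implementation): bisection on the
# precision beta for one point, given squared distances d2 to the other points.
match_beta <- function(d2, perplexity, tol = 1e-5, maxit = 100) {
  target <- log(perplexity)              # target entropy in nats
  lo <- 0; hi <- Inf; beta <- 1
  for (it in seq_len(maxit)) {
    w <- exp(-beta * (d2 - min(d2)))     # shift by min(d2) to avoid underflow
    p <- w / sum(w)
    H <- -sum(p[p > 0] * log(p[p > 0]))  # Shannon entropy in nats
    if (abs(H - target) < tol || it == maxit) break
    if (H > target) {                    # too flat: raise the precision
      lo <- beta
      beta <- if (is.infinite(hi)) 2 * beta else (beta + hi) / 2
    } else {                             # too peaked: lower the precision
      hi <- beta
      beta <- (lo + beta) / 2
    }
  }
  list(beta = beta, p = p, perplexity = exp(H))
}

set.seed(2)
X  <- matrix(rnorm(100 * 3), nrow = 100)
D2 <- as.matrix(dist(X))^2
res <- match_beta(D2[1, -1], perplexity = 30)   # point 1 vs the other 99
res$perplexity                                  # close to 30
```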

References

Hinton GE, Roweis ST (2003). "Stochastic Neighbor Embedding." In Advances in Neural Information Processing Systems 15. MIT Press.

Examples


library(Rdimtools)

## generate ribbon-shaped data
## n is kept small so that the example passes the CRAN pretest.
X = aux.gensamples(dname="ribbon",n=99)

## 1. PCA preconditioning with 90% of variance explained (default pcaratio)
output1 <- do.sne(X,ndim=2,pca=TRUE)

## 2. PCA preconditioning with 50% of variance explained
output2 <- do.sne(X,ndim=2,pca=TRUE,pcaratio=0.50)

## 3. Setting 2 + smaller level of perplexity
output3 <- do.sne(X,ndim=2,pca=TRUE,pcaratio=0.50,perplexity=10)

## Visualize three different projections
## (plot only when a result list was returned)
opar <- par(no.readonly=TRUE)
par(mfrow=c(1,3))
if (is.list(output1)){plot(output1$Y[,1],output1$Y[,2],main="Setting 1")}
if (is.list(output2)){plot(output2$Y[,1],output2$Y[,2],main="Setting 2")}
if (is.list(output3)){plot(output3$Y[,1],output3$Y[,2],main="Setting 3")}
par(opar)
