Rdimtools (version 0.3.2)

do.rndproj: Random Projection

Description

do.rndproj is a linear dimensionality reduction method based on random projection, a technique justified by the celebrated Johnson-Lindenstrauss lemma.

Usage

do.rndproj(X, ndim = 2, preprocess = c("null", "center", "scale", "cscale",
  "whiten", "decorrelate"), type = c("gaussian", "achlioptas", "sparse"),
  s = max(sqrt(ncol(X)), 3))

Arguments

X

an \((n\times p)\) matrix or data frame whose rows are observations and columns represent independent variables.

ndim

an integer-valued target dimension.

preprocess

an additional option for preprocessing the data. Default is "null". See also aux.preprocess for more details.

type

the type of random projection; one of "gaussian", "achlioptas", or "sparse".

s

a tuning parameter that determines the values in the projection matrix. The default is \(\max(\sqrt{p},3)\), as shown in Usage; \(s \ge 3\) is required.

Value

a named list containing

Y

an \((n\times ndim)\) matrix whose rows are embedded observations.

projection

a \((p\times ndim)\) matrix whose columns are basis vectors for projection.

epsilon

an estimated error \(\epsilon\) in accordance with the JL lemma.

trfinfo

a list containing information for out-of-sample prediction.

Details

The Johnson-Lindenstrauss (JL) lemma states that given \(0 < \epsilon < 1\), for a set \(X\) of \(m\) points in \(R^N\) and a number \(n > 8\log(m)/\epsilon^2\), there is a linear map \(f: R^N \to R^n\) such that $$(1-\epsilon)|u-v|^2 \le |f(u)-f(v)|^2 \le (1+\epsilon)|u-v|^2$$ for all \(u,v\) in \(X\).
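
To get a feel for the bound, the following base R sketch evaluates it numerically. The helper names jl_min_dim and jl_eps are hypothetical and are not part of Rdimtools. The sketch assumes the logarithm is natural and uses the constant 8 exactly as stated above; it is not necessarily how the returned epsilon is computed.

```r
# Smallest integer n with n > 8 * log(m) / eps^2 (strict inequality),
# assuming the natural logarithm. Hypothetical helper, not an Rdimtools function.
jl_min_dim <- function(m, eps) {
  stopifnot(m > 1, eps > 0, eps < 1)
  floor(8 * log(m) / eps^2) + 1
}

# Inverse view: the distortion level at which n meets the bound with equality.
# Values of 1 or more mean the lemma gives no guarantee for that (m, n).
jl_eps <- function(m, n) {
  stopifnot(m > 1, n >= 1)
  sqrt(8 * log(m) / n)
}

min_dim_2000 <- jl_min_dim(m = 2000, eps = 0.5)   # 244
eps_2000_2d  <- jl_eps(m = 2000, n = 2)           # greater than 1
print(min_dim_2000)
print(eps_2000_2d)
```

Under these assumptions, 2,000 points need a few hundred target dimensions for \(\epsilon = 0.5\), so a projection to ndim = 2 falls outside the range where the lemma guarantees distance preservation.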

Three types of random projections are supported for a \((p\times ndim)\) projection matrix \(R\).

  1. The conventional approach is to use normalized Gaussian random vectors, that is, vectors sampled from the unit sphere \(S^{p-1}\).

  2. Achlioptas suggested a sparse approach that samples entries from \(\sqrt{3}(1,0,-1)\) with probability \((1/6,4/6,1/6)\).

  3. Li et al. proposed sampling entries from \(\sqrt{s}(1,0,-1)\) with probability \((1/2s,1-1/s,1/2s)\) for \(s\ge 3\), which increases sparsity and gains speed with little loss in accuracy. While the authors' original suggestion is to use \(\sqrt{p}\) or \(p/\log(p)\) for \(s\), any user-supplied \(s \ge 3\) is allowed.
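
The three constructions can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: sparse_projection is a hypothetical helper, the sizes p and ndim are arbitrary, and any additional rescaling that do.rndproj may apply to the matrix is not reproduced here.

```r
set.seed(1)   # for a reproducible illustration only
p    <- 50    # original dimension (arbitrary choice)
ndim <- 5     # target dimension (arbitrary choice)

# 1. Gaussian: draw N(0, 1) entries, then scale each column to unit length
#    so that every column lies on the unit sphere S^(p-1).
R_gauss <- matrix(rnorm(p * ndim), nrow = p, ncol = ndim)
R_gauss <- sweep(R_gauss, 2, sqrt(colSums(R_gauss^2)), "/")

# 2. and 3. Sparse sign matrix with entries sqrt(s) * (1, 0, -1) drawn with
#    probabilities (1/(2s), 1 - 1/s, 1/(2s)). Hypothetical helper.
sparse_projection <- function(p, ndim, s) {
  stopifnot(s >= 3)
  vals <- sample(c(1, 0, -1), size = p * ndim, replace = TRUE,
                 prob = c(1 / (2 * s), 1 - 1 / s, 1 / (2 * s)))
  matrix(sqrt(s) * vals, nrow = p, ncol = ndim)
}

R_achlioptas <- sparse_projection(p, ndim, s = 3)        # s = 3 gives (1/6, 4/6, 1/6)
R_sparse     <- sparse_projection(p, ndim, s = sqrt(p))  # Li et al. suggestion sqrt(p)

# Fraction of zero entries; the expected value is 1 - 1/s.
print(mean(R_achlioptas == 0))
print(mean(R_sparse == 0))
```

Setting \(s = 3\) in the third scheme reproduces the Achlioptas probabilities, so the second type is a special case of the third. Larger \(s\) yields more zeros and therefore cheaper matrix products.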

References

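Johnson, W. B. and Lindenstrauss, J. (1984). Extensions of Lipschitz mappings into a Hilbert space. In Beals, R. et al. (eds.), Conference in Modern Analysis and Probability, Contemporary Mathematics, vol. 26. American Mathematical Society.

Achlioptas, D. (2003). Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4).

Li, P., Hastie, T. J., and Church, K. W. (2006). Very sparse random projections. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.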

Examples

# NOT RUN {
## load the package and generate Swiss Roll data of 2,000 data points.
library(Rdimtools)
X <- aux.gensamples(n=2000)

## 1. Gaussian projection
output1 <- do.rndproj(X,ndim=2)

## 2. Achlioptas projection
output2 <- do.rndproj(X,ndim=2,type="achlioptas")

## 3. Sparse projection (s must satisfy s >= 3)
output3 <- do.rndproj(X,ndim=2,type="sparse",s=5)

## Visualize three different projections
par(mfrow=c(1,3))
plot(output1$Y[,1],output1$Y[,2],main="Gaussian")
plot(output2$Y[,1],output2$Y[,2],main="Achlioptas")
plot(output3$Y[,1],output3$Y[,2],main="Sparse")
# }