
feature (version 1.2.7)

featureSignif: Feature significance for kernel density estimation

Description

Identify significant features of kernel density estimates of 1- to 4-dimensional data.

Usage

featureSignif(x, bw, gridsize, scaleData=FALSE, addSignifGrad=TRUE,
   addSignifCurv=TRUE, signifLevel=0.05)

Arguments

x
data matrix
bw
vector of bandwidth(s)
gridsize
vector of estimation grid sizes
scaleData
flag for scaling the data, i.e. transforming each dimension to unit variance
addSignifGrad
flag for computing significant gradient regions
addSignifCurv
flag for computing significant curvature regions
signifLevel
significance level

Value

Returns an object of class fs, which is a list with the following fields:

x
data matrix
names
name labels used for plotting
bw
vector of bandwidths
fhat
kernel density estimate on a grid
grad
logical grid for significant gradient regions
curv
logical grid for significant curvature regions
gradData
logical vector indicating significant gradient data points
gradDataPoints
significant gradient data points
curvData
logical vector indicating significant curvature data points
curvDataPoints
significant curvature data points
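To inspect a fitted object without relying on the exact field names, use names() or str(); a minimal sketch using the built-in faithful data (the bandwidth here is chosen arbitrarily for illustration):

## Inspect the fields of a fitted fs object
library(feature)
fs <- featureSignif(faithful$eruptions, bw=0.2)
names(fs)   ## field names of the returned list
str(fs$bw)  ## e.g. the bandwidth actually used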

Details

Feature significance is based on significance testing of the gradient (first derivative) and curvature (second derivative) of a kernel density estimate. This was developed for 1-d data by Chaudhuri & Marron (1999), for 2-d data by Godtliebsen, Marron & Chaudhuri (2002), and for 3-d and 4-d data by Duong, Cowling, Koch & Wand (2008).

The test statistic for gradient testing at a point $\mathbf{x}$ is $$W(\mathbf{x}) = \Vert \widehat{\nabla f} (\mathbf{x}; \mathbf{H}) \Vert^2$$ where $\widehat{\nabla f} (\mathbf{x};\mathbf{H})$ is the kernel estimate of the gradient of $f(\mathbf{x})$ with bandwidth matrix $\mathbf{H}$, and $\Vert\cdot\Vert$ is the Euclidean norm. $W(\mathbf{x})$ is approximately chi-squared distributed with $d$ degrees of freedom, where $d$ is the dimension of the data.
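For intuition, the 1-d kernel gradient estimate with a Gaussian kernel can be written directly. This is a simplified sketch, not the package's internal code, which also standardises the statistic by its estimated variance:

## Sketch: 1-d kernel density gradient estimate with a Gaussian kernel
## fhat(x) = (1/(n*h)) * sum(dnorm((x - X_i)/h)), differentiated in x
kde.grad <- function(x.grid, x.data, h) {
  n <- length(x.data)
  sapply(x.grid, function(x) {
    u <- (x - x.data)/h
    -sum(u * dnorm(u))/(n * h^2)
  })
}
grad.hat <- kde.grad(seq(1, 6, length=401), faithful$eruptions, h=0.3)
W <- grad.hat^2   ## unstandardised analogue of W(x) above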

The analogous test statistic for curvature is $$W^{(2)}(\mathbf{x}) = \Vert \mathrm{vech} \widehat{\nabla^{(2)}f} (\mathbf{x}; \mathbf{H})\Vert ^2$$ where $\widehat{\nabla^{(2)} f} (\mathbf{x};\mathbf{H})$ is the kernel estimate of the curvature of $f(\mathbf{x})$, and vech is the vector-half operator. $W^{(2)}(\mathbf{x})$ is approximately chi-squared distributed with $d(d+1)/2$ degrees of freedom.
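The degrees of freedom for both tests grow with the dimension d; a quick reference table:

## Degrees of freedom: d for the gradient test, d*(d+1)/2 for curvature
d <- 1:4
cbind(d, grad.df = d, curv.df = d*(d + 1)/2)
##      d grad.df curv.df
## [1,] 1       1       1
## [2,] 2       2       3
## [3,] 3       3       6
## [4,] 4       4      10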

Since this involves many dependent hypothesis tests, we use a multiple comparison or simultaneous test to control the overall level of significance, following a Hochberg-type procedure. See Hochberg (1988) and Duong, Cowling, Koch & Wand (2008).
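Base R provides the Hochberg step-up adjustment through p.adjust(); a minimal sketch with hypothetical pointwise p-values:

## Hochberg-adjusted p-values (hypothetical inputs)
p <- c(0.001, 0.012, 0.03, 0.04, 0.2)
p.adjust(p, method="hochberg") <= 0.05   ## which tests remain significant
## [1]  TRUE  TRUE FALSE FALSE FALSE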

References

Chaudhuri, P. & Marron, J.S. (1999) SiZer for exploration of structures in curves. Journal of the American Statistical Association, 94, 807-823.

Duong, T., Cowling, A., Koch, I. & Wand, M.P. (2008) Feature significance for multivariate kernel density estimation. Computational Statistics and Data Analysis, 52, 4225-4242.

Godtliebsen, F., Marron, J.S. & Chaudhuri, P. (2002) Significance in scale space for bivariate density estimation. Journal of Computational and Graphical Statistics, 11, 1-22.

Hochberg, Y. (1988) A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75, 800-802.

Wand, M.P. & Jones, M.C. (1995) Kernel Smoothing. Chapman & Hall/CRC, London.

See Also

featureSignifGUI, plot.fs

Examples

## Load the package
library(feature)

## Univariate example
data(earthquake)
eq3 <- -log10(-earthquake[,3])   ## third variable transformed to -log10 scale
fs <- featureSignif(eq3, bw=0.1)
plot(fs, addSignifGradRegion=TRUE)

## Bivariate example
library(MASS)
data(geyser)
fs <- featureSignif(geyser)
plot(fs, addSignifCurvRegion=TRUE)

## Trivariate example
data(earthquake)
earthquake[,3] <- -log10(-earthquake[,3])
fs <- featureSignif(earthquake, scaleData=TRUE, bw=c(0.06, 0.06, 0.05))
plot(fs, addKDE=TRUE)
plot(fs, addKDE=FALSE, addSignifCurvRegion=TRUE)
