Kernel feature significance for 1- to 6-dimensional data.

```
kfs(x, H, h, deriv.order=2, gridsize, gridtype, xmin, xmax, supp=3.7,
eval.points, binned=FALSE, bgridsize, positive=FALSE, adj.positive, w,
verbose=FALSE, signif.level=0.05)
```

x

matrix of data values

H,h

bandwidth matrix/scalar bandwidth. If these are missing, `Hpi` or `hpi` is called by default.

deriv.order

derivative order (scalar)

gridsize

vector of number of grid points

gridtype

not yet implemented

xmin,xmax

vector of minimum/maximum values for grid

supp

effective support for standard normal

eval.points

vector or matrix of points at which estimate is evaluated

binned

flag for binned estimation. Default is FALSE.

bgridsize

vector of binning grid sizes

positive

flag if 1-d data are positive. Default is FALSE.

adj.positive

adjustment applied to positive 1-d data

w

vector of weights. Default is a vector of all ones.

verbose

flag to print out progress information. Default is FALSE.

signif.level

overall level of significance for hypothesis tests. Default is 0.05.

A kernel feature significance estimate is an object of class
`kfs`, which is a list with fields:

x

data points - same as input

eval.points

vector or list of points at which the estimate is evaluated

estimate

binary matrix for significant feature at `eval.points`: 0 = not signif., 1 = signif.

h

scalar bandwidth (1-d only)

H

bandwidth matrix

gridtype

"linear"

gridded

flag for estimation on a grid

binned

flag for binned estimation

names

variable names

w

weights

deriv.order

derivative order (scalar)

deriv.ind

matrix where each row is a vector of partial derivative indices

This is the same structure as a `kdde` object, except that `estimate` is a binary matrix rather than real-valued.

Feature significance is based on significance testing of the gradient (first derivative) and curvature (second derivative) of a kernel density estimate. Only the latter is currently implemented, and is also known as significant modal regions.

The hypothesis test at a grid point \(\bold{x}\) is
\(H_0(\bold{x}): \mathsf{H} f(\bold{x}) < 0\),
i.e. the density Hessian matrix \(\mathsf{H} f(\bold{x})\) is negative definite.
The \(p\)-values are computed for each \(\bold{x}\) using that
the test statistic is
approximately chi-squared distributed with \(d(d+1)/2\) d.f.
We then use a Hochberg-type simultaneous testing procedure, based on the
ordered \(p\)-values, to control the
overall level of significance to be `signif.level`. If
\(H_0(\bold{x})\) is rejected, then \(\bold{x}\)
belongs to a significant modal region.
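To make the simultaneous testing step concrete, a Hochberg-type step-up rule on a vector of \(p\)-values can be sketched as below. This is an illustrative stand-alone implementation, not the internal code used by `kfs`; the function name `hochberg_reject` is hypothetical.

```r
## Illustrative sketch of a Hochberg step-up procedure: reject H_0 for
## the j-th smallest p-value whenever some p_(j') with j' >= j satisfies
## p_(j') <= alpha/(n - j' + 1).
hochberg_reject <- function(pval, alpha = 0.05) {
  n <- length(pval)
  ord <- order(pval, decreasing = TRUE)  # indices from largest to smallest p
  reject <- logical(n)
  for (k in seq_len(n)) {
    i <- ord[k]                          # k-th largest p-value
    if (pval[i] <= alpha / k) {          # Hochberg threshold alpha/k
      reject[ord[k:n]] <- TRUE           # reject it and all smaller p-values
      break
    }
  }
  reject
}

hochberg_reject(c(0.001, 0.01, 0.03, 0.5))
```

Grid points whose hypotheses are rejected under this rule would form the significant modal regions.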

The computations are based on `kdde(x, deriv.order=2)`, so
`kfs` inherits its behaviour from `kdde`.
If the bandwidth `H` is missing from `kfs`, then
the default bandwidth is the plug-in selector
`Hpi(,deriv.order=2)`. Likewise for a missing `h`.
The effective support, binning, grid size, grid range and positive
parameters are the same as for `kde`.
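As an illustrative sketch of a typical call (assuming the ks package is installed), the following simulates bimodal 2-d data and computes the feature significance estimate with the default plug-in bandwidth; the seed and data are arbitrary choices, not from this documentation.

```r
## Illustrative use of kfs on simulated 2-d data; the bandwidth H is
## omitted, so it defaults to the plug-in selector Hpi(,deriv.order=2).
library(ks)

set.seed(8192)
x <- rbind(matrix(rnorm(200), ncol = 2),            # mode near (0, 0)
           matrix(rnorm(200, mean = 3), ncol = 2))  # mode near (3, 3)

fhat.sig <- kfs(x, binned = TRUE)

## estimate is binary over the grid: 1 = significant modal region
table(fhat.sig$estimate)
```

Because `kfs` returns the same structure as a `kdde` object, the result can be inspected and plotted with the usual accessor idioms (e.g. `fhat.sig$H`, `plot(fhat.sig)`).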

This function is similar to the `featureSignif` function in the
feature package, except that it accepts unconstrained bandwidth
matrices.

Chaudhuri, P. & Marron, J.S. (1999)
SiZer for exploration of structures in curves.
*Journal of the American Statistical Association*,
**94**, 807-823.

Duong, T., Cowling, A., Koch, I. & Wand, M.P. (2008)
Feature significance for multivariate kernel density estimation.
*Computational Statistics and Data Analysis*, **52**,
4225-4242.

Godtliebsen, F., Marron, J.S. & Chaudhuri, P. (2002)
Significance in scale space for bivariate density estimation.
*Journal of Computational and Graphical Statistics*,
**11**, 1-22.

```
# NOT RUN {
## see example in ? plot.kfs
# }
```
