
ks (version 1.8.11)

kde.test: Kernel density based two-sample comparison test

Description

Kernel density based two-sample comparison test for 1- to 6-dimensional data.

Usage

kde.test(x1, x2, H1, H2, h1, h2, psi1, psi2, var.fhat1, var.fhat2, 
    binned=FALSE, bgridsize, verbose=FALSE, pilot="dscalar")
Hpi.kfe(x, nstage=2, pilot="dscalar", pre="sphere", Hstart, binned=FALSE, 
    bgridsize, amise=FALSE, deriv.order=0, verbose=FALSE, optim.fun="nlm")
hpi.kfe(x, nstage=2, binned=FALSE, bgridsize, amise=FALSE, deriv.order=0)

Arguments

x,x1,x2
vector/matrix of data values
H1,H2,h1,h2
bandwidth matrices (multivariate data) or scalar bandwidths (univariate data). If these are missing, Hpi.kfe or hpi.kfe respectively is called by default.
psi1,psi2
zeroth-order kernel functional estimates
var.fhat1,var.fhat2
sample variance of KDE estimates evaluated at x1, x2
binned
flag for binned estimation. Default is FALSE.
bgridsize
vector of binning grid sizes
verbose
flag to print out progress information. Default is FALSE.
nstage
number of stages in the plug-in bandwidth selector (1 or 2)
pilot
"dscalar" = single pilot bandwidth, "dunconstr" = single unconstrained pilot bandwidth
pre
"scale" = pre.scale, "sphere" = pre.sphere
Hstart
initial bandwidth matrix, used in numerical optimisation
amise
flag to return the minimal scaled PI value
deriv.order
derivative order of kfe (kernel functional estimate). Only deriv.order=0 is currently implemented.
optim.fun
optimiser function: one of nlm or optim.

Value

A list with fields:
  • Tstat: T statistic
  • zstat: z statistic, the normalised version of Tstat
  • pvalue: p-value of the double-sided test
  • mean, var: mean and variance of the null distribution
  • var.fhat1, var.fhat2: sample variances of KDE values evaluated at the data points
  • n1, n2: sample sizes
  • H1, H2: bandwidth matrices
  • psi1, psi12, psi21, psi2: kernel functional estimates
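The fields above suggest how the reported statistics fit together. The sketch below assumes that zstat is Tstat centred by mean and scaled by sqrt(var), and that the double-sided p-value is the usual 2 * pnorm(-abs(zstat)); neither formula is taken from the ks source, and the helper name `z_and_p` is hypothetical.

```r
# Hypothetical helper (not part of ks): standardise a test statistic
# against an assumed asymptotically normal null distribution.
z_and_p <- function(Tstat, null_mean, null_var) {
  # z statistic: centre by the null mean, scale by the null standard deviation
  zstat <- (Tstat - null_mean) / sqrt(null_var)
  # double-sided normal p-value (assumption; ks may compute it differently)
  pvalue <- 2 * pnorm(-abs(zstat))
  list(zstat = zstat, pvalue = pvalue)
}

# Illustrative numbers only, not output from kde.test
res <- z_and_p(Tstat = 0.012, null_mean = 0.010, null_var = 1e-6)
res$zstat
res$pvalue
```

Here zstat is 2 and pvalue is about 0.0455, since the statistic lies two null standard deviations above the null mean.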

Details

The null hypothesis is $H_0: f_1 \equiv f_2$, where $f_1, f_2$ are the respective density functions. The measure of discrepancy is the integrated squared error (ISE) $T = \int [f_1(\mathbf{x}) - f_2(\mathbf{x})]^2 \, d\mathbf{x}$. Expanding the square gives $T = \psi_1 - \psi_{12} - \psi_{21} + \psi_2$, where $\psi_{uv} = \int f_u(\mathbf{x}) f_v(\mathbf{x}) \, d\mathbf{x}$, $\psi_1 = \psi_{11}$ and $\psi_2 = \psi_{22}$; each $\psi$ term can then be estimated with a kernel functional estimator. Duong et al. (2012) show that the resulting test statistic has an asymptotically normal null distribution, so no bootstrap resampling is required to compute an approximate p-value. As of ks 1.8.8, kde.test(, binned=TRUE) uses binned estimation only to compute the bandwidth selectors, not the test statistic and p-value.
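To make the decomposition concrete, here is a minimal univariate sketch in base R, assuming a Gaussian kernel and estimating each $\psi_{uv}$ by the double sum $(n_u n_v)^{-1} \sum_i \sum_j K_h(X_{ui} - X_{vj})$. Using h1 for psi1 and psi12 and h2 for psi21 and psi2 is an assumption of this sketch, and the helpers `psi_hat` and `T_stat` are hypothetical, not ks functions.

```r
# Hypothetical helper: kernel functional estimate of psi_uv with a
# Gaussian kernel K_h(u) = dnorm(u, sd = h).
psi_hat <- function(xu, xv, h) {
  d <- outer(xu, xv, "-")   # all pairwise differences X_ui - X_vj
  mean(dnorm(d, sd = h))    # (n_u * n_v)^(-1) * double sum of K_h
}

# Hypothetical helper: T = psi1 - psi12 - psi21 + psi2
T_stat <- function(x1, x2, h1, h2) {
  psi1  <- psi_hat(x1, x1, h1)
  psi12 <- psi_hat(x1, x2, h1)
  psi21 <- psi_hat(x2, x1, h2)
  psi2  <- psi_hat(x2, x2, h2)
  psi1 - psi12 - psi21 + psi2
}

set.seed(8192)
x <- rnorm(500)
y <- rnorm(500, mean = 0.25)
T_stat(x, y, h1 = 0.3, h2 = 0.3)
```

Because the Gaussian kernel is positive definite, this sketch gives $T \ge 0$ whenever h1 = h2, with $T = 0$ for identical samples; the zstat in the Value section is a normalised version of this kind of quantity.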

Hpi.kfe computes the optimal plug-in bandwidth for the $r$-th order kernel functional estimator, based on the unconstrained pilot selectors of Chacon & Duong (2010). kde.test calls it automatically to estimate the $\psi$ functionals with $r = 0$. hpi.kfe is the 1-d equivalent, using the formulas from Wand & Jones (1995, p. 70).
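The 1-d plug-in idea can be sketched in base R with the AMSE-optimal bandwidth for kernel functional estimation, $g_r = [2K^{(r)}(0) / (-\mu_2(K)\,\psi_{r+2}\, n)]^{1/(r+3)}$, applied in two stages with a Gaussian kernel ($\mu_2(K) = 1$) and a normal-reference start. This is a sketch of the general technique under those assumptions, not the code of hpi.kfe; the function name `hpi_kfe_sketch` is hypothetical.

```r
# Second derivative of the standard normal density: phi''(u) = (u^2 - 1) phi(u)
phi2 <- function(u) (u^2 - 1) * dnorm(u)

# Hypothetical two-stage plug-in bandwidth for psi_0 = int f(x)^2 dx
hpi_kfe_sketch <- function(x) {
  n <- length(x)
  sigma <- sd(x)
  # Stage 1: normal-reference value psi_4 = 3 / (8 sqrt(pi) sigma^5)
  psi4 <- 3 / (8 * sqrt(pi) * sigma^5)
  # Pilot bandwidth for psi_2 (r = 2, K''(0) = -1/sqrt(2 pi))
  g2 <- (2 / (sqrt(2 * pi) * psi4 * n))^(1/5)
  # Stage 2: psi_2 estimate n^-2 sum_i sum_j g2^-3 phi''((X_i - X_j) / g2)
  d <- outer(x, x, "-") / g2
  psi2 <- mean(phi2(d)) / g2^3
  # Final bandwidth for psi_0 (r = 0, K(0) = 1/sqrt(2 pi));
  # psi2 is expected to be negative, so the base is positive
  (2 * dnorm(0) / (-psi2 * n))^(1/3)
}

set.seed(8192)
hpi_kfe_sketch(rnorm(1000))
```

For standard normal data the exact normal-reference value is $(4\sqrt{2}/n)^{1/3}$, about 0.178 for n = 1000, which gives a rough sanity check on the estimate.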

References

Chacon, J.E. & Duong, T. (2010) Multivariate plug-in bandwidth selection with unconstrained pilot matrices. Test, 19, 375-398.

Duong, T., Goud, B. & Schauer, K. (2012) Closed-form density-based framework for automatic detection of cellular morphology changes. PNAS, 109, 8382-8387.

Wand, M.P. & Jones, M.C. (1995) Kernel Smoothing. Chapman & Hall/CRC, London.

See Also

kde.local.test

Examples

library(ks)

## univariate example
set.seed(8192)
samp <- 1000
x <- rnorm.mixt(n=samp, mus=0, sigmas=1, props=1)
y <- rnorm.mixt(n=samp, mus=0.25, sigmas=1, props=1)
kde.test(x1=x, x2=y)$pvalue   ## reject H0: f1=f2


## bivariate example
mus1 <- rbind(c(1,-1), c(-1,1))
Sigmas1 <- rbind(invvech(c(4/9, 4/15, 4/9)), invvech(c(4/9, 4/15, 4/9)))
props1 <- c(1,1)/2
mus2 <- rbind(c(1,-1), c(-1,1))
Sigmas2 <- rbind(invvech(c(4/9, 14/45, 4/9)), 4/9*diag(2))
props2 <- c(1,1)/2

set.seed(8192)
samp <- 1000
x <- rmvnorm.mixt(n=samp, mus=mus1, Sigmas=Sigmas1, props=props1)
y <- rmvnorm.mixt(n=samp, mus=mus2, Sigmas=Sigmas2, props=props2)
kde.test(x1=x, x2=y, binned=TRUE)$pvalue    ## reject H0: f1=f2
