
ks (version 1.10.4)

kde.test: Kernel density based global two-sample comparison test

Description

Kernel density-based global two-sample comparison test for 1- to 6-dimensional data.

Usage

kde.test(x1, x2, H1, H2, h1, h2, psi1, psi2, var.fhat1, var.fhat2, binned=FALSE, bgridsize, verbose=FALSE, pilot="dscalar")

Arguments

x1,x2
vector (univariate) or matrix (multivariate) of data values for the first and second samples
H1,H2,h1,h2
bandwidth matrices (H1, H2) or scalar bandwidths (h1, h2). If these are missing, Hpi.kfe or hpi.kfe, respectively, is called by default.
psi1,psi2
zeroth-order kernel functional estimates
var.fhat1,var.fhat2
sample variances of the KDE values evaluated at x1, x2
binned
flag for binned estimation. Default is FALSE.
bgridsize
vector of binning grid sizes
verbose
flag to print out progress information. Default is FALSE.
pilot
"dscalar" = single pilot bandwidth (default)
"dunconstr" = single unconstrained pilot bandwidth
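When the same pair of samples is tested more than once, the bandwidths can be selected once and passed in through H1, H2 instead of being reselected on every call. A minimal sketch, assuming ks is installed: it calls Hpi.kfe with its default settings apart from deriv.order = 0, which may not match every option kde.test uses internally (for example the pilot choice), so the p-value can differ slightly from a call that omits H1, H2.

```r
library(ks)

set.seed(8192)
# Two bivariate samples drawn from the same standard normal distribution
x1 <- matrix(rnorm(400), ncol = 2)
x2 <- matrix(rnorm(400), ncol = 2)

# Select plug-in bandwidth matrices once (Hpi.kfe defaults, deriv.order = 0);
# these settings are an assumption and may differ from kde.test's internal call
H1 <- Hpi.kfe(x1, deriv.order = 0)
H2 <- Hpi.kfe(x2, deriv.order = 0)

# Supply the precomputed bandwidths so kde.test does not select them itself
res <- kde.test(x1 = x1, x2 = x2, H1 = H1, H2 = H2)
res$pvalue
```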

Value

The kernel two-sample global significance test returns a list with fields:
Tstat
T statistic
zstat
z statistic, the normalised version of Tstat
pvalue
p-value of the two-sided test
mean,var
mean and variance of null distribution
var.fhat1,var.fhat2
sample variances of KDE values evaluated at data points
n1,n2
sample sizes
H1,H2
bandwidth matrices
psi1,psi12,psi21,psi2
kernel functional estimates
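The fields above describe zstat as Tstat normalised by the mean and variance of its null distribution. A minimal sketch of that standardisation and the resulting normal-approximation p-values, assuming z = (Tstat - mean) / sqrt(var); the helper z_and_p is hypothetical and not part of ks. It returns both an upper-tail and a two-sided p-value because it does not reproduce how kde.test computes pvalue internally.

```r
# Hypothetical helper, not part of ks: standardise a test statistic with the
# mean and variance of its asymptotic normal null distribution, then compute
# normal-approximation p-values.
z_and_p <- function(Tstat, mean, var) {
  z <- (Tstat - mean) / sqrt(var)
  c(zstat       = z,
    p_upper     = pnorm(z, lower.tail = FALSE),  # P(Z >= z)
    p_two_sided = 2 * pnorm(-abs(z)))            # P(|Z| >= |z|)
}

r <- z_and_p(Tstat = 3, mean = 1, var = 4)
r  # zstat is 1 here, and p_two_sided is twice p_upper
```

With a fitted result, the same helper could be applied as `z_and_p(res$Tstat, res$mean, res$var)` to compare against `res$zstat` and `res$pvalue`.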

Details

The null hypothesis is $H_0: f_1 = f_2$, where $f_1, f_2$ are the respective density functions. The measure of discrepancy is the integrated squared error (ISE) $\int [f_1(x) - f_2(x)]^2 \, dx$. Expanding the square gives $T = \psi_{0,1} - \psi_{0,12} - \psi_{0,21} + \psi_{0,2}$, where $\psi_{0,uv} = \int f_u(x) f_v(x) \, dx$ (writing $\psi_{0,1}$ for $\psi_{0,11}$ and $\psi_{0,2}$ for $\psi_{0,22}$), so each term can be estimated with a kernel functional estimator.

The resulting test statistic has an asymptotically normal null distribution, so no bootstrap resampling is required to compute an approximate p-value. If H1, H2 are missing, kde.test automatically calls the plug-in selector Hpi.kfe to obtain the bandwidths used to estimate the functionals with kfe(, deriv.order=0). Likewise, hpi.kfe is called if h1, h2 are missing.
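The ψ decomposition of the ISE can be illustrated with a minimal base-R sketch for univariate data. It assumes a Gaussian kernel and uses bw.nrd0 as a stand-in rule-of-thumb bandwidth rather than hpi.kfe; the helpers kde_eval, psi0 and ise_stat are hypothetical and not part of ks. Each ψ here is the exact integral of a product of two Gaussian KDEs, which reduces to a double sum because the convolution of N(0, h_u^2) and N(0, h_v^2) is N(0, h_u^2 + h_v^2). The kfe-based estimators in ks differ in detail (ks reports psi12 and psi21 separately, for instance), and the sketch computes only T, not its null mean and variance.

```r
# Hypothetical helpers, not part of ks: base R only, univariate data,
# Gaussian kernel, rule-of-thumb bandwidths from bw.nrd0().

# Gaussian KDE with data x and bandwidth h, evaluated at the points t
kde_eval <- function(t, x, h) {
  vapply(t, function(ti) mean(dnorm(ti - x, sd = h)), numeric(1))
}

# psi_{0,uv}: exact integral of the product of two Gaussian KDEs.
# N(0, hu^2) convolved with N(0, hv^2) is N(0, hu^2 + hv^2), so the
# integral is the average Gaussian density over all pairs (X_ui, X_vj).
psi0 <- function(xu, xv, hu, hv) {
  mean(dnorm(outer(xu, xv, "-"), sd = sqrt(hu^2 + hv^2)))
}

# Plug-in estimate of T = psi_{0,1} - psi_{0,12} - psi_{0,21} + psi_{0,2},
# which equals the ISE between the two KDEs
ise_stat <- function(x1, x2, h1 = bw.nrd0(x1), h2 = bw.nrd0(x2)) {
  psi1  <- psi0(x1, x1, h1, h1)
  psi12 <- psi0(x1, x2, h1, h2)
  psi21 <- psi0(x2, x1, h2, h1)
  psi2  <- psi0(x2, x2, h2, h2)
  psi1 - psi12 - psi21 + psi2
}

set.seed(8192)
a <- rnorm(200)
b <- rnorm(200, mean = 1)
ise_stat(a, a)  # 0 for identical samples
ise_stat(a, b)  # positive for shifted samples
```

Because the estimate is the ISE between two KDEs, it is never negative; a test such as kde.test then compares it with the mean and variance expected under $H_0$.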

As of ks 1.8.8, kde.test(, binned=TRUE) uses binned estimation only for computing the bandwidth selectors, not for the test statistic and p-value.

References

Duong, T., Goud, B. & Schauer, K. (2012) Closed-form density-based framework for automatic detection of cellular morphology changes. PNAS, 109, 8382-8387.

See Also

kde.local.test

Examples

library(ks)
set.seed(8192)
samp <- 1000
## two samples drawn from the same standard normal distribution
x <- rnorm.mixt(n=samp, mus=0, sigmas=1, props=1)
y <- rnorm.mixt(n=samp, mus=0, sigmas=1, props=1)
kde.test(x1=x, x2=y)$pvalue   ## large p-value: do not reject H0: f1=f2

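library(MASS)
data(crabs)
## columns 4 and 6 are FL (frontal lobe size) and CL (carapace length), in mm
x1 <- crabs[crabs$sp=="B", c(4,6)]   ## blue species
x2 <- crabs[crabs$sp=="O", c(4,6)]   ## orange species
kde.test(x1=x1, x2=x2)$pvalue  ## small p-value: reject H0: f1=f2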
