kcde: Kernel cumulative distribution/survival function estimate for multivariate data

Description

Kernel cumulative distribution/survival function estimate for 1- to 3-dimensional data.
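For reference, the estimators can be written as below. This is a sketch of the standard kernel CDF/survival estimator under stated assumptions (a normal kernel with bandwidth matrix H, equal weights, no binning), not a transcription of the ks source:

```latex
\hat{F}(\mathbf{x};\mathbf{H}) = \frac{1}{n}\sum_{i=1}^{n} \Phi_{\mathbf{H}}(\mathbf{x}-\mathbf{X}_i),
\qquad
\hat{\bar{F}}(\mathbf{x};\mathbf{H}) = \frac{1}{n}\sum_{i=1}^{n} \bar{\Phi}_{\mathbf{H}}(\mathbf{x}-\mathbf{X}_i),
\qquad
\Phi_{\mathbf{H}}(\mathbf{y}) = \mathrm{Pr}(\mathbf{Z}\leq\mathbf{y}),\;
\bar{\Phi}_{\mathbf{H}}(\mathbf{y}) = \mathrm{Pr}(\mathbf{Z}>\mathbf{y}),\;
\mathbf{Z}\sim N(\mathbf{0},\mathbf{H})
```

Inequalities between vectors are componentwise. In 1-d with scalar bandwidth h, take H = h^2.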

Usage

kcde(x, H, h, gridsize, gridtype, xmin, xmax, supp=3.7, eval.points,
  binned=FALSE, bgridsize, positive=FALSE, adj.positive, w, verbose=FALSE,
  tail.flag="lower.tail")
Hpi.kcde(x, nstage=2, pilot="dunconstr", Hstart, binned=FALSE, bgridsize,
  amise=FALSE, verbose=FALSE, optim.fun="nlm")
hpi.kcde(x, nstage=2, binned=TRUE)

Arguments

x

matrix of data values

H,h

bandwidth matrix/scalar bandwidth. If these are missing, then Hpi.kcde or hpi.kcde is called by default.

gridsize

vector of number of grid points

gridtype

not yet implemented

xmin,xmax

vector of minimum/maximum values for grid

supp

effective support for the standard normal kernel

eval.points

points at which estimate is evaluated

binned

flag for binned estimation. Default is FALSE for kcde and Hpi.kcde, and TRUE for hpi.kcde.

bgridsize

vector of binning grid sizes

positive

flag if 1-d data are positive. Default is FALSE.

adj.positive

adjustment applied to positive 1-d data

w

vector of weights; not yet implemented

verbose

flag to print out progress information. Default is FALSE.

tail.flag

"lower.tail" = cumulative distribution, "upper.tail" = survival function
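As a rough illustration of the two tail.flag settings in 1-d, the base-R sketch below computes a Gaussian-kernel CDF estimate and its survival counterpart. The helper kcde_sketch_1d is hypothetical and not part of ks; it assumes a normal kernel with scalar bandwidth h, exact (unbinned) evaluation and equal weights.

```r
# Hypothetical helper (not the ks implementation): exact, unbinned
# Gaussian-kernel estimate of Pr(X <= x) or Pr(X > x) in 1-d.
kcde_sketch_1d <- function(x, h, eval.points, tail.flag = "lower.tail") {
  # z[i, j] = (eval.points[i] - x[j]) / h
  z <- outer(eval.points, x, "-") / h
  lower <- identical(tail.flag, "lower.tail")
  # Average the kernel CDF (or its upper tail) over the data points
  rowMeans(pnorm(z, lower.tail = lower))
}

set.seed(1)
x <- rnorm(100)
ep <- seq(-3, 3, length.out = 7)
Fhat <- kcde_sketch_1d(x, h = 0.3, eval.points = ep)
Shat <- kcde_sketch_1d(x, h = 0.3, eval.points = ep, tail.flag = "upper.tail")

# In 1-d the two estimates are complementary: Fhat + Shat is 1 up to rounding
print(all.equal(Fhat + Shat, rep(1, length(ep))))
```

Because each data point contributes a smooth normal CDF rather than a step, the estimate is a continuous, non-decreasing function of the evaluation point.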

nstage

number of stages in the plug-in bandwidth selector (1 or 2)

pilot

"dscalar" = single pilot bandwidth; "dunconstr" = single unconstrained pilot bandwidth. Default is "dunconstr".

Hstart

initial bandwidth matrix, used in numerical optimisation

amise

flag to return the minimal scaled PI value

optim.fun

optimiser function: one of nlm or optim

Value

A kernel cumulative distribution estimate is an object of class kcde, which is a list with fields:
x: data points, same as input
eval.points: points at which the estimate is evaluated
estimate: cumulative distribution/survival function estimate at eval.points
h: scalar bandwidth (1-d only)
H: bandwidth matrix
gridtype: "linear"
gridded: flag for estimation on a grid
binned: flag for binned estimation
names: variable names
w: weights
tail: "lower.tail" = cumulative distribution, "upper.tail" = survival function

Details

If tail.flag="lower.tail", then the cumulative distribution function $\mathrm{Pr}(\mathbf{X}\leq\mathbf{x})$ is estimated; if tail.flag="upper.tail", it is the survival function $\mathrm{Pr}(\mathbf{X}>\mathbf{x})$, where the inequalities hold componentwise. For d=1 the two are complementary, but for d>1, $\mathrm{Pr}(\mathbf{X}\leq\mathbf{x}) \neq 1 - \mathrm{Pr}(\mathbf{X}>\mathbf{x})$ in general.

If the bandwidth H is missing from kcde, then the default bandwidth is the binned 2-stage plug-in selector Hpi.kcde(, nstage=2, binned=TRUE). Likewise, a missing h defaults to hpi.kcde(, nstage=2, binned=TRUE). These bandwidth selectors are optimal for cumulative distribution/survival functions; see Duong (2013).
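To see why the two tails are not complementary for d>1, the base-R sketch below evaluates both estimates at a single 2-d point. It assumes a diagonal bandwidth matrix H = diag(h^2), so that the normal kernel CDF factorises into a product of 1-d normal CDFs; kcde_sketch_2d is a hypothetical helper, not the ks implementation, whereas kcde itself accepts a full bandwidth matrix H.

```r
# Hypothetical helper: Gaussian-kernel CDF (lower.tail = TRUE) or
# survival (lower.tail = FALSE) estimate at one 2-d point `pt`,
# assuming a diagonal bandwidth matrix H = diag(h^2).
kcde_sketch_2d <- function(x, h, pt, lower.tail = TRUE) {
  # With diagonal H the kernel factorises across coordinates
  p1 <- pnorm((pt[1] - x[, 1]) / h[1], lower.tail = lower.tail)
  p2 <- pnorm((pt[2] - x[, 2]) / h[2], lower.tail = lower.tail)
  mean(p1 * p2)
}

set.seed(1)
x <- cbind(rnorm(200), rnorm(200))
pt <- c(0, 0)
Fhat <- kcde_sketch_2d(x, h = c(0.3, 0.3), pt)
Shat <- kcde_sketch_2d(x, h = c(0.3, 0.3), pt, lower.tail = FALSE)

# Fhat + Shat falls short of 1 by the smoothed estimated mass in the
# two "mixed" quadrants {X1 <= 0, X2 > 0} and {X1 > 0, X2 <= 0}
print(c(lower = Fhat, upper = Shat, sum = Fhat + Shat))
```

For independent standard normal coordinates, each mixed quadrant at the origin carries probability 1/4, so the sum here should be roughly one half rather than one.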

Binning/exact estimation and the behaviour for positive 1-d data are the same as for kde. No pre-scaling/pre-sphering is used, since the Hpi.kcde bandwidth selectors are not invariant to translation/dilation.

References

Duong, T. (2013) Non-parametric kernel estimation of multivariate cumulative distribution functions and receiver operating characteristic curves. Submitted.

Examples


library(ks)   # kcde is provided by the ks package
data(iris)    # iris ships with base R (datasets package)
Fhat <- kcde(iris[,1:2])

## See other examples in ?plot.kcde
