kda: Kernel discriminant analysis

Description

Kernel discriminant analysis for 1- to 6-dimensional data.

Usage

kda(x, x.group, Hs, hs, prior.prob=NULL, gridsize, xmin, xmax, supp=3.7,
  eval.points, binned=FALSE, bgridsize, w, compute.cont=FALSE, approx.cont=TRUE,
  kde.flag=TRUE)
Hkda(x, x.group, Hstart, bw="plugin", ...)
Hkda.diag(x, x.group, bw="plugin", ...)
hkda(x, x.group, bw="plugin", ...)
compare(x.group, est.group, by.group=FALSE)
compare.kda.cv(x, x.group, bw="plugin", prior.prob=NULL, Hstart, by.group=FALSE,
   verbose=FALSE, recompute=FALSE, ...)
compare.kda.diag.cv(x, x.group, bw="plugin", prior.prob=NULL, by.group=FALSE, 
   verbose=FALSE, recompute=FALSE, ...)

Arguments

x

matrix of training data values

x.group

vector of group labels for training data

Hs,hs

(stacked) matrix of bandwidth matrices/vector of scalar bandwidths. If these are missing, Hkda or hkda is called by default.

prior.prob

vector of prior probabilities

gridsize

vector of grid sizes

xmin,xmax

vector of minimum/maximum values for grid

supp

effective support for standard normal

eval.points

points at which estimate is evaluated

binned

flag for binned estimation. Default is FALSE.

bgridsize

vector of binning grid sizes

w

vector of weights. Not yet implemented.

compute.cont

flag for computing 1% to 99% probability contour levels. Default is FALSE.

approx.cont

flag for computing approximate probability contour levels. Default is TRUE.

kde.flag

flag for computing KDE on grid. Default is TRUE.

bw

bandwidth selector: "plugin" = plug-in, "lscv" = LSCV (least squares cross validation), "scv" = SCV (smoothed cross validation)

Hstart

(stacked) matrix of initial bandwidth matrices, used in numerical optimisation

est.group

vector of estimated group labels

by.group

flag for also reporting results within each group

verbose

flag for printing progress information. Default is FALSE.

recompute

flag for recomputing the bandwidth matrix after excluding the i-th data item

...

other optional parameters for bandwidth selection, see Hpi, Hlscv, Hscv
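
The stacked layout used by Hs and Hstart can be illustrated with a short base R sketch. It assumes the bandwidth matrices are stacked row-wise, so that m groups in d dimensions give an (m*d) x d matrix; the matrix values and object names are made up for illustration.

```r
d <- 2
H1 <- matrix(c(0.5, 0.1, 0.1, 0.3), nrow=d)  # group 1 (illustrative values)
H2 <- matrix(c(0.4, 0.0, 0.0, 0.6), nrow=d)  # group 2 (illustrative values)

# Stack row-wise (assumed layout): rows 1..d are group 1, rows d+1..2d are group 2
Hs <- rbind(H1, H2)

# Recover the bandwidth matrix of group j from the stacked matrix
j <- 2
Hj <- Hs[((j-1)*d + 1):(j*d), ]
```

Under this assumption, a matrix built this way could be passed as Hs to kda, or as Hstart to Hkda, for two groups of 2-d data.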

Value

A kernel discriminant analysis is an object of class kda, which is a list with fields

x

list of data points, one for each group label

estimate

list of density estimates at eval.points, one for each group label

eval.points

points that the estimate is evaluated at, one for each group label

h

vector of bandwidths (1-d only)

H

stacked matrix of bandwidth matrices or vector of bandwidths

gridded

flag for estimation on a grid

binned

flag for binned estimation

w

weights

prior.prob

prior probabilities

x.group

group labels, same as input

x.group.estimate

estimated group labels. If the test data eval.points are given, then these are classified. Otherwise the training data x are classified.

The result from Hkda and Hkda.diag is a stacked matrix of bandwidth matrices, one for each training data group. The result from hkda is a vector of bandwidths, one for each training data group.

The compare functions create a comparison between the true group labels x.group and the estimated ones. They return a list with fields

cross

cross-classification table with the rows indicating the true group and the columns the estimated group

error

misclassification rate (MR)

If test data independent of the training data are available, compare computes MR = (number of points wrongly classified)/(total number of points). If there are no independent test data, e.g. when classifying the training data set itself, then the cross-validated estimate of MR is more appropriate. This is implemented as compare.kda.cv (for full bandwidth selectors) and compare.kda.diag.cv (for diagonal bandwidth selectors). These functions are only available for d > 1.

If by.group=FALSE then only the total MR is given. If by.group=TRUE, then the MR for each class is also given (estimated number in group divided by true number).
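
The misclassification rate can be illustrated with a small base R sketch. It does not call compare: the label vectors true.group and est.group are made-up values, and the per-group rate shown here (the fraction of each true group that is misclassified) is one common definition that may differ from what by.group=TRUE reports.

```r
# Hypothetical true and estimated labels (illustrative values only)
true.group <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
est.group  <- c(1, 1, 2, 1, 2, 2, 1, 2, 2, 1)

# Cross-classification table: rows = true group, columns = estimated group
cross <- table(true.group, est.group)

# Total MR = (number of points wrongly classified)/(total number of points)
mr.total <- 1 - sum(diag(cross)) / sum(cross)

# Per-group MR: misclassified fraction within each true group
mr.group <- 1 - diag(cross) / rowSums(cross)
```

The diagonal of the table counts correctly classified points, so both rates follow from the table alone.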

Details

If the bandwidths Hs are missing from kda, then the default bandwidths are the plug-in selectors Hkda(, bw="plugin"). Likewise for missing hs. Valid options for bw are "plugin", "lscv" and "scv" which in turn call Hpi, Hlscv and Hscv.
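
The default described above can be written out explicitly for 1-d data. This sketch assumes the ks package is installed and uses only functions listed under Usage; the data are simulated with base R and the object names are illustrative.

```r
library(ks)

set.seed(1)
x <- c(rnorm(100, mean=1), rnorm(100, mean=-1))
x.gr <- rep(c(1,2), times=c(100,100))

# One plug-in bandwidth per group; bw="lscv" or bw="scv" choose the other selectors
hs <- hkda(x, x.gr, bw="plugin")

# Supplying hs explicitly, so kda does not need to select the bandwidths itself
kda.gr <- kda(x, x.gr, hs=hs)
```

For multivariate data the same pattern would apply with Hkda (or Hkda.diag) and the Hs argument.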

The effective support, binning, grid size, grid range and positive data parameters are the same as for kde. If the prior probabilities are known, then set prior.prob to these. Otherwise prior.prob=NULL uses the sample proportions as estimates of the prior probabilities. As of ks 1.8.11, kda.kde has been subsumed into kda, so all prior calls to kda.kde can be replaced by kda. To reproduce the previous behaviour of kda, use kda(, kde.flag=FALSE).
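
The role of prior.prob can be sketched in base R. The usual kernel discriminant rule assigns a point to the group with the largest prior-weighted density estimate. The sketch below is not the ks implementation: the helper kde1 is hypothetical, the data are simulated, and bw.nrd0 from the stats package stands in for the plug-in selector.

```r
set.seed(1)
x <- c(rnorm(100, mean=1), rnorm(50, mean=-1))
x.group <- rep(c(1, 2), times=c(100, 50))

# prior.prob=NULL corresponds to using the sample proportions
prior.prob <- as.numeric(table(x.group)) / length(x.group)

# Hypothetical helper: Gaussian KDE of sample xs, evaluated at points y, bandwidth h
kde1 <- function(y, xs, h) sapply(y, function(yi) mean(dnorm(yi, mean=xs, sd=h)))

groups <- sort(unique(x.group))
y <- c(-2, 0, 2)  # points to classify

# Prior-weighted density estimates: one row per point, one column per group
wdens <- sapply(seq_along(groups), function(j) {
  xj <- x[x.group == groups[j]]
  prior.prob[j] * kde1(y, xj, bw.nrd0(xj))
})

# Assign each point to the group with the largest weighted density
y.group.est <- groups[max.col(wdens, ties.method="first")]
```

Replacing prior.prob with known probabilities changes only the weights, not the density estimates.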

References

Simonoff, J. S. (1996) Smoothing Methods in Statistics. Springer-Verlag, New York.

Examples

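library(ks)

## Two-group 1-d training data and group labels
x <- c(rnorm.mixt(n=100, mus=1), rnorm.mixt(n=100, mus=-1))
x.gr <- rep(c(1,2), times=c(100,100))

## Test data with the same group layout as x, classified via eval.points
y <- c(rnorm.mixt(n=100, mus=1), rnorm.mixt(n=100, mus=-1))
kda.gr <- kda(x, x.gr, eval.points=y)
compare(kda.gr$x.group, kda.gr$x.group.estimate, by.group=TRUE)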

## See other examples in ? plot.kda