pvs: P-Values to Classify New Observations

Description

Computes nonparametric p-values for the potential class memberships of new observations.

Usage

pvs(NewX, X, Y, method = c('gaussian', 'knn', 'wnn', 'logreg'), ...)

Value

PV is a matrix of p-values with one row per new observation (row of NewX) and one column per class. Precisely, for each new observation NewX[i,] and each class b, the number PV[i,b] is a p-value for the null hypothesis that the unknown class of NewX[i,] is b.
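Because each column of PV corresponds to a class, a common way to use the matrix is to keep, for every new observation, all classes whose p-value exceeds a level alpha. If the p-values are valid, meaning that PV[i,b] <= alpha has probability at most alpha whenever NewX[i,] truly belongs to class b, then this set contains the true class with probability at least 1 - alpha. The sketch below is a hypothetical helper, not part of pvclass. It assumes the class labels are stored in colnames(PV) and falls back to column indices otherwise.

```r
# Hypothetical helper (not part of pvclass): for each row of a p-value
# matrix PV (rows = new observations, columns = classes), return the
# classes whose p-value is strictly greater than alpha.
prediction_sets <- function(PV, alpha = 0.05) {
  PV <- as.matrix(PV)
  # Assumption: class labels are the column names; otherwise use indices
  classes <- if (is.null(colnames(PV))) seq_len(ncol(PV)) else colnames(PV)
  lapply(seq_len(nrow(PV)), function(i) classes[PV[i, ] > alpha])
}
```

For example, prediction_sets(pvs(NewX, X, Y, method = 'knn', k = 10), alpha = 0.05) would list the classes that are not rejected at level 5% for each row of NewX.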

Arguments

NewX: data matrix consisting of one or several new observations (row vectors) to be classified.
X: matrix containing training observations, where each observation is a row vector.
Y: vector indicating the classes which the training observations belong to.
method: one of the following methods:
'gaussian': plug-in statistic for the standard Gaussian model,
'knn': k nearest neighbors,
'wnn': weighted nearest neighbors,
'logreg': multicategory logistic regression with \(l1\)-penalization.
...: further arguments depending on the method (see pvs.gaussian, pvs.knn, pvs.wnn, pvs.logreg).
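The method argument takes one of the four names listed above. The sketch below assumes that pvs resolves this argument with base R's match.arg. This is an assumption about the package internals, although it is consistent with the abbreviated method = 'k' in the Examples. Under that assumption, any unambiguous prefix such as 'k' or 'w' selects a method, and omitting the argument selects the first choice, 'gaussian'. resolve_method is a hypothetical helper that only mirrors the signature shown under Usage.

```r
# Hypothetical helper mirroring the 'method' argument of pvs();
# match.arg() accepts unambiguous prefixes and defaults to the first choice.
resolve_method <- function(method = c('gaussian', 'knn', 'wnn', 'logreg')) {
  match.arg(method)
}

resolve_method('k')   # "knn"
resolve_method('w')   # "wnn"
resolve_method()      # "gaussian"
```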

Author

Niki Zumbrunnen niki.zumbrunnen@gmail.com
Lutz Dümbgen lutz.duembgen@stat.unibe.ch
https://www.imsv.unibe.ch/about_us/staff/prof_dr_duembgen_lutz/index_eng.html

Details

For each new observation NewX[i,] and each class b, PV[i,b] is a p-value for the null hypothesis that NewX[i,] belongs to class b.
This p-value is based on a permutation test applied to an estimated Bayesian likelihood ratio with estimated prior probabilities \(N(b)/n\). Here \(N(b)\) is the number of training observations of class \(b\) and \(n\) is the total number of training observations. Depending on method, the likelihood ratio is estimated by a plug-in statistic for the Gaussian model, by k nearest neighbors, by weighted nearest neighbors or by multicategory logistic regression with \(l1\)-penalization (see pvs.gaussian, pvs.knn, pvs.wnn, pvs.logreg).
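The two ingredients named in this paragraph can be sketched in base R. perm_pvalue computes a generic rank-based permutation p-value of the form (1 + number of class-b training observations whose statistic is at least that of the new observation) / (N(b) + 1). It assumes that larger statistic values speak more strongly against class b. estimated_priors computes the prior estimates N(b)/n from the training labels Y. Both function names are hypothetical, and the statistic values are taken as given. Computing those values is the job of pvs.gaussian, pvs.knn, pvs.wnn and pvs.logreg, and this sketch does not reproduce pvclass's internal code.

```r
# Generic rank-based permutation p-value for one new observation and one
# candidate class b (hypothetical helper, not pvclass code).
#   t_new:   statistic of the new observation under the hypothesis "class b"
#   t_class: the same statistic for each of the N(b) training observations of class b
# Assumption: larger values indicate stronger evidence against class b.
perm_pvalue <- function(t_new, t_class) {
  (1 + sum(t_class >= t_new)) / (length(t_class) + 1)
}

# Estimated prior probabilities N(b)/n from the training labels Y
estimated_priors <- function(Y) {
  prop.table(table(Y))
}

perm_pvalue(2.5, c(1, 2, 3, 4))   # 0.6
```

Adding 1 to both numerator and denominator counts the new observation itself, so the p-value is never 0 and is at least 1/(N(b) + 1).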

References

Zumbrunnen N. and Dümbgen L. (2017) pvclass: An R Package for p Values for Classification. Journal of Statistical Software 78(4), 1--19. doi:10.18637/jss.v078.i04

Dümbgen L., Igl B.-W. and Munk A. (2008) P-Values for Classification. Electronic Journal of Statistics 2, 468--493. doi:10.1214/08-EJS245

Zumbrunnen N. (2014) P-Values for Classification – Computational Aspects and Asymptotics. Ph.D. thesis, University of Bern, available at http://boris.unibe.ch/id/eprint/53585.

Examples


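library(pvclass)

# Training data: all iris flowers except rows 50, 100 and 150
X <- iris[c(1:49, 51:99, 101:149), 1:4]
Y <- iris[c(1:49, 51:99, 101:149), 5]
# New observations: the last flower of each species
NewX <- iris[c(50, 100, 150), 1:4]

# p-values based on 10 nearest neighbors ('k' selects method 'knn')
pvs(NewX, X, Y, method = 'k', k = 10)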
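The same train/test split can be written with negative indexing. This makes it easy to change which rows are held out and keeps their true classes available for comparison with the p-values. In the sketch below, the names test_idx and true_class are illustrative. The pvs() call is guarded with requireNamespace so that the block also runs where pvclass is not installed.

```r
# Hold out the last flower of each species as new observations
test_idx <- c(50, 100, 150)
X <- iris[-test_idx, 1:4]        # same 147 rows as iris[c(1:49, 51:99, 101:149), 1:4]
Y <- iris[-test_idx, 5]
NewX <- iris[test_idx, 1:4]
true_class <- iris[test_idx, 5]  # setosa, versicolor, virginica

# Only call pvs() if the pvclass package is available
if (requireNamespace("pvclass", quietly = TRUE)) {
  PV <- pvclass::pvs(NewX, X, Y, method = "knn", k = 10)
  print(round(PV, 3))
}
```

Each row of the printed matrix can then be compared with the corresponding entry of true_class.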
