Computes nonparametric p-values for the potential class memberships of new observations. The p-values are based on 'weighted nearest-neighbors'.
pvs.wnn(NewX, X, Y, wtype = c('linear', 'exponential'), W = NULL,
        tau = 0.3, distance = c('euclidean', 'ddeuclidean',
        'mahalanobis'), cova = c('standard', 'M', 'sym'))
PV is a matrix containing the p-values. Precisely, for each new observation NewX[i,] and each class b the number PV[i,b] is a p-value for the null hypothesis that \(Y[i] = b\).
If tau is a vector or NULL (and W = NULL), PV has an attribute "opt.tau", which is a matrix and opt.tau[i,b] is the best tau for observation NewX[i,] and class b (see section 'Details'). opt.tau[i,b] is used to compute the p-value for observation NewX[i,] and class b.
NewX: data matrix consisting of one or several new observations (row vectors) to be classified.
X: matrix containing training observations, where each observation is a row vector.
Y: vector indicating the classes to which the training observations belong.
wtype: type of the weight function (see section 'Details' below).
W: vector of the (decreasing) weights (see section 'Details' below).
tau: parameter of the weight function. If tau is a vector or tau = NULL, the program searches for the best tau. For more information see section 'Details'.
distance: the distance measure, one of
'euclidean': fixed Euclidean distance,
'ddeuclidean': data-driven Euclidean distance (component-wise standardization),
'mahalanobis': Mahalanobis distance.
cova: estimator for the covariance matrix, one of
'standard': standard estimator,
'M': M-estimator,
'sym': symmetrized M-estimator.
Niki Zumbrunnen niki.zumbrunnen@gmail.com
Lutz Dümbgen lutz.duembgen@stat.unibe.ch
https://www.imsv.unibe.ch/about_us/staff/prof_dr_duembgen_lutz/index_eng.html
Computes nonparametric p-values for the potential class memberships of new observations. Precisely, for each new observation NewX[i,] and each class b the number PV[i,b] is a p-value for the null hypothesis that \(Y[i] = b\).
This p-value is based on a permutation test applied to an estimated Bayesian likelihood ratio, using 'weighted nearest neighbors' with estimated prior probabilities \(N(b)/n\). Here \(N(b)\) is the number of observations of class \(b\) and \(n\) is the total number of observations.
The (decreasing) weights for the observations can either be specified directly as an \(n\)-dimensional vector W or, if W = NULL, be generated by one of the following weight functions:
linear: $$W_i = \max\left(1-\frac{i/n}{\tau},\,0\right),$$
exponential: $$W_i = \left(1-\frac{i}{n}\right)^\tau.$$
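As a rough illustration of the two formulas above, the following base R sketch evaluates the weights for a given n and tau. The helper names linear_weights and exponential_weights are hypothetical and are not part of pvclass; the package may compute its weights differently internally.

```r
# Hypothetical helpers (not part of pvclass) that evaluate the two
# weight formulas for ranks i = 1, ..., n.
linear_weights <- function(n, tau) {
  i <- seq_len(n)
  pmax(1 - (i / n) / tau, 0)   # truncated at zero once i/n >= tau
}

exponential_weights <- function(n, tau) {
  i <- seq_len(n)
  (1 - i / n)^tau              # reaches zero only at i = n
}

w_lin <- linear_weights(10, tau = 0.3)
w_exp <- exponential_weights(10, tau = 5)
print(round(w_lin, 3))
print(round(w_exp, 3))
```

Both sequences are non-increasing in i. With linear weights, only roughly the nearest tau * n ranks receive positive weight, whereas exponential weights stay positive for every rank except the last.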
If tau is a vector, the program searches for the best tau. To determine the best tau for the p-value PV[i,b], the new observation NewX[i,] is added to the training data with class label b and then for all training observations with Y[j] != b the sum of the weights of the observations belonging to class b is computed. Then the tau which minimizes the sum of these values is chosen.
If tau = NULL, it is set to seq(0.1,0.9,0.1) if wtype = "l" and to c(1,5,10,20) if wtype = "e".
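The tau search described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the pvclass implementation: it assumes Euclidean distance, linear weights, and neighbours ranked 1..n by distance with each point counted as its own nearest neighbour. The function name choose_tau is hypothetical.

```r
# Sketch of the tau search; NOT the pvclass implementation.
# Assumptions: Euclidean distance, linear weights, and each point
# counted as its own nearest neighbour when ranking.
choose_tau <- function(newx, b, X, Y, taus) {
  Xa <- rbind(as.matrix(X), as.numeric(unlist(newx)))  # add new observation
  Ya <- c(as.character(Y), b)                          # ... with class label b
  n  <- nrow(Xa)
  D  <- as.matrix(dist(Xa))                            # pairwise distances
  crit <- sapply(taus, function(tau) {
    W <- pmax(1 - (seq_len(n) / n) / tau, 0)           # linear weights by rank
    total <- 0
    for (j in which(Ya != b)) {                        # observations not in class b
      ord <- order(D[j, ])                             # neighbours of j, nearest first
      total <- total + sum(W[Ya[ord] == b])            # weight falling on class b
    }
    total
  })
  taus[which.min(crit)]                                # first minimiser on ties
}

X <- iris[c(1:49, 51:99, 101:149), 1:4]
Y <- iris[c(1:49, 51:99, 101:149), 5]
taus <- seq(0.1, 0.9, 0.1)
best <- choose_tau(iris[50, 1:4], "setosa", X, Y, taus)
print(best)
```

The criterion is small when observations from other classes have few class-b points among their highly weighted neighbours, so the chosen tau is the one that separates class b best from the rest under these assumptions.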
Zumbrunnen N. and Dümbgen L. (2017) pvclass: An R Package for p Values for Classification. Journal of Statistical Software 78(4), 1–19. doi:10.18637/jss.v078.i04
Dümbgen L., Igl B.-W. and Munk A. (2008) P-Values for Classification. Electronic Journal of Statistics 2, 468–493. doi:10.1214/08-EJS245
Zumbrunnen N. (2014) P-Values for Classification – Computational Aspects and Asymptotics. Ph.D. thesis, University of Bern, available at http://boris.unibe.ch/id/eprint/53585.
pvs, pvs.gaussian, pvs.knn, pvs.logreg
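library(pvclass)
# Hold out one flower per species as new observations; train on the rest
X <- iris[c(1:49, 51:99, 101:149), 1:4]
Y <- iris[c(1:49, 51:99, 101:149), 5]
NewX <- iris[c(50, 100, 150), 1:4]
# Linear weight function with a fixed tau
pvs.wnn(NewX, X, Y, wtype = 'l', tau = 0.5)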