
bcROCsurface (version 1.0-6)

cv_knn: Cross-validation for K nearest-neighbor regression

Description

This function calculates the estimated cross-validation prediction error for K nearest-neighbor regression and returns a suitable choice for K.

Usage

cv_knn(x_mat, dise_vec, veri_stat, k_list = NULL, type = "eucli", plot = FALSE)

Value

The selected value of K, that is, the candidate value that minimizes the estimated cross-validation prediction error.

Arguments

x_mat

a numeric design matrix, which is used in rho_knn to estimate the probabilities of the disease status.

dise_vec

an n * 3 binary matrix whose three columns correspond to the three classes of the disease status. In row i, a 1 in column j indicates that the i-th subject belongs to class j, with j = 1, 2, 3. A row of NA values indicates a non-verified subject.
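To make the layout of dise_vec concrete, the snippet below builds such a matrix from a vector of class labels in {1, 2, 3}. It is a minimal sketch: make_dise_matrix is a hypothetical helper written for illustration and is not part of bcROCsurface (the package example uses pre_data for this step).

```r
# Hypothetical helper (not part of bcROCsurface): build the n * 3 disease
# indicator matrix from class labels in {1, 2, 3} and a verification vector.
make_dise_matrix <- function(dise, veri_stat) {
  n <- length(dise)
  out <- matrix(0, nrow = n, ncol = 3)
  ver <- which(veri_stat == 1)
  # A 1 in column j of row i marks subject i as belonging to class j
  out[cbind(ver, dise[ver])] <- 1
  # Non-verified subjects: disease status unknown, so the whole row is NA
  out[veri_stat == 0, ] <- NA
  out
}

# Five subjects; the fourth is not verified, so its label is unknown
m <- make_dise_matrix(c(1, 3, 2, NA, 1), c(1, 1, 1, 0, 1))
m
```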

veri_stat

a binary vector containing the verification status (1 verified, 0 not verified).

k_list

a list of candidate values for K. If NULL (the default), the set \(\{1, 2, ..., n.ver\}\) is used, where \(n.ver\) is the number of verified subjects.

type

the type of distance; see rho_knn for more details. The default is "eucli".

plot

if TRUE, a plot of cross-validation prediction error is produced.

Details

The data are divided into two groups: the first contains the subjects with veri_stat = 1 (verified), and the second contains those with veri_stat = 0 (non-verified). Within the first group, the discrepancy between the true disease status and the KNN estimates of the disease-status probabilities is computed for each candidate value of K (by default, from 1 to the number of verified subjects); see To Duc et al. (2020). The optimal K is the value that yields the smallest discrepancy.
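The selection procedure described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: cv_knn_sketch is a hypothetical function, and the use of leave-one-out cross-validation, Euclidean distance, unweighted neighbor averaging and a squared-error discrepancy are all assumptions. Because each verified subject is left out of its own neighborhood, at most n.ver - 1 neighbors are available, so the sketch's default candidate set stops at n.ver - 1.

```r
# Hypothetical sketch of choosing K among verified subjects.
# Assumptions (not taken from bcROCsurface): leave-one-out CV, Euclidean
# distance, unweighted KNN averaging, squared-error discrepancy.
cv_knn_sketch <- function(x_mat, dise_vec, veri_stat, k_list = NULL) {
  ver <- veri_stat == 1
  x <- as.matrix(x_mat)[ver, , drop = FALSE]     # verified subjects only
  d <- as.matrix(dise_vec)[ver, , drop = FALSE]  # their indicator rows
  n_ver <- nrow(x)
  if (is.null(k_list)) k_list <- seq_len(n_ver - 1)
  dist_mat <- as.matrix(dist(x))   # pairwise Euclidean distances
  diag(dist_mat) <- Inf            # subject i is never its own neighbor
  # Row i lists the other subjects ordered from nearest to farthest
  nn_order <- t(apply(dist_mat, 1, order))
  cv_err <- sapply(k_list, function(k) {
    err <- 0
    for (i in seq_len(n_ver)) {
      nb <- nn_order[i, seq_len(k)]
      # KNN estimate of the class probabilities for subject i
      p_hat <- colMeans(d[nb, , drop = FALSE])
      err <- err + sum((d[i, ] - p_hat)^2)
    }
    err / n_ver                    # average discrepancy for this K
  })
  list(k = k_list[which.min(cv_err)], cv_error = cv_err)
}

# Two well-separated clusters of verified subjects plus one non-verified subject
x <- matrix(c(0, 0.1, 0.2, 10, 10.1, 10.2, 5), ncol = 1)
d <- rbind(c(1, 0, 0), c(1, 0, 0), c(1, 0, 0),
           c(0, 1, 0), c(0, 1, 0), c(0, 1, 0), c(NA, NA, NA))
v <- c(1, 1, 1, 1, 1, 1, 0)
res <- cv_knn_sketch(x, d, v)
res$k
res$cv_error
```

In this toy data, one or two neighbors always come from the subject's own cluster, so the discrepancy is zero for those values of K and grows once neighbors from the other cluster are included; ties are broken in favor of the smallest K.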

References

To Duc, K., Chiogna, M. and Adimari, G. (2020) Nonparametric estimation of ROC surfaces in presence of verification bias. REVSTAT-Statistical Journal. 18, 5, 697–720.

Examples

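library(bcROCsurface)
data(EOC)
# Design matrix: biomarkers CA125 and CA153, plus age
x_mat <- cbind(EOC$CA125, EOC$CA153, EOC$Age)
# Convert the disease status into the n * 3 indicator matrix used as dise_vec
dise_na <- pre_data(EOC$D, EOC$CA125)
dise_vec_na <- dise_na$dise_vec
# Select K with the Mahalanobis distance and plot the cross-validation error
cv_knn(x_mat, dise_vec_na, EOC$V, type = "mahala", plot = TRUE)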
