a numeric design matrix, which is used in rho_knn to estimate the probabilities of the disease status.
dise_vec
an n * 3 binary matrix whose columns correspond to the three classes of the disease status. In row i, a 1 in column j indicates that the i-th subject belongs to class j, with j = 1, 2, 3. A row of NA values indicates a non-verified subject.
veri_stat
a binary vector containing the verification status (1 verified, 0 not verified).
k_list
a list of candidate values for K. If NULL (the default), the set \(\{1, 2, ..., n.ver\}\) is employed, where \(n.ver\) is the number of verified subjects.
type
the type of distance; see rho_knn for more details. Default is "eucli".
plot
if TRUE, a plot of the cross-validation prediction error is produced.
Details
Data are divided into two groups: the first contains the data corresponding to veri_stat = 1, and the second contains the data corresponding to veri_stat = 0. In the first group, the discrepancy between the true disease status and the KNN estimates of the probabilities of the disease status is computed for each k from 1 to the number of verified subjects; see To Duc et al. (2020). The optimal value of k is the one that yields the smallest discrepancy.
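The selection procedure described above can be illustrated with a minimal base R sketch. It is not the package's implementation: knn_discrepancy() is a hypothetical helper, the data are simulated, the distance is Euclidean, the discrepancy is taken to be the mean absolute difference between the class indicators and the KNN estimates, and each subject is left out of its own neighbourhood (an assumption made here, which is why k runs only up to n.ver - 1).

```r
# Sketch only: knn_discrepancy() is a hypothetical helper, not a bcROCsurface function.
knn_discrepancy <- function(x_ver, d_ver, k) {
  dist_mat <- as.matrix(dist(x_ver))  # Euclidean distances between verified subjects
  diag(dist_mat) <- Inf               # assumption: exclude each subject from its own neighbourhood
  rho_hat <- t(apply(dist_mat, 1, function(d_row) {
    nn <- order(d_row)[seq_len(k)]    # indices of the k nearest verified subjects
    colMeans(d_ver[nn, , drop = FALSE])
  }))
  mean(abs(d_ver - rho_hat))          # assumed discrepancy: mean absolute difference
}

# Simulated inputs in the format described under Arguments.
set.seed(1)
n <- 60
cls <- sample(1:3, n, replace = TRUE)
x_mat <- cbind(rnorm(n, mean = cls), rnorm(n))
dise_vec <- diag(3)[cls, ]            # n * 3 indicator matrix of disease classes
veri_stat <- rbinom(n, 1, 0.7)        # 1 verified, 0 not verified
dise_vec[veri_stat == 0, ] <- NA      # non-verified subjects get a row of NA

# Keep the verified group only and evaluate every candidate k.
ver <- veri_stat == 1
x_ver <- x_mat[ver, , drop = FALSE]
d_ver <- dise_vec[ver, , drop = FALSE]
k_list <- seq_len(sum(ver) - 1)
disc <- vapply(k_list, function(k) knn_discrepancy(x_ver, d_ver, k), numeric(1))
best_k <- k_list[which.min(disc)]     # k with the smallest discrepancy
```

Plotting disc against k_list gives a curve comparable in spirit to the one produced when plot = TRUE, although the exact criterion used by the package may differ from the one assumed in this sketch.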
References
To Duc, K., Chiogna, M. and Adimari, G. (2020)
Nonparametric estimation of ROC surfaces in presence of verification bias.
REVSTAT-Statistical Journal. 18, 5, 697–720.