uclust (version 1.0.0)

utest_classify: Test for classification of a sample into one of two groups.

Description

The null hypothesis is that the new observation x is not well classified into the first group (group_id = 0) when compared to the second group (group_id = 1). The alternative hypothesis is that x is well classified into the first group.

Usage

utest_classify(x, data, group_id, bootstrap_iter = 1000)

Arguments

x

A numeric vector (a single observation) to be classified. Its length must equal the number of columns of data.

data

Data matrix. Each row represents an observation.

group_id

A vector of 0s (first group) and 1s (second group) indicating the group to which each row of data belongs. Must be in the same order as the rows of data.

bootstrap_iter

Numeric scalar. The number of bootstrap iterations. Values between 1000 and 10000 are recommended.
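Before calling the test, it can help to check that the arguments are mutually consistent. The sketch below uses a hypothetical helper, check_utest_classify_inputs(), which is not part of uclust; it only encodes the shape requirements implied by the argument descriptions above (x has one value per column of data, group_id has one 0/1 label per row of data) and warns when bootstrap_iter falls outside the recommended range.

```r
# Hypothetical helper (NOT part of uclust): validates the arguments of
# utest_classify() against the shapes described in the help page.
check_utest_classify_inputs <- function(x, data, group_id, bootstrap_iter = 1000) {
  data <- as.matrix(data)
  # x is a single observation, so it needs one value per column of data
  if (!is.numeric(x) || length(x) != ncol(data)) {
    stop("x must be numeric with length equal to ncol(data)")
  }
  # group_id labels the rows of data, in the same order
  if (length(group_id) != nrow(data)) {
    stop("group_id must have one entry per row of data")
  }
  # Only 0 (first group) and 1 (second group) are valid labels
  if (!all(group_id %in% c(0, 1))) {
    stop("group_id must contain only 0 (first group) and 1 (second group)")
  }
  # Recommended range from the bootstrap_iter description
  if (bootstrap_iter < 1000 || bootstrap_iter > 10000) {
    warning("bootstrap_iter is outside the recommended range 1000 to 10000")
  }
  invisible(TRUE)
}

# Usage: ten 60-dimensional observations, five per group
data <- matrix(rnorm(600), ncol = 60)
check_utest_classify_inputs(rnorm(60), data, rep(c(0, 1), each = 5))
```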

Value

A list with class "utest_classify" containing the following components:

statistic

The value of the test statistic.

p_value

The p-value for the test.

bootstrap_iter

The number of bootstrap iterations.
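Because p_value is estimated from bootstrap_iter resamples, it carries Monte Carlo error. Assuming the p-value behaves like a proportion of bootstrap replicates (an assumption; the exact resampling scheme is given in the papers cited under Details), its standard error is approximately sqrt(p(1 - p) / B) for B iterations. The hypothetical helper bootstrap_p_se() below, which is not part of uclust, computes this binomial approximation.

```r
# Hypothetical helper (NOT part of uclust).
# Approximate Monte Carlo standard error of a p-value estimated as a
# proportion of `bootstrap_iter` bootstrap replicates (binomial approximation).
# Assumption: uclust's p_value behaves like such a proportion.
bootstrap_p_se <- function(p, bootstrap_iter) {
  sqrt(p * (1 - p) / bootstrap_iter)
}

# Precision near the conventional 0.05 threshold, at both ends of the
# recommended range for bootstrap_iter
bootstrap_p_se(0.05, c(1000, 10000))
```

Near p = 0.05 the approximate standard error is roughly 0.007 with 1000 iterations and 0.002 with 10000, so p-values close to the significance threshold may warrant more iterations.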

Details

The test is based on the squared Euclidean distance between observations.
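The squared Euclidean distance between observations a and b is the sum over coordinates of (a_j - b_j)^2. The sketch below computes it in base R between a new observation x and every row of data. The helper name sq_dist_to_rows() is hypothetical, and the per-group mean distances are shown only as a descriptive illustration; they are not the U-statistic computed by utest_classify, whose definition is given in the papers cited below.

```r
# Hypothetical helper (NOT part of uclust).
# Squared Euclidean distance between a new observation x and every row of data:
# sum over columns j of (data[i, j] - x[j])^2.
sq_dist_to_rows <- function(x, data) {
  data <- as.matrix(data)
  # sweep() subtracts x[j] from column j, i.e. x from each row
  rowSums(sweep(data, 2, x)^2)
}

# Two small groups in two dimensions
data <- matrix(c(0, 0,
                 1, 0,
                 5, 5,
                 6, 5), ncol = 2, byrow = TRUE)
group_id <- c(0, 0, 1, 1)
x <- c(0, 1)

d <- sq_dist_to_rows(x, data)   # 1 2 41 52
# Mean squared distance from x to each group (descriptive only)
tapply(d, group_id, mean)
```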

For more details, see Cybis, Gabriela B., Marcio Valk, and Sílvia RC Lopes. "Clustering and classification problems in genetics through U-statistics." Journal of Statistical Computation and Simulation 88.10 (2018) and Valk, Marcio, and Gabriela Bettella Cybis. "U-statistical inference for hierarchical clustering." arXiv preprint arXiv:1805.12179 (2018).

Examples

# NOT RUN {
# Example 1
# Five observations from each group, G1 and G2. Each observation has 60 dimensions.
data <- matrix(c(rnorm(300, 0), rnorm(300, 10)), ncol = 60, byrow=TRUE)
# Test data comes from G1.
x <- rnorm(60, 0)
# The test correctly indicates that the test data should be classified into G1 (p < 0.05).
utest_classify(x, data, group_id = c(rep(0,times=5),rep(1,times=5)))

# Example 2
# Five observations from each group, G1 and G2. Each observation has 60 dimensions.
data <- matrix(c(rnorm(300, 0), rnorm(300, 10)), ncol = 60, byrow=TRUE)
# Test data comes from G2.
x <- rnorm(60, 10)
# G1 is the first group (group_id = 0). Since x comes from G2, the test should not classify it into G1 (p > 0.05).
utest_classify(x, data, group_id = c(rep(0,times=5),rep(1,times=5)))
# }
