ICGE (version 0.4.2)

INCAtest: INCA Test

Description

Assume that n units are divided into k groups C1,...,Ck. Function INCAtest performs the INCA typicality test: the null hypothesis that a new unit x0 is typical with respect to a previously fixed partition is tested against the alternative hypothesis that the unit is atypical.

Usage

INCAtest(d, pert, d_test, np = 1000, alpha = 0.05, P = 1)

Value

A list with class "incat" containing the following components:

StatisticW0

value of the INCA statistic.

ProjectionsU

values of statistics measuring the projection from the specific object to each considered group.

pvalues

p-values obtained in the bootstrap procedure, which is repeated P times. Note: if P>1, the printed output reports the number of times the p-values were smaller than alpha.

alpha

specified value of the level of the test.

Arguments

d

a distance matrix or a dist object with distance information between units.

pert

an n-vector that indicates which group each unit belongs to. The values of pert are expected to be numbers greater than or equal to 1 (for instance 1,2,3,4,...,k). The default value indicates there is only one group in the data.
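
When group membership is stored as text labels, the numeric coding described for pert can be obtained with base R. The following is a minimal sketch; the vector `labels` and its values are hypothetical and not part of the package.

```r
# Hypothetical raw labels; any atomic vector of group names would do.
labels <- c("control", "treated", "control", "other", "treated")

# factor() creates one level per distinct label (sorted), and
# as.integer() returns the level index, so the codes run from 1 to k.
pert <- as.integer(factor(labels))
pert
# 1 3 1 2 3
```

The mapping between labels and codes can be recovered afterwards with `levels(factor(labels))`.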

d_test

an n-vector containing the distances from x0 to the other units.
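
For Euclidean distances, d_test can be computed without an explicit loop. The sketch below uses made-up data and assumes that x0 is a plain numeric vector with one entry per column of the data matrix; the names `x`, `x0` and `d_loop` are illustrative only.

```r
set.seed(1)                            # arbitrary seed, for reproducibility only
x  <- matrix(rnorm(6 * 3), ncol = 3)   # 6 units in dimension 3 (made-up data)
x0 <- rnorm(3)                         # the new unit, as a numeric vector

# sweep() subtracts x0[j] from column j of x; rowSums() then adds the
# squared differences within each row, giving one distance per unit.
d_test <- sqrt(rowSums(sweep(x, 2, x0)^2))

# Cross-check against an explicit computation unit by unit.
d_loop <- vapply(seq_len(nrow(x)),
                 function(i) sqrt(sum((x[i, ] - x0)^2)),
                 numeric(1))
all.equal(d_test, d_loop)
```

If x0 is stored as a one-row matrix, `as.numeric(x0)` converts it to the vector form assumed here. Whatever distance is used for d_test should match the one used to build d.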

np

sample size used in the bootstrap procedure.

alpha

fixed level for the test.

P

number of times the bootstrap procedure is repeated.

Author

Itziar Irigoien itziar.irigoien@ehu.es; Konputazio Zientziak eta Adimen Artifiziala, Euskal Herriko Unibertsitatea (UPV-EHU), Donostia, Spain.

Conchita Arenas carenas@ub.edu; Departament d'Estadistica, Universitat de Barcelona, Barcelona, Spain.

References

Irigoien, I. and Arenas, C. (2008). INCA: New statistic for estimating the number of clusters and identifying atypical units. Statistics in Medicine, 27(15), 2948–2973.

Arenas, C. and Cuadras, C.M. (2002). Some recent statistical methods based on distances. Contributions to Science, 2, 183–191.

See Also

estW, INCAindex

Examples

library(ICGE)

# Generate 3 clusters, each with 20 objects in dimension 5.
mu1 <- sample(1:10, 5, replace=TRUE)
x1 <- matrix(rnorm(20*5, mean = mu1, sd = 1),ncol=5, byrow=TRUE)
mu2 <- sample(1:10, 5, replace=TRUE)
x2 <- matrix(rnorm(20*5, mean = mu2, sd = 1),ncol=5, byrow=TRUE)
mu3 <- sample(1:10, 5, replace=TRUE)
x3 <- matrix(rnorm(20*5, mean = mu3, sd = 1),ncol=5, byrow=TRUE)
x <- rbind(x1,x2,x3)

# Euclidean distance between units in matrix x.
d <- dist(x)
# given the right partition
partition <- c(rep(1,20), rep(2,20), rep(3,20))

# x0 contains a unit from one group, as for example group 1.
x0 <-  matrix(rnorm(1*5, mean = mu1, sd = 1),ncol=5, byrow=TRUE)

# distances between x0 and the other units.
dx0 <- rep(0,60)
for (i in 1:60){
	dif <-x0-x[i,]
	dx0[i] <- sqrt(sum(dif*dif))
}

INCAtest(d, partition, dx0, np=10)


# x0 contains a unit from a new group.
x0 <-  matrix(rnorm(1*5, mean = sample(1:10, 5, replace=TRUE),
        sd = 1), ncol=5, byrow=TRUE)

# distances between x0 and the other units in matrix x.
dx0 <- rep(0,60)
for (i in 1:60){
	dif <-x0-x[i,]
	dx0[i] <- sqrt(sum(dif*dif))
}

INCAtest(d, partition, dx0, np=10)
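
The result can also be stored and its components inspected, for instance when the bootstrap procedure is repeated with P > 1. This is a sketch under stated assumptions: the component names `StatisticW0` and `pvalues` are taken from the Value section above, access with `$` assumes the returned object behaves as an ordinary list, and the cluster centres in `mu` are made up for illustration.

```r
set.seed(42)  # arbitrary seed so the simulated data are reproducible

# Three made-up clusters of 20 units each in dimension 5.
mu <- list(rep(1, 5), rep(5, 5), rep(9, 5))
x <- do.call(rbind, lapply(mu, function(m)
  matrix(rnorm(20 * 5, mean = m, sd = 1), ncol = 5, byrow = TRUE)))
d <- dist(x)
partition <- rep(1:3, each = 20)

# New unit drawn around the first cluster centre, and its distances to all units.
x0  <- rnorm(5, mean = mu[[1]], sd = 1)
dx0 <- sqrt(rowSums(sweep(x, 2, x0)^2))

# The call is skipped when the ICGE package is not installed.
if (requireNamespace("ICGE", quietly = TRUE)) {
  res <- ICGE::INCAtest(d, partition, dx0, np = 10, P = 5)
  print(res$StatisticW0)  # INCA statistic
  print(res$pvalues)      # p-values from the repeated bootstrap runs
}
```

A small np keeps the run short, as in the examples above; the default np = 1000 is the more realistic setting for an actual analysis.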
