anocva: ANalysis Of Cluster VAriability

Description

ANOCVA (ANalysis Of Cluster VAriability) is a non-parametric statistical test for comparing clusters, with applications to functional magnetic resonance imaging (fMRI) data. ANOCVA compares the clustering structure of multiple groups simultaneously and also identifies the features (items) that contribute to the differential clustering.

Usage

anocva(dataDist, id, replicates = 1000, r = NULL,
  clusteringFunction = NULL, p = 1, maxClust = 20,
  criterion = c("slope", "silhouette"), showElapTime = TRUE)

Arguments

dataDist

A three-dimensional array of dissimilarity matrices. If each subject has N items (e.g. ROIs) described by an NxN dissimilarity matrix, dataDist stacks the dissimilarity matrices of all n subjects from all populations into an nxNxN array, where dataDist[i, , ] is the dissimilarity matrix of the i-th subject.
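A minimal base-R sketch of assembling dataDist, assuming each subject is given as an N x d matrix of item coordinates and that the Euclidean distance is used as the dissimilarity (as in the Examples). The list subjects and the sizes n and N are illustrative and not part of anocva.

```r
# Illustrative sizes (assumptions): n subjects, N items per subject
set.seed(1)
n <- 4
N <- 6
# Hypothetical input: one N x 2 coordinate matrix per subject
subjects <- lapply(seq_len(n), function(i) matrix(rnorm(N * 2), nrow = N))

# Stack the per-subject N x N dissimilarity matrices into an n x N x N array
dataDist <- array(NA_real_, dim = c(n, N, N))
for (i in seq_len(n)) {
  # dist() returns a lower-triangle 'dist' object; as.matrix() expands it to N x N
  dataDist[i, , ] <- as.matrix(dist(subjects[[i]]))
}

dim(dataDist)  # 4 6 6

# Each slice should be symmetric with a zero diagonal
stopifnot(all(apply(dataDist, 1, function(m) isSymmetric(m) && all(diag(m) == 0))))
```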

id

A vector with one entry per subject (length n), where id[i] is the population label (1, 2, ...) of the i-th subject.
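A short sketch of building id for two populations whose subjects are stored consecutively along the first dimension of dataDist, as in the Examples. The group sizes n1 and n2 and the placeholder dataDist array are illustrative assumptions.

```r
# Illustrative group sizes (assumptions)
n1 <- 20
n2 <- 20
# Subjects 1..n1 belong to population 1, subjects n1+1..n1+n2 to population 2
id <- c(rep(1, n1), rep(2, n2))

table(id)  # number of subjects per population

# Placeholder dataDist with 30 items per subject, only used to check sizes
dataDist <- array(0, dim = c(n1 + n2, 30, 30))
# id must have one label per subject (first dimension of dataDist)
stopifnot(length(id) == dim(dataDist)[1])
```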

replicates

The number of bootstrap replicates. The default value is 1000.
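The number of replicates limits how small an estimated p-value can be. The sketch below assumes the common resampling estimate, the fraction of replicate statistics at least as large as the observed one, whose resolution is about 1/replicates; it is a generic illustration, not anocva's internal code. bootPValue is a hypothetical helper, and the replicate statistics are simulated stand-ins. Some implementations use (1 + count) / (1 + replicates) instead.

```r
# Generic resampling p-value (illustration only, not anocva's internal code):
# the share of replicate statistics at least as large as the observed one.
bootPValue <- function(tObs, tStar) {
  mean(tStar >= tObs)
}

set.seed(42)
tObs <- 2.5                      # hypothetical observed test statistic
for (replicates in c(100, 1000, 10000)) {
  tStar <- rnorm(replicates)     # stand-in replicate statistics under H0
  cat(sprintf("replicates = %5d  p = %.4f  resolution = %.4f\n",
              as.integer(replicates), bootPValue(tObs, tStar), 1 / replicates))
}
```

With many items and a multiple-testing correction applied to the per-item p-values, this resolution is one reason to prefer a larger number of replicates.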

r

The optimal number of clusters. If NULL (the default), it is estimated with the method selected by criterion over the interval 2..maxClust (2..20 by default).

clusteringFunction

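Determines the clustering function to be used. If NULL (the default), spectralClustering is used. The clustering function is expected to take a dissimilarity matrix and the number of clusters, like myKmeans(dist, k) in the Examples, and to return the vector of cluster labels.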
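For illustration, a hypothetical alternative clustering function with the same calling convention as myKmeans in the Examples: it receives an N x N dissimilarity matrix and the number of clusters k and returns N cluster labels. Average-linkage hierarchical clustering (stats::hclust with cutree) is swapped in here only as an example; myHclust is not part of anocva.

```r
# Hypothetical clustering function: average-linkage hierarchical clustering.
# dist: N x N dissimilarity matrix; k: number of clusters.
myHclust <- function(dist, k) {
  tree <- hclust(as.dist(dist), method = "average")
  unname(cutree(tree, k = k))   # plain integer vector of N labels
}

# Quick check on two well-separated groups of 5 points each
set.seed(1)
pts <- rbind(matrix(rnorm(10, mean = 0, sd = 0.1), ncol = 2),
             matrix(rnorm(10, mean = 5, sd = 0.1), ncol = 2))
cl <- myHclust(as.matrix(dist(pts)), k = 2)
print(cl)
```

Under these assumptions it could be passed to anocva as clusteringFunction = myHclust.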

p

Slope adjustment parameter of the slope criterion (default 1). Only used if r is NULL, i.e. when the number of clusters is estimated.
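The role of p can be made concrete under an assumption not stated on this page: if the slope criterion uses the slope statistic as commonly defined on the average silhouette width \bar{s}(k) of a k-cluster partition, the estimated number of clusters would be

```latex
\mathrm{sl}(k) = -\bigl[\bar{s}(k+1) - \bar{s}(k)\bigr]\,\bar{s}(k)^{p},
\qquad
\hat{r} = \operatorname*{arg\,max}_{2 \le k \le \mathtt{maxClust} - 1} \mathrm{sl}(k)
```

Under this form, a larger p gives more weight to the silhouette level \bar{s}(k) itself relative to the drop in silhouette from k to k+1 clusters.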

maxClust

The maximum number of clusters to be tried if estimating optimal number of clusters. The default value is 20.

criterion

The criterion used to estimate the number of clusters when r is NULL. The options are "slope" and "silhouette". If not specified, "slope" is used.
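A base-R sketch of how the two criteria could select r for a single dissimilarity matrix. It assumes that "silhouette" picks the k with the largest average silhouette width and that "slope" maximizes -(s(k+1) - s(k)) s(k)^p; anocva's own implementation (and nClustMulti, which works across all subjects) may differ. avgSilhouette, hclustLabels and estimateK are hypothetical names, and maxClust is reduced to 10 to keep the example fast.

```r
# Average silhouette width of a labelling, given an N x N dissimilarity matrix D
avgSilhouette <- function(D, labels) {
  labels <- as.integer(labels)
  s <- numeric(length(labels))
  for (i in seq_along(labels)) {
    same <- labels == labels[i]
    same[i] <- FALSE
    if (!any(same)) next                      # singleton cluster: s(i) stays 0
    a <- mean(D[i, same])                     # mean dissimilarity to own cluster
    others <- setdiff(unique(labels), labels[i])
    # smallest mean dissimilarity to any other cluster
    b <- min(vapply(others, function(g) mean(D[i, labels == g]), numeric(1)))
    s[i] <- if (max(a, b) > 0) (b - a) / max(a, b) else 0
  }
  mean(s)
}

# Hypothetical clustering function with the (dissimilarity matrix, k) convention
hclustLabels <- function(D, k) cutree(hclust(as.dist(D), method = "average"), k = k)

# Assumed selection rules for the "silhouette" and "slope" criteria
estimateK <- function(D, clusteringFunction = hclustLabels, maxClust = 10,
                      criterion = c("slope", "silhouette"), p = 1) {
  criterion <- match.arg(criterion)
  ks <- 2:maxClust
  s <- vapply(ks, function(k) avgSilhouette(D, clusteringFunction(D, k)), numeric(1))
  if (criterion == "silhouette") return(ks[which.max(s)])
  m <- length(s)
  sl <- -(s[-1] - s[-m]) * s[-m]^p           # sl(k) for k = 2, ..., maxClust - 1
  ks[which.max(sl)]
}

# Usage: three tight, well-separated groups of 10 points in the plane
set.seed(7)
centers <- rbind(c(0, 0), c(4, 0), c(0, 4))
pts <- do.call(rbind, lapply(1:3, function(j)
  cbind(rnorm(10, centers[j, 1], 0.2), rnorm(10, centers[j, 2], 0.2))))
D <- as.matrix(dist(pts))
estimateK(D, criterion = "silhouette")
estimateK(D, criterion = "slope")
```

With a non-integer p, s(k)^p is undefined for negative silhouette values, which is one reason to keep the default p = 1 unless there is a specific need.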

showElapTime

Determines whether the total processing time should be displayed. The default value is TRUE.

Value

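A list with the ANOCVA p-values: pValueDeltaS, the p-value of the test of whether the clustering structure differs among the populations, and pValueDeltaSq, a vector with one p-value per item, used to identify the items that contribute to the differential clustering.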

Details

The test statistic used is the one proposed by Caetano de Jesus (2017).

References

Fujita A, Takahashi DY, Patriota AG, Sato JR (2014a) A non-parametric statistical test to compare clusters with applications in functional magnetic resonance imaging data. Statistics in Medicine 33:4949–4962

Vidal MC, Sato JR, Balardin JB, Takahashi DY, Fujita A (2017) ANOCVA in R: a software to compare clusters between groups and its application to the study of autism spectrum disorder. Frontiers in Neuroscience 11:1–8

Caetano de Jesus DA (2017) Evaluation of ANOCVA test for cluster comparison through simulations. Master's dissertation. Institute of Mathematics and Statistics, University of São Paulo.

Examples


# NOT RUN {
# Install packages if necessary
# install.packages('MASS')
# install.packages('cluster')

library(anocva)
library(MASS)
library(cluster)

set.seed(5000)

# Defines a k-means function that returns cluster labels directly
myKmeans = function(dist, k){
  return(kmeans(dist, k, iter.max = 50, nstart = 5)$cluster)
}

# Number of subjects in each population
nsub = 20
# Number of items in each subject
nitem = 30

# Generate simulated data
data = array(NA, c(nsub*2, nitem*2, 2))
dataDist = array(NA, c(nsub*2, nitem*2, nitem*2))
meanx = 2
delta = 0.5
# Covariance matrix
sigma = matrix(c(0.03, 0, 0, 0.03), 2)
for (i in seq(nsub*2)){
  sub = rbind(mvrnorm(nitem, mu = c(0, 0), Sigma = sigma),
              mvrnorm(nitem, mu = c(meanx,0), Sigma = sigma))
  data[i,,] = sub
  # Subjects nsub+1, ..., 2*nsub belong to population 2: shift the
  # x-coordinate of item 10 by delta
  if (i > nsub){
    data[i,10,1] = data[i,10,1] + delta
  }
  # Euclidean distance matrix between the subject's items
  dataDist[i,,] = as.matrix(dist(data[i,,]))
}

# Population 1 subject
plot(data[5,,], asp = 1, xlab = '', ylab = '', main = 'Population 1 - subject example')

# Population 2 subject
plot(data[35,,], asp = 1, xlab = '', ylab = '', main = 'Population 2 - subject example')

# The first nsub subjects belong to population 1 while the next nsub subjects belong to population 2
id = c(rep(1, nsub), rep(2, nsub))

# ANOCVA call with a custom clustering function (myKmeans); since r = NULL,
# the number of clusters is estimated internally
res1 = anocva(dataDist, id, replicates=500, r = NULL,
              clusteringFunction = myKmeans,
              p = 1, criterion = "slope")
# Estimate the number of clusters beforehand using spectral clustering and the slope criterion
r = nClustMulti(dataDist, clusteringFunction = spectralClustering, criterion = 'slope')

# Run the ANOCVA test with the pre-estimated r (p and criterion are only used when r is NULL)
res = anocva(dataDist, id, replicates=500, r = r,
             clusteringFunction = spectralClustering,
             p = 1, criterion = "slope")

# DeltaS p-value
res$pValueDeltaS

# DeltaSq p-values
res$pValueDeltaSq

# Identify the items with significant p-values at a significance level of 0.05
which(res$pValueDeltaSq < 0.05)

# Identify the items with significant FDR-adjusted p-values (q-values)
# at a significance level of 0.05
qValue = p.adjust(res$pValueDeltaSq, "fdr")
which(qValue < 0.05)

# }
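For reference, the FDR adjustment in the last step of the Examples uses the Benjamini-Hochberg procedure ("fdr" is an alias of "BH" in stats::p.adjust). The sketch below reimplements it to show how the q-values are obtained; bhAdjust is a hypothetical helper, and the p-values are made-up numbers standing in for res$pValueDeltaSq.

```r
# Benjamini-Hochberg step-up adjustment, written out by hand
bhAdjust <- function(p) {
  m <- length(p)
  o <- order(p, decreasing = TRUE)   # largest p-value first
  q <- cummin(m / (m:1) * p[o])      # p * m / rank, then running minimum
  pmin(1, q)[order(o)]               # cap at 1 and restore the original order
}

# Made-up p-values, one per item
pValues <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216)
qValue <- bhAdjust(pValues)
all.equal(qValue, p.adjust(pValues, "fdr"))  # TRUE
which(qValue < 0.05)                         # 1 2
```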
