Qsym.test: Pielou's Second Type of NN Symmetry Test with Chi-square Approximation

Description

An object of class "Chisqtest" performing the hypothesis test of equality of the probabilities for the rows in the \(Q\)-symmetry contingency table (QCT). Each row of the QCT is the vector of \(Q_{ij}\) values, where \(Q_{ij}\) is the number of class \(i\) points that serve as NN to exactly \(j\) other points. That is, the function performs Pielou's second type of NN symmetry test, which is equivalent to Pearson's chi-square test on the QCT (pielou:1961;textualnnspat). Pielou's second type of NN symmetry is the symmetry in the shared NN structure for all classes, which is also called \(Q\)-symmetry. The test is appropriate (i.e., has the appropriate asymptotic sampling distribution) provided that the data are obtained by sparse sampling, although simulations suggest that it also works for completely mapped data. (See ceyhan:SWJ-spat-sym2014;textualnnspat for more detail.)
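To make the definition of \(Q_{ij}\) concrete, here is a minimal base-R sketch that builds a QCT from an IPD matrix and a label vector. The helper qct_sketch is hypothetical: it is not part of nnspat and is not necessarily how Qsym.ct computes the table. It assumes there are no ties among NN distances, so that each point has a unique NN.

```r
# Hypothetical helper, not an nnspat function: builds a Q-symmetry
# contingency table (QCT) from an inter-point distance (IPD) matrix.
# Assumes no ties among NN distances (each point has a unique NN).
qct_sketch <- function(ipd, lab) {
  d <- ipd
  diag(d) <- Inf                     # a point cannot be its own NN
  nn <- apply(d, 1, which.min)       # index of the NN of each point
  # times_nn[m] = number of points that have point m as their NN
  times_nn <- tabulate(nn, nbins = nrow(d))
  jmax <- max(times_nn)
  # rows: classes; column j: points serving as NN to exactly j others
  tab <- table(factor(lab), factor(times_nn, levels = 0:jmax))
  dimnames(tab) <- list(levels(factor(lab)), paste0("Q", 0:jmax))
  tab
}

set.seed(1)
n <- 20
Y <- matrix(runif(3 * n), ncol = 3)
cls <- sample(1:2, n, replace = TRUE)
ipd <- as.matrix(dist(Y))            # Euclidean IPD matrix
qct <- qct_sketch(ipd, cls)
print(qct)
```

Since every point has exactly one NN, the column totals always satisfy \(\sum_j j\,Q_j = n\), which is a handy sanity check on any QCT.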

The argument is.ipd is a logical argument (default=TRUE) that determines the structure of the argument x. If TRUE, x is taken to be the inter-point distance (IPD) matrix; if FALSE, x is taken to be the data set with rows representing the data points.
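When is.ipd = FALSE, the distances are computed from the rows of x, with extra arguments forwarded to dist. The base-R sketch below illustrates what that preprocessing amounts to; it assumes the IPD matrix is the full square matrix returned by as.matrix(dist(...)) and is not nnspat's own code.

```r
set.seed(123)
Y <- matrix(runif(3 * 10), ncol = 3)      # 10 points in 3D, one per row

# Full square IPD matrix; Euclidean is the default metric of stats::dist
ipd_euc <- as.matrix(dist(Y))
# Extra arguments such as method go to dist; "max" partially matches "maximum"
ipd_max <- as.matrix(dist(Y, method = "max"))

# Both are 10-by-10 symmetric matrices with a zero diagonal
dim(ipd_euc)
```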

The argument combine is a logical argument (default=TRUE) that determines whether to pool the 3rd column of the QCT with the columns to its right. If TRUE, the function pools the cells in columns 3 and higher for each of the \(k\) classes, so \(Q_2\), \(Q_3\), etc. are merged and the column labels become \(Q_0\), \(Q_1\) and \(Q_2\), where the last one is actually the sum of \(Q_j\) for \(j \ge 2\) in the QCT. If FALSE, the cells are not pooled.
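The pooling done by combine=TRUE can be sketched as a column merge. The helper pool_qct is hypothetical (not an nnspat function), and the toy table uses made-up counts chosen so that \(\sum_j j\,Q_j = n\) holds.

```r
# Hypothetical helper, not an nnspat function: pools the QCT columns
# Q2, Q3, ... into a single column, as combine = TRUE is described to do.
pool_qct <- function(qct) {
  if (ncol(qct) <= 3) return(qct)    # nothing to pool
  pooled <- cbind(qct[, 1:2, drop = FALSE],
                  rowSums(qct[, -(1:2), drop = FALSE]))
  # the last column now holds the sum of Q_j for j >= 2
  colnames(pooled) <- c(colnames(qct)[1:2], "Q2")
  pooled
}

# Toy QCT with made-up counts: rows are classes, columns Q0..Q3
qct <- matrix(c(3, 2,  4, 5,  2, 1,  1, 0), nrow = 2,
              dimnames = list(c("a", "b"), c("Q0", "Q1", "Q2", "Q3")))
pooled <- pool_qct(qct)
print(pooled)
```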

The function yields the test statistic, the \(p\)-value and the df, which is \((k-1)(n_c-1)\), where \(n_c\) is the number of columns in the QCT (this reduces to \(2(k-1)\) if combine=TRUE). It also provides the description of the alternative with the corresponding null values (i.e., expected values) of the entries of the QCT, and the sample estimates of those entries (i.e., the observed QCT). The function also provides the names of the test statistics, the description of the test and the data set used.

The null hypothesis is the symmetry in the shared NN structure for each class, that is, \(E(Q_{ij})=n_i Q_j/n\) for all \(i\) and \(j\), where \(n_i\) is the size of class \(i\), \(n\) is the total sample size, and \(Q_j\) is the sum of column \(j\) in the QCT (i.e., the total number of points serving as NN to exactly \(j\) other points).
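The null expected counts and the resulting Pearson statistic can be sketched in base R as follows. The helper qsym_chisq_sketch and the toy counts are hypothetical; the computation simply applies \(E(Q_{ij})=n_i Q_j/n\) and the df formula \((k-1)(n_c-1)\), so it should agree with stats::chisq.test on the same table. It is an illustration, not the package's own code.

```r
# Hypothetical helper, not an nnspat function: Pearson chi-square test of
# Q-symmetry on a QCT, using E(Q_ij) = n_i * Q_j / n.
qsym_chisq_sketch <- function(qct) {
  n_i <- rowSums(qct)                 # class sizes n_i
  Q_j <- colSums(qct)                 # column totals Q_j
  n <- sum(qct)                       # total number of points
  expected <- outer(n_i, Q_j) / n     # null expected counts
  stat <- sum((qct - expected)^2 / expected)
  df <- (nrow(qct) - 1) * (ncol(qct) - 1)   # (k - 1)(n_c - 1)
  list(statistic = stat, df = df, expected = expected,
       p.value = pchisq(stat, df, lower.tail = FALSE))
}

# Toy pooled QCT with made-up counts: rows are classes, columns Q0, Q1, Q2
qct <- matrix(c(3, 2,  4, 5,  3, 1), nrow = 2,
              dimnames = list(c("a", "b"), c("Q0", "Q1", "Q2")))
res <- qsym_chisq_sketch(qct)
res$df    # 2, i.e. 2(k - 1) with k = 2 classes
```

The chi-square reference distribution is only asymptotic, so tables as small as this toy one will also trigger chisq.test's warning about small expected counts.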

See also (pielou:1961,ceyhan:SWJ-spat-sym2014;textualnnspat) and the references therein.

Usage

Qsym.test(x, lab, is.ipd = TRUE, combine = TRUE, ...)

Value

A list with the elements

statistic: The chi-squared test statistic for Pielou's second type of NN symmetry test (i.e., \(Q\)-symmetry which is equivalent to symmetry in the shared NN structure)
p.value: The \(p\)-value for the hypothesis test
df: Degrees of freedom for the chi-squared test, which is \((k-1)(n_c-1)\) where \(n_c\) is the number of columns in QCT (which reduces to \(2(k-1)\) if combine=TRUE).
estimate: Estimates, i.e., the observed QCT.
est.name,est.name2: Names of the estimates; they are identical for this function.
null.value: Hypothesized null values for the entries of the QCT, i.e., the matrix with entries \(E(Q_{ij})=n_i Q_j/n\), where \(n_i\) is the size of class \(i\), \(n\) is the total sample size, and \(Q_j\) is the sum of column \(j\) in the QCT (i.e., the total number of points serving as NN to exactly \(j\) other points).
method: Description of the hypothesis test
data.name: Name of the data set, x

Arguments

x: The IPD matrix (if is.ipd=TRUE) or a data set of points in matrix or data frame form where points correspond to the rows (if is.ipd = FALSE).
lab: The vector of class labels (numerical or categorical)
is.ipd: A logical parameter (default=TRUE). If TRUE, x is taken as the inter-point distance matrix (IPD matrix), otherwise, x is taken as the data set with rows representing the data points.
combine: A logical parameter (default=TRUE). If TRUE, the cells in column 3 and the columns to its right are merged in the QCT, so \(Q_2\), \(Q_3\), etc. are pooled and the column labels are \(Q_0\), \(Q_1\) and \(Q_2\), where the last one is actually the sum of \(Q_j\) for \(j \ge 2\) in the QCT. If FALSE, the cells are not pooled.
...: Further arguments, such as method and p, passed to the dist function.

Author

Elvan Ceyhan

References

Examples

n<-20  #or try sample(1:20,1)
Y<-matrix(runif(3*n),ncol=3)
cls<-sample(1:2,n,replace = TRUE)  #or try cls<-rep(1:2,c(10,10))
ipd<-ipd.mat(Y)
Qsym.ct(ipd,cls)

Qsym.test(ipd,cls)
Qsym.test(Y,cls,is.ipd = FALSE)
Qsym.test(Y,cls,is.ipd = FALSE,method="max")

Qsym.test(ipd,cls,combine = FALSE)

#cls as a factor
na<-floor(n/2); nb<-n-na
fcls<-rep(c("a","b"),c(na,nb))
Qsym.test(ipd,fcls)
Qsym.test(Y,fcls,is.ipd = FALSE)

#############
n<-40
Y<-matrix(runif(3*n),ncol=3)
ipd<-ipd.mat(Y)
cls<-sample(1:4,n,replace = TRUE)  #or try cls<-rep(1:4,each=10)

Qsym.test(ipd,cls)
Qsym.test(Y,cls,is.ipd = FALSE)
