qn.test.combined: Combined Rank Score k-Sample Tests

Description

This function combines several independent rank score $k$-sample tests into one overall test of the hypothesis that the independent samples within each block come from a common unspecified distribution, while the common distributions may vary from block to block.

Usage

qn.test.combined(..., data = NULL, test = c("KW", "vdW", "NS"),
	method = c("asymptotic", "simulated", "exact"),
	dist = FALSE, Nsim = 10000)

Arguments

...

Either a sequence of several lists, say $L_1, \ldots, L_M$ ($M > 1$) where list $L_i$ contains $k_i > 1$ sample vectors of respective sizes $n_{i1}, \ldots, n_{ik_i}$, where $n_{ij} > 4$ is recommended for reasonable asymptotic $P$-value calculation. $N_i=n_{i1}+\ldots+n_{ik_i}$ is the pooled sample size for block $i$,

or a list of such lists,

or a formula, like y ~ g | b, where y is a numeric response vector, g is a factor whose levels indicate different treatments, and b is a factor indicating different blocks; y, g, and b have the same length. Within each block level, y is split into separate samples according to the g levels. The same g level may occur in different blocks. The variable names may correspond to variables in an optionally supplied data frame via the data = argument.
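The splitting implied by a formula such as y ~ g | b can be pictured with base R alone. The following is only a sketch of the idea, not the package's internal code; the data frame dat and the object by_block are hypothetical names chosen for illustration.

```r
# Hypothetical data: 2 blocks, with 3 treatments in block 1 and 2 in block 2.
dat <- data.frame(
  y = c(1, 3, 2, 5, 7, 2, 8, 1, 6, 9, 4, 12, 5, 7, 9, 11,
        51, 43, 31, 53, 21, 75, 23, 45, 61, 17, 60),
  g = factor(c(rep(1, 5), rep(2, 6), rep(3, 5), rep(1, 6), rep(2, 5))),
  b = factor(c(rep(1, 16), rep(2, 11)))
)

# First split by block, then split the response by treatment within each
# block. droplevels() removes treatment levels that do not occur in a block.
by_block <- lapply(split(dat, dat$b),
                   function(d) split(d$y, droplevels(d$g)))

# by_block is a list of lists: one list of sample vectors per block.
str(by_block)
```

The result has the same shape as the "list of such lists" input form described above.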

data

= an optional data frame providing the variables in formula input

test

= c("KW", "vdW", "NS"),

where "KW" uses scores 1:N (Kruskal-Wallis test)

"vdW" uses van der Waerden scores, qnorm( (1:N) / (N+1) )

"NS" uses normal scores, i.e., expected values of standard normal order statistics, invoking function normOrder of package SuppDists

For the above scores $N$ changes from block to block and represents the total pooled sample size $N_i$ for block $i$.
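As a rough illustration of the three score choices, the sketch below computes them for a single block with an assumed pooled sample size of N = 8. The variable names are illustrative only, and the "NS" scores are computed only if the SuppDists package happens to be installed.

```r
# Pooled sample size for one block (illustrative value).
N <- 8

# "KW": the ranks 1, ..., N.
kw_scores <- 1:N

# "vdW": standard normal quantiles evaluated at i / (N + 1).
vdw_scores <- qnorm((1:N) / (N + 1))

# "NS": expected values of standard normal order statistics,
# available only when SuppDists is installed.
ns_scores <- if (requireNamespace("SuppDists", quietly = TRUE)) {
  SuppDists::normOrder(N)
} else {
  NULL
}

round(vdw_scores, 3)
```

Because N is the pooled size of the block at hand, these vectors are recomputed for every block.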

method

=c("asymptotic","simulated","exact"), where

"asymptotic" uses only an asymptotic chi-square approximation for the $P$-value. The adequacy of asymptotic $P$-values for use with moderate sample sizes may be checked with method = "simulated".

"simulated" uses simulation to get Nsim simulated $QN$ statistics for each block of samples, adding them component wise across blocks to get Nsim combined values, and compares these with the observed combined value to get the estimated $P$-value.

"exact" uses full enumeration of the test statistic value for all sample splits of the pooled samples within each block. The test statistic vectors for each block are added (each component against each component, as in the R outer(x,y, "+") command) to get the convolution enumeration for the combined test statistic. This "addition" is done one block at a time. It is possible only for small problems, and is attempted only when Nsim is at least the (conservatively maximal) length $$\frac{N_1!}{n_{11}!\ldots n_{1k_1}!}\times\ldots\times \frac{N_M!}{n_{M1}!\ldots n_{Mk_M}!}$$ of the final distribution vector, were $N_i = n_{i1}+\ldots+n_{ik_i}$ is the sample size of the pooled samples for the i-th block. Otherwise, it reverts to the simulation method using the provided Nsim.

dist

FALSE (default) or TRUE. If TRUE, the simulated or fully enumerated convolution vector null.dist is returned for the $QN$ test statistic.

Otherwise, NULL is returned.

Nsim

= 10000 (default), number of simulation splits to use within each block of samples. It is only used when method = "simulated" or when method = "exact" reverts to method = "simulated", as previously explained. Simulations are independent across blocks, using Nsim for each block.
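The simulation scheme can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: it uses the statistic returned by stats::kruskal.test as a stand-in for the "KW" version of $QN$, and the helper names block_stat and sim_block are hypothetical.

```r
x1 <- list(c(1, 3, 2, 5, 7), c(2, 8, 1, 6, 9, 4), c(12, 5, 7, 9, 11))
x2 <- list(c(51, 43, 31, 53, 21, 75), c(23, 45, 61, 17, 60))
blocks <- list(x1, x2)

# Group labels for the pooled observations of one block.
block_groups <- function(samples) {
  factor(rep(seq_along(samples), lengths(samples)))
}

# Stand-in statistic for one block (assumption: Kruskal-Wallis statistic).
block_stat <- function(samples) {
  unname(kruskal.test(unlist(samples), block_groups(samples))$statistic)
}

# nsim random re-splits of the pooled sample within one block.
sim_block <- function(samples, nsim) {
  y <- unlist(samples)
  g <- block_groups(samples)
  replicate(nsim, unname(kruskal.test(sample(y), g)$statistic))
}

set.seed(1)
nsim <- 500

# Observed combined statistic: sum over blocks.
observed <- sum(vapply(blocks, block_stat, numeric(1)))

# Independent simulations per block, added component-wise across blocks.
sim_comb <- Reduce(`+`, lapply(blocks, sim_block, nsim = nsim))

# Estimated P-value: proportion of simulated values at least as large.
p_sim <- mean(sim_comb >= observed)
p_sim
```

Each block contributes its own vector of nsim simulated values, which is why the simulations are independent across blocks.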

Value

A list of class kSamples with components

test.name

"Kruskal-Wallis", "van der Waerden scores", or

"normal scores"

number of blocks of samples being compared

n.samples

list of M vectors, each vector giving the sample sizes for each block of samples being compared

vector of length M of total sample sizes involved in each of the M comparisons of $k_i$ samples, respectively

n.ties

vector giving the number of ties in each of the M comparison blocks

qn.list

list of M matrices giving the qn results from qn.test, applied to the samples in each of the M blocks

qn.c

vector of length 2 (or 3) containing the observed $QN_{\rm comb}$, the asymptotic $P$-value, and, when computed, the simulated or exact $P$-value.

warning

logical indicator, warning = TRUE when at least one $n_{ij} < 5$.

null.dist

simulated or enumerated null distribution of the $QN_{\rm comb}$ statistic. It is NULL when dist = FALSE or when method = "asymptotic".

method

The method used.

Nsim

The number of simulations used for each block of samples.
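The components listed above can be inspected like those of any R list. The sketch below assumes the kSamples package is installed and skips the call otherwise; the object name res is arbitrary.

```r
x1 <- list(c(1, 3, 2, 5, 7), c(2, 8, 1, 6, 9, 4), c(12, 5, 7, 9, 11))
x2 <- list(c(51, 43, 31, 53, 21, 75), c(23, 45, 61, 17, 60))

has_ksamples <- requireNamespace("kSamples", quietly = TRUE)
res <- NULL

if (has_ksamples) {
  set.seed(2627)
  # dist = TRUE asks for the simulated null distribution to be returned.
  res <- kSamples::qn.test.combined(x1, x2, method = "simulated",
                                    dist = TRUE, Nsim = 1000)
  print(res$qn.c)               # combined statistic and P-values
  print(res$n.samples)          # sample sizes per block
  print(res$qn.list[[1]])       # qn.test results for the first block
  print(length(res$null.dist))  # size of the returned null distribution
}
```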

Details

The rank score $QN$ criterion $QN_i$ for the $i$-th block of $k_i$ samples, is used to test the hypothesis that the samples in the $i$-th block all come from the same but unspecified continuous distribution function $F_i(x)$. See qn.test for the definition of the $QN$ criterion and the exact calculation of its null distribution.

The combined $QN$ criterion $QN_{\rm comb} = QN_1 + \ldots + QN_M$ is used to test simultaneously, for all blocks $i = 1, \ldots, M$, whether the samples in block $i$ come from the same continuous distribution function $F_i(x)$. The unspecified common distribution function $F_i(x)$ may, however, change from block to block.

The number of samples $k_i$ may also change from block to block.

The asymptotic approximating chi-square distribution has $f = (k_1-1)+\ldots+(k_M-1)$ degrees of freedom.
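As a small worked example of the degrees of freedom, consider two blocks with $k_1 = 3$ and $k_2 = 2$ samples, so $f = 2 + 1 = 3$. The sketch below again uses the statistic from stats::kruskal.test as an assumed stand-in for the "KW" version of $QN_i$; it is not the package's own code.

```r
x1 <- list(c(1, 3, 2, 5, 7), c(2, 8, 1, 6, 9, 4), c(12, 5, 7, 9, 11))
x2 <- list(c(51, 43, 31, 53, 21, 75), c(23, 45, 61, 17, 60))
blocks <- list(x1, x2)

# Number of samples per block and the resulting degrees of freedom.
k <- lengths(blocks)
f <- sum(k - 1)

# Per-block statistic (assumption: Kruskal-Wallis statistic as stand-in).
block_stat <- function(samples) {
  g <- factor(rep(seq_along(samples), lengths(samples)))
  unname(kruskal.test(unlist(samples), g)$statistic)
}
qn_comb <- sum(vapply(blocks, block_stat, numeric(1)))

# Upper-tail chi-square probability with f degrees of freedom.
p_asym <- pchisq(qn_comb, df = f, lower.tail = FALSE)
c(statistic = qn_comb, df = f, p.value = p_asym)
```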

NA values are removed and the user is alerted with the total NA count. It is up to the user to judge whether the removal of NA's is appropriate.

The continuity assumption can be dispensed with if we deal with independent random samples, or if randomization was used in allocating subjects to samples or treatments, independently from block to block, and if the asymptotic, simulated or exact $P$-values are viewed conditionally, given the tie patterns within each block. Under such randomization any conclusions are valid only with respect to the blocks of subjects that were randomly allocated. In case of ties the average rank scores are used across tied observations within each block.
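Averaging scores over tied observations can be sketched in base R as follows. The data vector y is made up for illustration and the code is one possible way to do the averaging, not necessarily how the package does it.

```r
# Pooled observations of one block, with ties at the values 1 and 2.
y <- c(1, 3, 2, 5, 7, 2, 8, 1)
N <- length(y)

# "KW" scores with ties: average ranks (midranks).
kw_scores <- rank(y, ties.method = "average")

# "vdW" scores with ties: compute the untied scores, then average them
# over each group of tied observations.
a <- qnorm((1:N) / (N + 1))
ord <- order(y)
vdw_scores <- numeric(N)
vdw_scores[ord] <- ave(a, y[ord])

cbind(y, kw_scores, vdw_scores = round(vdw_scores, 3))
```

Tied observations receive identical scores, and the total of the scores is unchanged by the averaging.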

References

Lehmann, E.L. (2006), Nonparametrics: Statistical Methods Based on Ranks, Springer Verlag, New York. Ch. 6, Sec. 5D.

Examples


# NOT RUN {
## Create two lists of sample vectors.
x1 <- list( c(1, 3, 2, 5, 7), c(2, 8, 1, 6, 9, 4), c(12, 5, 7, 9, 11) )
x2 <- list( c(51, 43, 31, 53, 21, 75), c(23, 45, 61, 17, 60) )
# and a corresponding data frame datx1x2
x1x2 <- c(unlist(x1),unlist(x2))
gx1x2 <- as.factor(c(rep(1,5),rep(2,6),rep(3,5),rep(1,6),rep(2,5)))
bx1x2 <- as.factor(c(rep(1,16),rep(2,11)))
datx1x2 <- data.frame(A = x1x2, G = gx1x2, B = bx1x2)

## Run qn.test.combined.
set.seed(2627)
qn.test.combined(x1, x2, method = "simulated", Nsim = 1000) 
# or with same seed
# qn.test.combined(list(x1, x2), method = "simulated", Nsim = 1000)
# or qn.test.combined(A~G|B,data=datx1x2,method="simulated",Nsim=1000)
# }
