LocationTests: Independent Two- and K-Sample Location Tests

Description

Testing the equality of the distributions of a numeric response in two or more independent groups against shift alternatives.

Usage

## S3 method for class 'formula':
oneway_test(formula, data, subset = NULL, weights = NULL, ...)
## S3 method for class 'IndependenceProblem':
oneway_test(object, ...)
## S3 method for class 'formula':
wilcox_test(formula, data, subset = NULL, weights = NULL, ...)
## S3 method for class 'IndependenceProblem':
wilcox_test(object, 
    conf.int = FALSE, conf.level = 0.95, ...)
## S3 method for class 'formula':
normal_test(formula, data, subset = NULL, weights = NULL, ...)
## S3 method for class 'IndependenceProblem':
normal_test(object, 
    ties.method = c("mid-ranks", "average-scores"),
    conf.int = FALSE, conf.level = 0.95, ...)
## S3 method for class 'formula':
median_test(formula, data, subset = NULL, weights = NULL, ...)
## S3 method for class 'IndependenceProblem':
median_test(object, 
    conf.int = FALSE, conf.level = 0.95, ...)
## S3 method for class 'formula':
kruskal_test(formula, data, subset = NULL, weights = NULL, ...)
## S3 method for class 'IndependenceProblem':
kruskal_test(object, 
    distribution = c("asymptotic", "approximate"), ...)

Arguments

Value

An object inheriting from class IndependenceTest with methods show, statistic, expectation, covariance and pvalue. The null distribution can be inspected with the pperm, dperm, qperm and support methods. Confidence intervals can be extracted with confint.

Details

The null hypothesis of the equality of the distribution of y in the groups given by x is tested. In particular, the methods documented here are designed to detect shift alternatives. For a general description of the test procedures documented here we refer to Hollander & Wolfe (1999).

The test procedures apply a rank transformation to the response values y, except for oneway_test, which computes a test statistic from the untransformed response values.
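The kinds of scores involved can be sketched in base R. This is an illustration only, not the package's internal code: the toy values are made up, and the normal and median score definitions below (quantiles of rank / (n + 1), and an indicator for lying above the pooled median) are assumed textbook forms.

```r
# Toy response with one tie (values are illustrative only)
y <- c(1.2, 0.7, 2.5, 0.7, 1.9)
n <- length(y)

# Rank scores (Wilcoxon, Kruskal-Wallis): tied values share the average rank
rank_scores <- rank(y, ties.method = "average")

# Normal scores (assumed van der Waerden form): normal quantiles of rank / (n + 1)
normal_scores <- qnorm(rank_scores / (n + 1))

# Median scores (assumed form): 1 if above the pooled median, 0 otherwise
median_scores <- as.numeric(y > median(y))

# oneway_test, by contrast, works on y itself
print(rbind(y, rank_scores, normal_scores, median_scores))
```

Each test then compares the sum (or a standardized version) of these scores across the groups.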

The asymptotic null distribution is computed by default for all procedures. Exact p-values may be computed for the two-sample problems and can be approximated via Monte Carlo resampling for all procedures. Exact p-values are computed either by the shift algorithm (Streitberg & Röhmel, 1986, 1987) or by the split-up algorithm (van de Wiel, 2001).
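To make the difference between an exact and an approximate p-value concrete, here is a brute-force base R sketch for the two-sample rank sum, using the water transfer data from the Examples. It enumerates every possible group assignment instead of using the shift or split-up algorithm, so it only illustrates what those algorithms compute; the variable names and the seed are arbitrary.

```r
# Data from Hollander & Wolfe (1999), Table 4.1, as used in the Examples
pd <- c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46,
        1.15, 0.88, 0.90, 0.74, 1.21)
in_group <- rep(c(FALSE, TRUE), c(10, 5))   # TRUE marks "12-26 Weeks"

r <- rank(pd)
w_obs <- sum(r[in_group])                   # observed rank sum
w_exp <- sum(in_group) * mean(r)            # its expectation under H0

# Exact: enumerate all choose(15, 5) = 3003 possible group assignments
w_all <- combn(length(r), sum(in_group), FUN = function(idx) sum(r[idx]))
p_exact <- mean(abs(w_all - w_exp) >= abs(w_obs - w_exp))

# Approximate: Monte Carlo resampling of the group labels
set.seed(1)                                 # arbitrary seed, for reproducibility
w_mc <- replicate(9999, sum(r[sample(in_group)]))
p_approx <- mean(abs(w_mc - w_exp) >= abs(w_obs - w_exp))

print(c(exact = p_exact, approximate = p_approx))
```

Full enumeration becomes infeasible quickly as the sample sizes grow, which is why dedicated algorithms or resampling are used in practice.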

The linear rank tests for two samples (wilcox_test, normal_test and median_test) can be used to test the two-sided hypothesis $H_0: Y_1 - Y_2 = 0$, where $Y_i$ is the median of the responses in the ith group. Confidence intervals for the difference in location are available for the rank-based procedures and are computed according to Bauer (1972). In case alternative = "less", the null hypothesis $H_0: Y_1 - Y_2 \ge 0$ is tested and alternative = "greater" corresponds to a null hypothesis $H_0: Y_1 - Y_2 \le 0$.
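For the Wilcoxon case, inverting the rank test leads to an interval built from order statistics of the pairwise differences between the two samples. The base R sketch below follows the construction used by stats::wilcox.test for untied data; it is meant to show the idea, and is not taken from this package's implementation.

```r
# Data from Hollander & Wolfe (1999), Table 4.1, as used in the Examples
pd <- c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46,
        1.15, 0.88, 0.90, 0.74, 1.21)
x <- pd[1:10]    # "At term"
y <- pd[11:15]   # "12-26 Weeks"

# All pairwise differences x_i - y_j, sorted
diffs <- sort(as.vector(outer(x, y, "-")))

# Hodges-Lehmann point estimate of the shift (At term - 12-26 Weeks)
shift_hat <- median(diffs)

# Interval limits: order statistics chosen from the null distribution
# of the Mann-Whitney statistic (valid here because there are no ties)
alpha <- 1 - 0.95
k <- max(qwilcox(alpha / 2, length(x), length(y)), 1)
ci <- c(lower = diffs[k], upper = diffs[length(diffs) - k + 1])

print(shift_hat)
print(ci)
```

Because the null distribution is discrete, the achieved confidence level is generally somewhat higher than the nominal 95%.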

In case x is an ordered factor, kruskal_test computes the linear-by-linear association test for ordered alternatives.
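A linear-by-linear statistic for this setting can be sketched in base R as follows. The sketch assumes equally spaced scores 1, ..., K for the ordered levels and treats the four sites of the gizzard shad data from the Examples as ordered purely for illustration; neither assumption comes from the text above.

```r
# Length of YOY gizzard shad, Hollander & Wolfe (1999), Table 6.3
len <- c(46, 28, 46, 37, 32, 41, 42, 45, 38, 44,
         42, 60, 32, 42, 45, 58, 27, 51, 42, 52,
         38, 33, 26, 25, 28, 28, 26, 27, 27, 27,
         31, 30, 27, 29, 30, 25, 25, 24, 27, 30)
site <- rep(1:4, each = 10)   # assumed scores 1..4 for sites I..IV

r <- rank(len)                # mid-ranks of the response
n <- length(r)

# Linear-by-linear statistic: sum of (group score) x (rank)
lin <- sum(site * r)

# Its mean and variance under random permutation of the ranks
mu <- n * mean(site) * mean(r)
sigma2 <- sum((site - mean(site))^2) * sum((r - mean(r))^2) / (n - 1)

# Standardized statistic and asymptotic two-sided p-value
z <- (lin - mu) / sqrt(sigma2)
p_asympt <- 2 * pnorm(-abs(z))

print(c(z = z, p = p_asympt))
```

The standardized statistic equals sqrt(n - 1) times the Pearson correlation between the group scores and the ranks, so the test is sensitive to monotone trends across the ordered groups.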

For the adjustment of scores for tied values see Hájek, Šidák & Sen (1999), page 131ff.

References

Myles Hollander & Douglas A. Wolfe (1999). Nonparametric Statistical Methods, 2nd Edition. New York: John Wiley & Sons.

Bernd Streitberg & Joachim Röhmel (1986). Exact distributions for permutations and rank tests: An introduction to some recently published algorithms. Statistical Software Newsletter 12(1), 10-17.

Bernd Streitberg & Joachim Röhmel (1987). Exakte Verteilungen für Rang- und Randomisierungstests im allgemeinen $c$-Stichprobenfall. EDV in Medizin und Biologie 18(1), 12-19.

Mark A. van de Wiel (2001). The split-up algorithm: a fast symbolic method for computing p-values of rank statistics. Computational Statistics 16, 519-538.

David F. Bauer (1972). Constructing confidence sets using rank statistics. Journal of the American Statistical Association 67, 687-690.

Jaroslav Hájek, Zbyněk Šidák & Pranab K. Sen (1999). Theory of Rank Tests. San Diego, London: Academic Press.

Examples

### Tritiated Water Diffusion Across Human Chorioamnion
  ### Hollander & Wolfe (1999), Table 4.1, page 110
  water_transfer <- data.frame(
      pd = c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46,
             1.15, 0.88, 0.90, 0.74, 1.21),
      age = factor(c(rep("At term", 10), rep("12-26 Weeks", 5))))

  ### Wilcoxon-Mann-Whitney test, cf. Hollander & Wolfe (1999), page 111
  ### exact p-value and confidence interval for the difference in location
  ### (At term - 12-26 Weeks)
  wt <- wilcox_test(pd ~ age, data = water_transfer, 
                    distribution = "exact", conf.int = TRUE)
  print(wt)

  ### extract observed Wilcoxon statistic, i.e., the sum of the
  ### ranks for age = "12-26 Weeks"
  statistic(wt, "linear")

  ### its expectation
  expectation(wt)

  ### and variance
  covariance(wt)

  ### and, finally, the exact two-sided p-value
  pvalue(wt)

  ### Confidence interval for difference (12-26 Weeks - At term)
  wilcox_test(pd ~ age, data = water_transfer, 
              xtrafo = function(data) 
                  trafo(data, factor_trafo = function(x) 
                      as.numeric(x == levels(x)[2])),
              distribution = "exact", conf.int = TRUE)

  ### Permutation test, asymptotic p-value
  oneway_test(pd ~ age, data = water_transfer)

  ### approximate p-value (with 99% confidence interval)
  pvalue(oneway_test(pd ~ age, data = water_transfer, 
                     distribution = approximate(B = 9999)))
  ### exact p-value
  pt <- oneway_test(pd ~ age, data = water_transfer, distribution = "exact")
  pvalue(pt)

  ### plot density and distribution of the standardized 
  ### test statistic
  layout(matrix(1:2, nrow = 2))
  s <- support(pt)
  d <- sapply(s, function(x) dperm(pt, x))
  p <- sapply(s, function(x) pperm(pt, x))
  plot(s, d, type = "S", xlab = "Test statistic", ylab = "Density")
  plot(s, p, type = "S", xlab = "Test statistic", ylab = "Cumulative probability")


  ### Length of YOY Gizzard Shad from Kokosing Lake, Ohio,
  ### sampled in Summer 1984, Hollander & Wolfe (1999), Table 6.3, page 200
  YOY <- data.frame(length = c(46, 28, 46, 37, 32, 41, 42, 45, 38, 44, 
                               42, 60, 32, 42, 45, 58, 27, 51, 42, 52, 
                               38, 33, 26, 25, 28, 28, 26, 27, 27, 27, 
                               31, 30, 27, 29, 30, 25, 25, 24, 27, 30),
                    site = factor(c(rep("I", 10), rep("II", 10),
                                    rep("III", 10), rep("IV", 10))))

  ### Kruskal-Wallis test, approximate (Monte Carlo) p-value
  kw <- kruskal_test(length ~ site, data = YOY, 
                     distribution = approximate(B = 9999))
  kw
  pvalue(kw)

  ### Nemenyi-Damico-Wolfe-Dunn test (joint ranking)
  ### Hollander & Wolfe (1999), page 244 
  ### (where Steel-Dwass results are given)
  if (require("multcomp")) {

    NDWD <- oneway_test(length ~ site, data = YOY,
        ytrafo = function(data) trafo(data, numeric_trafo = rank),
        xtrafo = function(data) trafo(data, factor_trafo = function(x)
            model.matrix(~x - 1) %*% t(contrMat(table(x), "Tukey"))),
        teststat = "max", distribution = approximate(B = 90000))

    ### global p-value
    print(pvalue(NDWD))

    ### sites (I = II) != (III = IV) at alpha = 0.01 (page 244)
    print(pvalue(NDWD, method = "single-step"))
  }
