oneSamplePermutationTest: Fisher's One-Sample Randomization (Permutation) Test for Location

Description

Perform Fisher's one-sample randomization (permutation) test for location.

Usage

oneSamplePermutationTest(x, alternative = "two.sided", mu = 0, exact = FALSE, 
    n.permutations = 5000, seed = NULL, ...)

Arguments

numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.

alternative

character string indicating the kind of alternative hypothesis. The possible values are "two.sided" (the default), "less", and "greater".

numeric scalar indicating the hypothesized value of the mean. The default value is mu=0.

exact

logical scalar indicating whether to perform the exact permutation test (i.e., enumerate all possible permutations) or simply sample from the permutation distribution. The default value is exact=FALSE.

n.permutations

integer indicating how many times to sample from the permutation distribution when exact=FALSE. The default value is n.permutations=5000. This argument is ignored when exact=TRUE.

seed

positive integer to pass to the R function set.seed. The default is seed=NULL, in which case the current value of .Random.seed is used. Using the seed argument lets you reproduce the exact same result if all other arguments stay the same.

…

arguments that can be supplied to the format function. This argument is used when creating the names attribute for the statistic component of the returned list (see permutationTest.object).

Value

A list of class "permutationTest" containing the results of the hypothesis test. See the help file for permutationTest.object for details.

Details

Randomization Tests In 1935, R.A. Fisher introduced the idea of a randomization test (Manly, 2007, p. 107; Efron and Tibshirani, 1993, Chapter 15), which is based on trying to answer the question: “Did the observed pattern happen by chance, or does the pattern indicate the null hypothesis is not true?” A randomization test works by simply enumerating all of the possible outcomes under the null hypothesis, then seeing where the observed outcome fits in. A randomization test is also called a permutation test, because it involves permuting the observations during the enumeration procedure (Manly, 2007, p. 3).

In the past, randomization tests have not been used as extensively as they are now because of the “large” computing resources needed to enumerate all of the possible outcomes, especially for large sample sizes. The advent of more powerful personal computers and software has allowed randomization tests to become much easier to perform. Depending on the sample size, however, it may still be too time consuming to enumerate all possible outcomes. In this case, the randomization test can still be performed by sampling from the randomization distribution, and comparing the observed outcome to this sampled permutation distribution.

Fisher's One-Sample Randomization Test for Location Let $\underline{x} = x_1, x_2, \ldots, x_n$ be a vector of $n$ independent and identically distributed (i.i.d.) observations from some symmetric distribution with mean $\mu$. Consider the test of the null hypothesis that the mean $\mu$ is equal to some specified value $\mu_0$: $$H_0: \mu = \mu_0 \;\;\;\;\;\; (1)$$ The three possible alternative hypotheses are the upper one-sided alternative (alternative="greater") $$H_a: \mu > \mu_0 \;\;\;\;\;\; (2)$$ the lower one-sided alternative (alternative="less") $$H_a: \mu < \mu_0 \;\;\;\;\;\; (3)$$ and the two-sided alternative $$H_a: \mu \ne \mu_0 \;\;\;\;\;\; (4)$$ To perform the test of the null hypothesis (1) versus any of the three alternatives (2)-(4), Fisher proposed using the test statistic $$T = \sum_{i=1}^n y_i \;\;\;\;\;\; (5)$$ where $$y_i = x_i - \mu_0 \;\;\;\;\;\; (6)$$ (Manly, 2007, p. 112). The test assumes all of the observations come from the same distribution that is symmetric about the true population mean (hence the mean is the same as the median for this distribution). Under the null hypothesis, the $y_i$'s are equally likely to be positive or negative. Therefore, the permutation distribution of the test statistic $T$ consists of enumerating all possible ways of permuting the signs of the $y_i$'s and computing the resulting sums. For $n$ observations, there are $2^n$ possible permutations of the signs, because each observation can either be positive or negative.

For a one-sided upper alternative hypothesis (Equation (2)), the p-value is computed as the proportion of sums in the permutation distribution that are greater than or equal to the observed sum $T$. For a one-sided lower alternative hypothesis (Equation (3)), the p-value is computed as the proportion of sums in the permutation distribution that are less than or equal to the observed sum $T$. For a two-sided alternative hypothesis (Equation (4)), the p-value is computed by using the permutation distribution of the absolute value of $T$ (i.e., $|T|$) and computing the proportion of values in this permutation distribution that are greater than or equal to the observed value of $|T|$.

Confidence Intervals Based on Permutation Tests Based on the relationship between hypothesis tests and confidence intervals, it is possible to construct a two-sided or one-sided $(1-\alpha)100\%$ confidence interval for the mean $\mu$ based on the one-sample permutation test by finding the values of $\mu_0$ that correspond to obtaining a p-value of $\alpha$ (Manly, 2007, pp. 18--20, 113). A confidence interval based on the bootstrap however, will yield a similar type of confidence interval (Efron and Tibshirani, 1993, p. 214); see the help file for boot in the R package boot.

References

Efron, B., and R.J. Tibshirani. (1993). An Introduction to the Bootstrap. Chapman and Hall, New York, pp.224--227.

Manly, B.F.J. (2007). Randomization, Bootstrap and Monte Carlo Methods in Biology. Third Edition. Chapman & Hall, New York, pp.112-113.

Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL, pp.404--406.

Examples

Run this code

# NOT RUN {
  # Generate 10 observations from a logistic distribution with parameters 
  # location=7 and scale=2, and test the null hypothesis that the true mean 
  # is equal to 5 against the alternative that the true mean is greater than 5. 
  # Use the exact permutation distribution. 
  # (Note: the call to set.seed() allows you to reproduce this example).

  set.seed(23) 

  dat <- rlogis(10, location = 7, scale = 2) 

  test.list <- oneSamplePermutationTest(dat, mu = 5, 
    alternative = "greater", exact = TRUE) 

  # Print the results of the test 
  #------------------------------
  test.list 

  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 Mean (Median) = 5
  #
  #Alternative Hypothesis:          True Mean (Median) is greater than 5
  #
  #Test Name:                       One-Sample Permutation Test
  #                                 (Exact)
  #
  #Estimated Parameter(s):          Mean = 9.977294
  #
  #Data:                            dat
  #
  #Sample Size:                     10
  #
  #Test Statistic:                  Sum(x - 5) = 49.77294
  #
  #P-value:                         0.001953125


  # Plot the results of the test 
  #-----------------------------
  dev.new()
  plot(test.list)

  #==========

  # The guidance document "Supplemental Guidance to RAGS: Calculating the 
  # Concentration Term" (USEPA, 1992d) contains an example of 15 observations 
  # of chromium concentrations (mg/kg) which are assumed to come from a 
  # lognormal distribution.  These data are stored in the vector 
  # EPA.92d.chromium.vec.  Here, we will use the permutation test to test 
  # the null hypothesis that the mean (median) of the log-transformed chromium 
  # concentrations is less than or equal to log(100 mg/kg) vs. the alternative 
  # that it is greater than log(100 mg/kg).  Note that we *cannot* use the 
  # permutation test to test a hypothesis about the mean on the original scale 
  # because the data are not assumed to be symmetric about some mean, they are 
  # assumed to come from a lognormal distribution.
  #
  # We will sample from the permutation distribution.
  # (Note: setting the argument seed=542 allows you to reproduce this example).

  test.list <- oneSamplePermutationTest(log(EPA.92d.chromium.vec), 
    mu = log(100), alternative = "greater", seed = 542) 

  test.list

  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 Mean (Median) = 4.60517
  #
  #Alternative Hypothesis:          True Mean (Median) is greater than 4.60517
  #
  #Test Name:                       One-Sample Permutation Test
  #                                 (Based on Sampling
  #                                 Permutation Distribution
  #                                 5000 Times)
  #
  #Estimated Parameter(s):          Mean = 4.378636
  #
  #Data:                            log(EPA.92d.chromium.vec)
  #
  #Sample Size:                     15
  #
  #Test Statistic:                  Sum(x - 4.60517) = -3.398017
  #
  #P-value:                         0.7598


  # Plot the results of the test 
  #-----------------------------
  dev.new()
  plot(test.list)

  #----------

  # Clean up
  #---------
  rm(test.list)
  graphics.off()
# }

Run the code above in your browser using DataLab