signTest: One-Sample or Paired-Sample Sign Test on a Median

Description

Estimate the median, test the null hypothesis that the median is equal to a user-specified value based on the sign test, and create a confidence interval for the median.

Usage

signTest(x, y = NULL, alternative = "two.sided", mu = 0, paired = FALSE, 
    conf.level = 0.95)

Arguments

numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.

optional numeric vector of observations that are paired with the observations in x. The length of y must be the same as the length of x. This argument is ignored if paired=FALSE, and must be supplied if paired=TRUE. The default value is y=NULL. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.

alternative

character string indicating the kind of alternative hypothesis. The possible values are "two.sided" (the default), "greater", and "less".

numeric scalar indicating the hypothesized value of the median. The default value is mu=0.

paired

logical scalar indicating whether to perform a paired or one-sample sign test. The possible values are paired=FALSE (the default; indicates a one-sample sign test) and paired=TRUE.

conf.level

numeric scalar between 0 and 1 indicating the confidence level associated with the confidence interval for the population median. The default value is conf.level=0.95.

Value

A list of class "htest" containing the results of the hypothesis test. See the help file for htest.object for details.

Details

One-Sample Case (paired=FALSE) Let $\underline{x} = x_1, x_2, \ldots, x_n$ be a vector of $n$ independent observations from one or more distributions that all have the same median $\mu$.

Consider the test of the null hypothesis: $$H_0: \mu = \mu_0 \;\;\;\;\;\; (1)$$ The three possible alternative hypotheses are the upper one-sided alternative (alternative="greater") $$H_a: \mu > \mu_0 \;\;\;\;\;\; (2)$$ the lower one-sided alternative (alternative="less") $$H_a: \mu < \mu_0 \;\;\;\;\;\; (3)$$ and the two-sided alternative (alternative="two.sided") $$H_a: \mu \ne \mu_0 \;\;\;\;\;\; (4)$$

To perform the test of the null hypothesis (1) versus any of the three alternatives (2)-(4), the sign test uses the test statistic $T$ which is simply the number of observations that are greater than $\mu_0$ (Conover, 1980, p. 122; van Belle et al., 2004, p. 256; Hollander and Wolfe, 1999, p. 60; Lehmann, 1975, p. 120; Sheskin, 2011; Zar, 2010, p. 537). Under the null hypothesis, the distribution of $T$ is a binomial random variable with parameters size=$n$ and prob=0.5. Usually, however, cases for which the observations are equal to $\mu_0$ are discarded, so the distribution of $T$ is taken to be binomial with parameters size=$r$ and prob=0.5, where $r$ denotes the number of observations not equal to $\mu_0$. The sign test only requires that the observations are independent and that they all come from one or more distributions (not necessarily the same ones) that all have the same population median.

For a two-sided alternative hypothesis (Equation (4)), the p-value is computed as: $$p = Pr(X_{r,0.5} \le r-m) + Pr(X_{r,0.5} > m) \;\;\;\;\;\; (5)$$ where $X_{r,p}$ denotes a binomial random variable with parameters size=$r$ and prob=$p$, and $m$ is defined by: $$m = max(T, r-T) \;\;\;\;\;\; (6)$$

For a one-sided lower alternative hypothesis (Equation (3)), the p-value is computed as: $$p = Pr(X_{m,0.5} \le T) \;\;\;\;\;\; (7)$$ and for a one-sided upper alternative hypothesis (Equation (2)), the p-value is computed as: $$p = Pr(X_{m,0.5} \ge T) \;\;\;\;\;\; (8)$$

It is obvious that the sign test is simply a special case of the binomial test with p=0.5.

Computing Confidence Intervals Based on the relationship between hypothesis tests and confidence intervals, we can construct a confidence interval for the population median based on the sign test (e.g., Hollander and Wolfe, 1999, p. 72; Lehmann, 1975, p. 182). It turns out that this is equivalent to using the formulas for a nonparametric confidence intervals for the 0.5 quantile (see eqnpar).

Paired-Sample Case (paired=TRUE) When the argument paired=TRUE, the arguments x and y are assumed to have the same length, and the $n$ differences $d_i = x_i - y_i, \;\; i = 1, 2, \ldots, n$ are assumed to be independent observations from distributions with the same median $\mu$. The sign test can then be applied to the differences.

References

Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York, p.122

Hollander, M., and D.A. Wolfe. (1999). Nonparametric Statistical Methods. Second Edition. John Wiley and Sons, New York, p.60.

Lehmann, E.L. (1975). Nonparametrics: Statistical Methods Based on Ranks. Holden-Day, Oakland, CA, p.120.

Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL, pp.404--406.

Sheskin, D.J. (2011). Handbook of Parametric and Nonparametric Statistical Procedures Fifth Edition. CRC Press, Boca Raton, FL.

van Belle, G., L.D. Fisher, Heagerty, P.J., and Lumley, T. (2004). Biostatistics: A Methodology for the Health Sciences 2nd Edition. John Wiley & Sons, New York.

Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ,

Examples

Run this code

# NOT RUN {
  # Generate 10 observations from a lognormal distribution with parameters 
  # meanlog=2 and sdlog=1.  The median of this distribution is e^2 (about 7.4). 
  # Test the null hypothesis that the true median is equal to 5 against the 
  # alternative that the true mean is greater than 5. 
  # (Note: the call to set.seed allows you to reproduce this example).

  set.seed(23) 
  dat <- rlnorm(10, meanlog = 2, sdlog = 1) 
  signTest(dat, mu = 5) 

  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 median = 5
  #
  #Alternative Hypothesis:          True median is not equal to 5
  #
  #Test Name:                       Sign test
  #
  #Estimated Parameter(s):          median = 19.21717
  #
  #Data:                            dat
  #
  #Test Statistic:                  # Obs > median = 9
  #
  #P-value:                         0.02148438
  #
  #Confidence Interval for:         median
  #
  #Confidence Interval Method:      exact
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                93.45703%
  #
  #Confidence Limit Rank(s):        3 9 
  #
  #Confidence Interval:             LCL =  7.732538
  #                                 UCL = 35.722459

  # Clean up
  #---------
  rm(dat)

  #==========

  # The guidance document "Supplemental Guidance to RAGS: Calculating the 
  # Concentration Term" (USEPA, 1992d) contains an example of 15 observations 
  # of chromium concentrations (mg/kg) which are assumed to come from a 
  # lognormal distribution.  These data are stored in the vector 
  # EPA.92d.chromium.vec.  Here, we will use the sign test to test the null 
  # hypothesis that the median chromium concentration is less than or equal to 
  # 100 mg/kg vs. the alternative that it is greater than 100 mg/kg.  The 
  # estimated median is 110 mg/kg.  There are 8 out of 15 observations greater 
  # than 100 mg/kg, the p-value is equal to 0.5, and the lower 94% confidence 
  # limit is 41 mg/kg.

  signTest(EPA.92d.chromium.vec, mu = 100, alternative = "greater") 

  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 median = 100
  #
  #Alternative Hypothesis:          True median is greater than 100
  #
  #Test Name:                       Sign test
  #
  #Estimated Parameter(s):          median = 110
  #
  #Data:                            EPA.92d.chromium.vec
  #
  #Test Statistic:                  # Obs > median = 8
  #
  #P-value:                         0.5
  #
  #Confidence Interval for:         median
  #
  #Confidence Interval Method:      exact
  #
  #Confidence Interval Type:        lower
  #
  #Confidence Level:                94.07654%
  #
  #Confidence Limit Rank(s):        5 
  #
  #Confidence Interval:             LCL =  41
  #                                 UCL = Inf
# }

Run the code above in your browser using DataLab