
Perform a nonparametric test for a monotonic trend based on Kendall's tau statistic, and optionally compute a confidence interval for the slope.
kendallTrendTest(y, ...)
# S3 method for formula
kendallTrendTest(y, data = NULL, subset,
  na.action = na.pass, ...)
# S3 method for default
kendallTrendTest(y, x = seq(along = y),
  alternative = "two.sided", correct = TRUE, ci.slope = TRUE,
  conf.level = 0.95, warn = TRUE, data.name = NULL, data.name.x = NULL,
  parent.of.data = NULL, subset.expression = NULL, ...)
y: an object containing data for the trend test. In the default method, the argument y must be a numeric vector of observations. In the formula method, y must be a formula of the form y ~ 1 or y ~ x. The form y ~ 1 indicates that the observations in the vector y are used for the test for trend, with the default value of the argument x in the call to kendallTrendTest.default. The form y ~ x indicates that the observations in the vector y are used for the test for trend, with the specified value of the argument x in the call to kendallTrendTest.default. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.
data: specifies an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which kendallTrendTest is called.
subset: specifies an optional vector specifying a subset of observations to be used.
na.action: specifies a function which indicates what should happen when the data contain NAs. The default is na.pass.
x: numeric vector of "predictor" values. The length of x must equal the length of y. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. The default value of x is the vector of numbers 1, 2, ..., n, where n is the number of elements in y (i.e., seq(along = y)).
alternative: character string indicating the kind of alternative hypothesis. The possible values are "two.sided" (tau not equal to 0; the default), "less" (tau less than 0), and "greater" (tau greater than 0).
correct: logical scalar indicating whether to use the correction for continuity in computing the z-statistic. The default value is TRUE.
ci.slope: logical scalar indicating whether to compute a confidence interval for the slope. The default value is TRUE.
conf.level: numeric scalar between 0 and 1 indicating the confidence level associated with the confidence interval for the slope. The default value is 0.95.
warn: logical scalar indicating whether to print a warning message when y does not contain at least two non-missing values, or when x does not contain at least two unique non-missing values. The default value is TRUE.
data.name: character string indicating the name of the data used for the trend test. The default value is deparse(substitute(y)).
data.name.x: character string indicating the name of the data used for the predictor variable x. If x is not supplied this argument is ignored. When x is supplied, the default value is deparse(substitute(x)).
parent.of.data: character string indicating the source of the data used for the trend test.
subset.expression: character string indicating the expression used to subset the data.
...: additional arguments affecting the test for trend.
A list of class "htest" containing the results of the hypothesis test. See the help file for htest.object for details.
In addition, the following components are part of the list returned by kendallTrendTest:
The value of the Kendall S-statistic.
The variance of the Kendall S-statistic.
A numeric vector of all possible two-point slope estimates. This component is used by the function kendallSeasonalTrendTest.
kendallTrendTest performs Kendall's nonparametric test for a monotonic trend, which is a special case of the test for independence based on Kendall's tau statistic (see cor.test). The slope is estimated using the method of Theil (1950) and Sen (1968). When ci.slope=TRUE, the confidence interval for the slope is computed using Gilbert's (1987) modification of the Theil/Sen method.
Kendall's test for a monotonic trend is a special case of the test for independence based on Kendall's tau statistic. The first section below explains the general case of testing for independence. The second section explains the special case of testing for monotonic trend. The last section explains how a simple linear regression model is a special case of a monotonic trend and how the slope may be estimated.
The General Case of Testing for Independence
Definition of Kendall's Tau
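Let $X$ and $Y$ denote two continuous random variables with some joint (bivariate) distribution, and let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ denote $n$ mutually independent bivariate observations from this distribution. Kendall (1938, 1975) proposed a test of the hypothesis that $X$ and $Y$ are independent based on estimating the following quantity:
$$\tau = 2 \Pr[(X_2 - X_1)(Y_2 - Y_1) > 0] - 1 \tag{1}$$
Note that Kendall's tau is similar to a correlation coefficient in that $-1 \le \tau \le 1$. If $X$ and $Y$ are independent, then $\tau = 0$.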
Estimating Kendall's Tau
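The quantity in Equation (1) can be estimated by:
$$\hat{\tau} = \frac{2S}{n(n-1)} \tag{2}$$
where
$$S = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \mathrm{sign}[(X_j - X_i)(Y_j - Y_i)]$$
and $\mathrm{sign}()$ denotes the sign function:
$$\mathrm{sign}(x) = \begin{cases} -1 & x < 0 \\ 0 & x = 0 \\ 1 & x > 0 \end{cases}$$
(Hollander and Wolfe, 1999, Chapter 8; Conover, 1980, pp.256-260; Gilbert, 1987, Chapter 16; Helsel and Hirsch, 1992, pp.212-216; Gibbons et al., 2009, Chapter 11). The quantity defined in Equation (2) is called Kendall's rank correlation coefficient or more often Kendall's tau.
Note that the quantity $S$ defined above is equal to the number of concordant pairs minus the number of discordant pairs.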
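As a rough illustration of the estimator described above, the following base R sketch sums the signs of all pairwise products and rescales the result to obtain the estimate of tau. The helper name kendall_S is hypothetical and is not part of EnvStats. The six values are the first sulfate observations shown in the example at the end of this page, paired here with an assumed time index of 1 to 6.

```r
# Hypothetical helper (not part of EnvStats): Kendall's S computed over all pairs i < j
kendall_S <- function(x, y) {
  idx <- combn(length(y), 2)          # 2-row matrix of index pairs (i, j) with i < j
  dx <- x[idx[2, ]] - x[idx[1, ]]
  dy <- y[idx[2, ]] - y[idx[1, ]]
  sum(sign(dx * dy))                  # +1 concordant, -1 discordant, 0 tied
}

y <- c(480, 450, 490, 520, 485, 510)  # first six sulfate values from the example data
x <- seq_along(y)                     # assumed equally spaced time index
n <- length(y)

S <- kendall_S(x, y)                  # 11 concordant pairs minus 4 discordant pairs
tau_hat <- 2 * S / (n * (n - 1))

# With no ties this agrees with the base R estimate
tau_base <- cor(x, y, method = "kendall")
```

The all-pairs approach is quadratic in the sample size, which is acceptable for the short series typical of monitoring data.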
Testing the Null Hypothesis of Independence
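The null hypothesis $H_0: \tau = 0$ can be tested using the Kendall S-statistic $S$. The function kendallTrendTest uses the large sample approximation to the distribution of $S$ under the null hypothesis, which is given by:
$$z = \frac{S - E(S)}{\sqrt{Var(S)}} \tag{5}$$
where $E(S) = 0$ and $Var(S) = n(n-1)(2n+5)/18$. Under the null hypothesis, $z$ is approximately distributed as a standard normal random variable.
Both Kendall (1975) and Mann (1945) show that the normal approximation is excellent even for samples as small as $n = 10$, provided that the following continuity correction is used:
$$z = \frac{\mathrm{sign}(S)\,(|S| - 1)}{\sqrt{Var(S)}} \tag{8}$$
The function kendallTrendTest performs the usual one-sample z-test using the statistic computed in Equation (8) or Equation (5). The argument correct determines which equation is used to compute the z-statistic. By default, correct=TRUE so Equation (8) is used.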
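The effect of the correct argument can be sketched in base R as follows. This is only an illustration under stated assumptions: the helper kendall_z is hypothetical (not an EnvStats function), the value S = 7 with n = 6 is made up for the example, and the variance is the standard no-ties formula n(n-1)(2n+5)/18. A sample this small is too short for the normal approximation to be trusted in practice.

```r
# Hypothetical helper (not part of EnvStats): large-sample z-statistic for Kendall's S
kendall_z <- function(S, var_S, correct = TRUE) {
  if (correct) {
    sign(S) * (abs(S) - 1) / sqrt(var_S)   # continuity-corrected form, Equation (8)
  } else {
    S / sqrt(var_S)                        # uncorrected form, Equation (5), with E(S) = 0
  }
}

n <- 6
S <- 7                                     # assumed value, for illustration only
var_S <- n * (n - 1) * (2 * n + 5) / 18    # variance of S when there are no ties

z_corrected <- kendall_z(S, var_S, correct = TRUE)
z_plain     <- kendall_z(S, var_S, correct = FALSE)

# p-values for the three choices of the alternative hypothesis
p_two_sided <- 2 * pnorm(-abs(z_corrected))
p_greater   <- pnorm(z_corrected, lower.tail = FALSE)
p_less      <- pnorm(z_corrected)
```

The correction shrinks |S| by one before standardizing, so the corrected statistic is always closer to zero than the uncorrected one.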
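In the case of tied observations in either the observed $X$'s and/or the observed $Y$'s, the formula for the variance of $S$ must be modified as follows:
$$Var(S) = \frac{1}{18}\left[ n(n-1)(2n+5) - \sum_{i=1}^{g} t_i(t_i-1)(2t_i+5) - \sum_{j=1}^{h} u_j(u_j-1)(2u_j+5) \right] + \frac{\left[\sum_{i=1}^{g} t_i(t_i-1)(t_i-2)\right]\left[\sum_{j=1}^{h} u_j(u_j-1)(u_j-2)\right]}{9n(n-1)(n-2)} + \frac{\left[\sum_{i=1}^{g} t_i(t_i-1)\right]\left[\sum_{j=1}^{h} u_j(u_j-1)\right]}{2n(n-1)}$$
where $g$ denotes the number of tied groups in the $X$ observations, $t_i$ is the size of the $i$'th tied group in the $X$ observations, $h$ is the number of tied groups in the $Y$ observations, and $u_j$ is the size of the $j$'th tied group in the $Y$ observations. In the case of no ties in either the $X$ or $Y$ observations, this reduces to $n(n-1)(2n+5)/18$.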
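A minimal base R sketch of a tie-adjusted variance is shown below. It uses the standard textbook form of the correction for ties in both variables (as in Kendall, 1975), which is assumed, not confirmed, to match what kendallTrendTest computes internally. The helper name kendall_var_S is hypothetical.

```r
# Hypothetical helper (not part of EnvStats): variance of Kendall's S adjusted for ties
# Requires n >= 3 because of the (n - 2) term in one denominator.
kendall_var_S <- function(x, y) {
  n  <- length(y)
  tx <- as.numeric(table(x))   # group sizes in x; untied values (size 1) contribute 0
  uy <- as.numeric(table(y))   # group sizes in y
  (n * (n - 1) * (2 * n + 5) -
     sum(tx * (tx - 1) * (2 * tx + 5)) -
     sum(uy * (uy - 1) * (2 * uy + 5))) / 18 +
    sum(tx * (tx - 1) * (tx - 2)) * sum(uy * (uy - 1) * (uy - 2)) /
      (9 * n * (n - 1) * (n - 2)) +
    sum(tx * (tx - 1)) * sum(uy * (uy - 1)) / (2 * n * (n - 1))
}

x <- 1:6
v_no_ties <- kendall_var_S(x, c(480, 450, 490, 520, 485, 510))  # all values distinct
v_ties    <- kendall_var_S(x, c(1, 1, 2, 3, 3, 3))              # tied groups of size 2 and 3 in y
```

When only one of the two variables has ties, the two cross-product terms vanish and only the subtraction inside the first term changes the result.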
The Special Case of Testing for Monotonic Trend
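Often in environmental sampling, observations are taken periodically over time (Hirsch et al., 1982; van Belle and Hughes, 1984; Hirsch and Slack, 1984). In this case, the random variables $Y_1, Y_2, \ldots, Y_n$ can be thought of as representing the observations, and the variables $X_1, X_2, \ldots, X_n$ are no longer random but represent the times at which the observations were taken. If the observations are equally spaced over time, then it is useful to make the simplification
$$X_i = i, \quad i = 1, 2, \ldots, n$$
This is the default value of the argument x for the function kendallTrendTest.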
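In the case where the $X$'s represent time, Kendall's test for independence between $X$ and $Y$ becomes a test for a monotonic trend in $Y$ over time (Mann, 1945): $\tau > 0$ corresponds to an upward trend and $\tau < 0$ to a downward trend.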
The Special Case of a Simple Linear Model: Estimating the Slope
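Consider the simple linear regression model
$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \tag{12}$$
where $\beta_0$ denotes the intercept, $\beta_1$ denotes the slope, and the $\epsilon_i$ denote random error terms. Under this model, a nonzero slope corresponds to a monotonic (linear) trend.
Theil (1950) proposed the following nonparametric estimator of the slope:
$$\hat{\beta}_1 = \mathrm{Median}\left( \frac{Y_j - Y_i}{X_j - X_i} \right), \quad i < j$$
Sen (1968) generalized this estimator to the case where there are possibly tied observations in the $X$'s. In this case, the two-point slope estimates that are undefined because of tied $X$'s are omitted and the median is taken over the remaining slopes.
Conover (1980, p. 267) suggests the following estimator for the intercept:
$$\hat{\beta}_0 = Y_{0.5} - \hat{\beta}_1 X_{0.5}$$
where $X_{0.5}$ and $Y_{0.5}$ denote the sample medians of the $X$'s and $Y$'s, respectively.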
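The slope and intercept estimators can be sketched in a few lines of base R. This is an illustration, not the EnvStats implementation: the helper name theil_sen is hypothetical, the slope is the median of all defined two-point slopes (pairs with tied x values are dropped, following Sen), and the intercept is taken as median(y) minus the slope times median(x), which is the usual statement of Conover's estimator.

```r
# Hypothetical helper (not part of EnvStats): Theil/Sen slope with a median-based intercept
theil_sen <- function(x, y) {
  idx <- combn(length(y), 2)                # all index pairs (i, j) with i < j
  dx <- x[idx[2, ]] - x[idx[1, ]]
  dy <- y[idx[2, ]] - y[idx[1, ]]
  slopes <- (dy / dx)[dx != 0]              # drop slopes undefined because of tied x
  slope <- median(slopes)
  intercept <- median(y) - slope * median(x)
  list(slope = slope, intercept = intercept, slopes = slopes)
}

# Exact line y = 2 + 3x: every two-point slope equals 3
x1 <- 1:5
fit_line <- theil_sen(x1, 2 + 3 * x1)

# Tied x values: the pair of observations with x = 1 yields no slope
fit_ties <- theil_sen(c(1, 1, 2, 3), c(1, 2, 3, 5))
```

Because the estimate is a median of pairwise slopes, a single outlying observation affects only the slopes it participates in and typically moves the estimate far less than it would move a least-squares fit.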
NOTE: The function kendallTrendTest always returns estimates of slope and intercept assuming a linear model (Equation (12)), while the p-value is based on Kendall's tau, which is testing for the broader alternative of any kind of dependence between the $X$'s and $Y$'s.
Confidence Interval for the Slope
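Theil (1950) and Sen (1968) proposed methods to compute a confidence interval for the true slope, assuming the linear model of Equation (12) (see Hollander and Wolfe, 1999, pp.421-422). Gilbert (1987, p.218) illustrates a simpler method than the one given by Sen (1968) that is based on a normal approximation. Gilbert's (1987) method is an extension of the one given in Hollander and Wolfe (1999, p.424) that allows for ties and/or multiple observations per time period. This method is valid for a sample size as small as $n = 10$ unless there are several tied observations.
Let $N'$ denote the number of defined two-point slope estimates (so $N' = n(n-1)/2$ when there are no tied $X$ values), and let $\hat{\beta}_{1(1)} \le \hat{\beta}_{1(2)} \le \ldots \le \hat{\beta}_{1(N')}$ denote these slopes in increasing order. A two-sided $100(1-\alpha)\%$ confidence interval for the slope is given by $[\hat{\beta}_{1(M_1)}, \hat{\beta}_{1(M_2+1)}]$, where
$$M_1 = \frac{N' - C_\alpha}{2}, \qquad M_2 = \frac{N' + C_\alpha}{2}, \qquad C_\alpha = z_{1-\alpha/2} \sqrt{Var(S)}$$
$Var(S)$ denotes the variance of the Kendall S-statistic, and $z_p$ denotes the $p$'th quantile of the standard normal distribution.
Usually the quantities $M_1$ and $M_2$ will not be integers. Gilbert (1987) suggests interpolating between adjacent values in this case, which is what the function kendallTrendTest does.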
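The interval construction can be sketched in base R as follows, under several assumptions that should be read as such: the ranks are taken as M1 = (N' - C)/2 and M2 + 1 = (N' + C)/2 + 1 with C equal to the normal quantile times the square root of Var(S), as in Gilbert (1987), and fractional ranks are handled by linear interpolation between adjacent ordered slopes using approx(). The exact interpolation rule used inside kendallTrendTest is not documented here, so results may differ slightly. The helper name slope_ci is hypothetical, and the six-point series is far shorter than the sample sizes for which the method is recommended.

```r
# Hypothetical helper (not part of EnvStats): normal-approximation CI for the Theil/Sen slope
slope_ci <- function(slopes, var_S, conf.level = 0.95) {
  slopes <- sort(slopes)
  N <- length(slopes)                                  # number of defined two-point slopes
  C <- qnorm(1 - (1 - conf.level) / 2) * sqrt(var_S)
  ranks <- c((N - C) / 2, (N + C) / 2 + 1)             # M1 and M2 + 1, usually fractional
  # Linear interpolation between adjacent order statistics (an assumed rule);
  # rule = 2 clamps ranks that fall outside 1..N to the extreme slopes
  lim <- approx(seq_len(N), slopes, xout = ranks, rule = 2)$y
  c(LCL = lim[1], UCL = lim[2])
}

y <- c(480, 450, 490, 520, 485, 510)                   # first six sulfate values from the example data
x <- seq_along(y)                                      # assumed equally spaced time index
n <- length(y)
idx <- combn(n, 2)
slopes <- (y[idx[2, ]] - y[idx[1, ]]) / (x[idx[2, ]] - x[idx[1, ]])
var_S <- n * (n - 1) * (2 * n + 5) / 18                # no ties in x or y

ci <- slope_ci(slopes, var_S)
```

Working on ranks of the ordered slopes, the interval inherits the robustness of the median-based point estimate and needs only Var(S) from the trend test.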
Bradley, J.V. (1968). Distribution-Free Statistical Tests. Prentice-Hall, Englewood Cliffs, NJ.
Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York, pp.256-272.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY, Chapter 16.
Helsel, D.R. and R.M. Hirsch. (1988). Discussion of Applicability of the t-test for Detecting Trends in Water Quality Variables. Water Resources Bulletin 24(1), 201-204.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, NY.
Helsel, D.R., and R. M. Hirsch. (2002). Statistical Methods in Water Resources. Techniques of Water Resources Investigations, Book 4, chapter A3. U.S. Geological Survey. Available on-line at http://pubs.usgs.gov/twri/twri4a3/pdf/twri4a3-new.pdf.
Hirsch, R.M., J.R. Slack, and R.A. Smith. (1982). Techniques of Trend Analysis for Monthly Water Quality Data. Water Resources Research 18(1), 107-121.
Hirsch, R.M. and J.R. Slack. (1984). A Nonparametric Trend Test for Seasonal Data with Serial Dependence. Water Resources Research 20(6), 727-732.
Hirsch, R.M., R.B. Alexander, and R.A. Smith. (1991). Selection of Methods for the Detection and Estimation of Trends in Water Quality. Water Resources Research 27(5), 803-813.
Hollander, M., and D.A. Wolfe. (1999). Nonparametric Statistical Methods, Second Edition. John Wiley and Sons, New York.
Kendall, M.G. (1938). A New Measure of Rank Correlation. Biometrika 30, 81-93.
Kendall, M.G. (1975). Rank Correlation Methods. Charles Griffin, London.
Mann, H.B. (1945). Nonparametric Tests Against Trend. Econometrica 13, 245-259.
Millard, S.P., and Neerchal, N.K. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, Florida.
Sen, P.K. (1968). Estimates of the Regression Coefficient Based on Kendall's Tau. Journal of the American Statistical Association 63, 1379-1389.
Theil, H. (1950). A Rank-Invariant Method of Linear and Polynomial Regression Analysis, I-III. Proc. Kon. Ned. Akad. v. Wetensch. A. 53, 386-392, 521-525, 1397-1412.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
van Belle, G., and J.P. Hughes. (1984). Nonparametric Tests for Trend in Water Quality. Water Resources Research 20(1), 127-136.
# NOT RUN {
# Reproduce Example 17-6 on page 17-33 of USEPA (2009). This example
# tests for trend in sulfate concentrations (ppm) collected at various
# months between 1989 and 1996.
head(EPA.09.Ex.17.6.sulfate.df)
# Sample.No Year Month Sampling.Date Date Sulfate.ppm
#1 1 89 6 89.6 1989-06-01 480
#2 2 89 8 89.8 1989-08-01 450
#3 3 90 1 90.1 1990-01-01 490
#4 4 90 3 90.3 1990-03-01 520
#5 5 90 6 90.6 1990-06-01 485
#6 6 90 8 90.8 1990-08-01 510
# Plot the data
#--------------
dev.new()
with(EPA.09.Ex.17.6.sulfate.df,
  plot(Sampling.Date, Sulfate.ppm, pch = 15, ylim = c(400, 900),
    xlab = "Sampling Date", ylab = "Sulfate Conc (ppm)",
    main = "Figure 17-6. Time Series Plot of \nSulfate Concentrations (ppm)")
)
Sulfate.fit <- lm(Sulfate.ppm ~ Sampling.Date,
  data = EPA.09.Ex.17.6.sulfate.df)
abline(Sulfate.fit, lty = 2)
# Perform the Kendall test for trend
#-----------------------------------
kendallTrendTest(Sulfate.ppm ~ Sampling.Date,
  data = EPA.09.Ex.17.6.sulfate.df)
#Results of Hypothesis Test
#--------------------------
#
#Null Hypothesis: tau = 0
#
#Alternative Hypothesis: True tau is not equal to 0
#
#Test Name: Kendall's Test for Trend
# (with continuity correction)
#
#Estimated Parameter(s): tau = 0.7667984
# slope = 26.6666667
# intercept = -1909.3333333
#
#Estimation Method: slope: Theil/Sen Estimator
# intercept: Conover's Estimator
#
#Data: y = Sulfate.ppm
# x = Sampling.Date
#
#Data Source: EPA.09.Ex.17.6.sulfate.df
#
#Sample Size: 23
#
#Test Statistic: z = 5.107322
#
#P-value: 3.267574e-07
#
#Confidence Interval for: slope
#
#Confidence Interval Method: Gilbert's Modification
# of Theil/Sen Method
#
#Confidence Interval Type: two-sided
#
#Confidence Level: 95%
#
#Confidence Interval: LCL = 20.00000
# UCL = 35.71182
# Clean up
#---------
rm(Sulfate.fit)
graphics.off()
# }