For a skewed distribution, estimate the mean, standard deviation, and skew; test the null hypothesis that the mean is equal to a user-specified value vs. a one-sided alternative; and create a one-sided confidence interval for the mean.
chenTTest(x, y = NULL, alternative = "greater", mu = 0, paired = !is.null(y),
conf.level = 0.95, ci.method = "z")
a list of class "htest" containing the results of the hypothesis test. See the help file for htest.object for details.
x: numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.
y: optional numeric vector of observations that are paired with the observations in x. The length of y must be the same as the length of x. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. This argument is ignored if paired=FALSE, and must be supplied if paired=TRUE. The default value is y=NULL.
alternative: character string indicating the kind of alternative hypothesis. The possible values are "greater" (the default) and "less". The value "greater" should be used for positively-skewed distributions, and the value "less" should be used for negatively-skewed distributions.
mu: numeric scalar indicating the hypothesized value of the mean. The default value is mu=0.
paired: logical scalar indicating whether to perform a paired or one-sample t-test. The possible values are paired=FALSE (a one-sample t-test) and paired=TRUE (a paired t-test). Per the usage shown above, the default is paired=!is.null(y), that is, FALSE unless y is supplied.
conf.level: numeric scalar between 0 and 1 indicating the confidence level associated with the confidence interval for the population mean. The default value is conf.level=0.95.
ci.method: character string indicating which critical value to use to construct the confidence interval for the mean. The possible values are "z" (the default), "t", and "Avg. of z and t". See the Details section below for more information.
Steven P. Millard (EnvStats@ProbStatInfo.com)
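One-Sample Case (paired=FALSE)
Let $\underline{x} = x_1, x_2, \ldots, x_n$ be a vector of $n$ independent and identically distributed (i.i.d.) observations from some distribution with mean $\mu$ and standard deviation $\sigma$.
Background: The Conventional Student's t-Test
Assume that the $n$ observations come from a normal (Gaussian) distribution, and consider the test of the null hypothesis:
$$H_0: \mu = \mu_0 \tag{1}$$
The three possible alternative hypotheses are the upper one-sided alternative (alternative="greater"):
$$H_a: \mu > \mu_0 \tag{2}$$
the lower one-sided alternative (alternative="less"):
$$H_a: \mu < \mu_0 \tag{3}$$
and the two-sided alternative:
$$H_a: \mu \ne \mu_0 \tag{4}$$
The test of the null hypothesis (1) versus any of the three alternatives (2)-(4) is usually based on the Student t-statistic:
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \tag{5}$$
where
$$\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \tag{6}$$
$$s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \tag{7}$$
(see the R help file for t.test). Under the null hypothesis (1), the t-statistic in (5) follows a Student's t-distribution with $n-1$ degrees of freedom.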
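As a quick check on the conventional t-statistic in (5), that is, the sample mean minus the hypothesized mean divided by the estimated standard error, the value can be computed by hand and compared with t.test. This is a minimal base-R sketch; the data vector and the value of mu0 are simulated or chosen arbitrarily for illustration and are not part of the original help file.

```r
# Illustrative data only (simulated; not taken from the help file)
set.seed(1)
x   <- rlnorm(20, meanlog = 1, sdlog = 0.5)
mu0 <- 2                       # hypothesized mean, chosen arbitrarily

n     <- length(x)
x.bar <- mean(x)
s     <- sd(x)                 # sd() uses the n - 1 denominator

# Conventional one-sample t-statistic
t.stat <- (x.bar - mu0) / (s / sqrt(n))

# The same statistic and its degrees of freedom from stats::t.test
tt <- t.test(x, mu = mu0, alternative = "greater")
c(by.hand = t.stat, t.test = unname(tt$statistic), df = unname(tt$parameter))
```

The two statistics should agree, and the degrees of freedom reported by t.test should equal n - 1.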
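Chen's Modified t-Test for Skewed Distributions
In the case when the underlying distribution of the $n$ observations is positively skewed and the sample size is small, the sampling distribution of the t-statistic under the null hypothesis (1) does not follow a Student's t-distribution, but is instead negatively skewed. For the test against the upper alternative (2), this leads to a Type I error smaller than the one assumed and a loss of power (Chen, 1995b).
Similarly, in the case when the underlying distribution of the $n$ observations is negatively skewed and the sample size is small, the sampling distribution of the t-statistic is positively skewed. For the test against the lower alternative (3), this also leads to a Type I error smaller than the one assumed and a loss of power.
In order to overcome these problems, Chen (1995b) proposed the following modified t-statistic that takes into account the skew of the underlying distribution:
$$t_2 = t + a(1 + 2t^2) + 4a^2(t + 2t^3) \tag{8}$$
where $t$ is the conventional t-statistic in (5), $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, and
$$a = \frac{\sqrt{\hat{\beta}_1}}{6\sqrt{n}} \tag{9}$$
$$\sqrt{\hat{\beta}_1} = \frac{\hat{\mu}_3}{\hat{\sigma}^3} \tag{10}$$
$$\hat{\mu}_3 = \frac{n}{(n-1)(n-2)} \sum_{i=1}^n (x_i - \bar{x})^3 \tag{11}$$
$$\hat{\sigma}^3 = s^3 = \left[ \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \right]^{3/2} \tag{12}$$
Note that the quantity $\sqrt{\hat{\beta}_1}$ in (9) is an estimate of the skew of the underlying distribution and is based on unbiased estimators of central moments (see the help file for skewness).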
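The modified statistic in (8) can be reproduced with base R alone. The sketch below assumes the form t2 = t + a(1 + 2t^2) + 4a^2(t + 2t^3) with a = skew / (6 sqrt(n)), where the skew estimate is the ratio of the bias-adjusted third central moment to s^3. The data are the 60 sorted values printed in the example for this help file, with mu0 = 30; the variable names are illustrative and not part of EnvStats.

```r
# The 60 observations printed in the example (sorted order; order does not matter here)
x <- c(16, 17, 17, 17, 18, 18, 20, 20, 20, 21, 21, 21, 21, 21, 21, 22,
       22, 22, 23, 23, 23, 23, 24, 24, 24, 25, 25, 25, 25, 25, 25, 26,
       26, 26, 26, 27, 27, 28, 28, 28, 28, 29, 29, 30, 30, 31, 32, 32,
       32, 33, 33, 35, 35, 97, 98, 105, 107, 111, 117, 119)
mu0 <- 30
n   <- length(x)

# Conventional t-statistic
t.stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Skew estimate: bias-adjusted third central moment divided by s^3
mu3.hat <- n / ((n - 1) * (n - 2)) * sum((x - mean(x))^3)
skew    <- mu3.hat / sd(x)^3

# Chen's modification of the t-statistic
a  <- skew / (6 * sqrt(n))
t2 <- t.stat + a * (1 + 2 * t.stat^2) + 4 * a^2 * (t.stat + 2 * t.stat^3)

round(c(skew = skew, t = t.stat, t2 = t2), 6)
```

The values of skew and t2 should agree with the skew (2.365778) and test statistic (1.574075) reported by chenTTest in the example output.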
For a positively-skewed distribution, Chen's modified t-test rejects the null hypothesis (1) in favor of the upper one-sided alternative (2) if the t-statistic in (8) is too large. For a negatively-skewed distribution, Chen's modified t-test rejects the null hypothesis (1) in favor of the lower one-sided alternative (3) if the t-statistic in (8) is too small.
Chen's modified t-test is not applicable to testing the two-sided alternative
(4). It should also not be used to test the upper one-sided alternative (2)
based on negatively-skewed data, nor should it be used to test the lower one-sided
alternative (3) based on positively-skewed data.
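One way to guard against pairing the wrong alternative with the data is to look at the sign of the sample skew before calling the test. The helper below is hypothetical (it is not part of EnvStats) and is only a sketch of that check; it uses the same skew estimate as the modified t-statistic and drops non-finite values first, as the function is documented to do.

```r
# Hypothetical helper (not an EnvStats function): suggest a value for
# `alternative` from the sign of the sample skew
suggest.alternative <- function(x) {
  x <- x[is.finite(x)]         # drop NA, NaN, Inf, -Inf
  n <- length(x)
  skew <- n / ((n - 1) * (n - 2)) * sum((x - mean(x))^3) / sd(x)^3
  if (skew > 0) "greater" else if (skew < 0) "less" else NA_character_
}

right.tailed <- c(1, 2, 2, 3, 3, 3, 25)   # made-up data with a long right tail
left.tailed  <- -right.tailed             # mirror image: long left tail

suggest.alternative(right.tailed)
suggest.alternative(left.tailed)
```

A sample skew near zero gives little guidance either way, so the sign alone should not replace a look at a histogram or other diagnostics.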
Determination of Critical Values and p-Values
Chen (1995b) performed a simulation study in which the modified t-statistic in (8)
was compared to a critical value based on the normal distribution (z-value),
a critical value based on Student's t-distribution (t-value), and the average of the
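critical z-value and t-value. Based on the simulation study, Chen (1995b) suggests
using either the z-value or average of the z-value and t-value when $n$ (the sample size) is small (e.g., $n \le 10$) or $\alpha$ (the Type I error) is small (e.g., $\alpha \le 0.01$), and using either the t-value or the average of the z-value and t-value when $n \ge 20$ or $\alpha \ge 0.05$.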
The function chenTTest returns three different p-values: one based on the
normal distribution, one based on Student's t-distribution, and one based on the
average of these two p-values. This last p-value should roughly correspond to a
p-value based on the distribution of the average of a normal and Student's t
random variable.
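The three p-values can be illustrated directly from a value of the modified t-statistic. The sketch below takes the statistic (1.574075) and degrees of freedom (59) from the example output for this help file and assumes the upper one-sided alternative (alternative="greater"), so upper-tail probabilities are used; the variable names are illustrative.

```r
t2 <- 1.574075    # modified t-statistic reported in the example output
df <- 59          # n - 1 for n = 60 observations

# Upper-tail p-values, appropriate for alternative = "greater"
p.z   <- pnorm(t2, lower.tail = FALSE)        # normal distribution
p.t   <- pt(t2, df = df, lower.tail = FALSE)  # Student's t-distribution
p.avg <- mean(c(p.z, p.t))                    # average of the two p-values

round(c(z = p.z, t = p.t, avg = p.avg), 6)
```

These should match the P-values block of the example output (about 0.0577, 0.0604, and 0.0591). For alternative="less", the lower tails would be used instead.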
Computing Confidence Intervals
The function chenTTest computes a one-sided confidence interval for the true mean $\mu$ by finding the value of the hypothesized mean at which the p-value of the corresponding one-sided test equals 1-conf.level. The argument ci.method determines which p-value is used in the algorithm to determine the bounds on $\mu$. When ci.method="z", the p-value is based on the normal distribution; when ci.method="t", the p-value is based on Student's t-distribution; and when ci.method="Avg. of z and t", the p-value is the average of the p-values based on the normal and Student's t-distributions.
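One way to picture the confidence bound is as a test inversion: search for the hypothesized mean at which the one-sided p-value equals 1 - conf.level. The sketch below does this with uniroot for the normal-based p-value and the upper one-sided alternative. It is an assumption about the general approach, not a copy of the algorithm inside chenTTest, and chen.p.z is a hypothetical helper name. The data are the 60 values printed in the example for this help file.

```r
x <- c(16, 17, 17, 17, 18, 18, 20, 20, 20, 21, 21, 21, 21, 21, 21, 22,
       22, 22, 23, 23, 23, 23, 24, 24, 24, 25, 25, 25, 25, 25, 25, 26,
       26, 26, 26, 27, 27, 28, 28, 28, 28, 29, 29, 30, 30, 31, 32, 32,
       32, 33, 33, 35, 35, 97, 98, 105, 107, 111, 117, 119)

# Hypothetical helper: normal-based p-value of Chen's test against the
# upper one-sided alternative, as a function of the hypothesized mean mu0
chen.p.z <- function(mu0, x) {
  n      <- length(x)
  t.stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))
  skew   <- n / ((n - 1) * (n - 2)) * sum((x - mean(x))^3) / sd(x)^3
  a      <- skew / (6 * sqrt(n))
  t2     <- t.stat + a * (1 + 2 * t.stat^2) + 4 * a^2 * (t.stat + 2 * t.stat^3)
  pnorm(t2, lower.tail = FALSE)
}

conf.level <- 0.95

# Lower confidence limit: the mu0 at which the p-value equals 1 - conf.level
lcl <- uniroot(function(mu0) chen.p.z(mu0, x) - (1 - conf.level),
               interval = c(min(x), mean(x)), tol = 1e-8)$root
round(lcl, 2)
```

For these data the root should be close to the LCL of 29.82 shown in the example output. Swapping pnorm for pt (with n - 1 degrees of freedom), or averaging the two p-values, would correspond to the other two ci.method settings.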
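Paired-Sample Case (paired=TRUE)
When the argument paired=TRUE, the arguments x and y are assumed to have the same length, and the $n$ differences $d_i = x_i - y_i$, $i = 1, 2, \ldots, n$, are assumed to be i.i.d. observations from some distribution with mean $\mu$ and standard deviation $\sigma$. Chen's modified t-test can then be applied to the differences.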
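In the paired case the test reduces to a one-sample problem on the differences. The sketch below forms the differences for two made-up vectors and applies the modified t-statistic to them; the data and the helper chen.t2 are hypothetical and only illustrate the reduction.

```r
# Made-up paired measurements (for illustration only)
x <- c(12.1, 15.3,  9.8, 30.2, 11.4, 14.0, 55.7, 10.9)
y <- c(10.0, 14.1,  9.9, 18.5, 10.2, 12.8, 20.3, 10.1)
stopifnot(length(x) == length(y))

d <- x - y    # paired differences; the test is applied to these

# Hypothetical helper: Chen's modified t-statistic for a single sample
chen.t2 <- function(z, mu0 = 0) {
  n      <- length(z)
  t.stat <- (mean(z) - mu0) / (sd(z) / sqrt(n))
  skew   <- n / ((n - 1) * (n - 2)) * sum((z - mean(z))^3) / sd(z)^3
  a      <- skew / (6 * sqrt(n))
  t.stat + a * (1 + 2 * t.stat^2) + 4 * a^2 * (t.stat + 2 * t.stat^3)
}

t2.paired <- chen.t2(d, mu0 = 0)
t2.paired
```

With EnvStats loaded, the corresponding call would presumably be chenTTest(x, y, paired = TRUE), following the usage shown at the top of this help file.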
Chen, L. (1995b). Testing the Mean of Skewed Distributions. Journal of the American Statistical Association 90(430), 767--772.
Johnson, N. L., S. Kotz, and N. Balakrishnan. (1995). Continuous Univariate Distributions, Volume 2. Second Edition. John Wiley and Sons, New York, Chapters 28, 31.
Land, C.E. (1971). Confidence Intervals for Linear Functions of the Normal Mean and Variance. The Annals of Mathematical Statistics 42(4), 1187--1205.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL, pp.402--404.
Singh, A., N. Armbya, and A. Singh. (2010b). ProUCL Version 4.1.00 Technical Guide (Draft). EPA/600/R-07/041, May 2010. Office of Research and Development, U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (1996c). Soil Screening Guidance: Technical Background Document. EPA/540/R-95/128, PB96963502. Office of Emergency and Remedial Response, U.S. Environmental Protection Agency, Washington, D.C., May, 1996.
USEPA. (2002d). Estimation of the Exposure Point Concentration Term Using a Gamma Distribution. EPA/600/R-02/084. October 2002. Technology Support Center for Monitoring and Site Characterization, Office of Research and Development, Office of Solid Waste and Emergency Response, U.S. Environmental Protection Agency, Washington, D.C.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
t.test, elnorm, elnormAlt.
# The guidance document "Calculating Upper Confidence Limits for
# Exposure Point Concentrations at Hazardous Waste Sites"
# (USEPA, 2002d, Exhibit 9, p. 16) contains an example of 60 observations
# from an exposure unit. Here we will use Chen's modified t-test to test
# the null hypothesis that the average concentration is at most 30 mg/L
# versus the alternative that it is greater than 30 mg/L.
# In EnvStats these data are stored in the vector EPA.02d.Ex.9.mg.per.L.vec.
sort(EPA.02d.Ex.9.mg.per.L.vec)
# [1] 16 17 17 17 18 18 20 20 20 21 21 21 21 21 21 22
#[17] 22 22 23 23 23 23 24 24 24 25 25 25 25 25 25 26
#[33] 26 26 26 27 27 28 28 28 28 29 29 30 30 31 32 32
#[49] 32 33 33 35 35 97 98 105 107 111 117 119
dev.new()
hist(EPA.02d.Ex.9.mg.per.L.vec, col = "cyan", xlab = "Concentration (mg/L)")
# The Shapiro-Wilk goodness-of-fit test rejects the null hypothesis of a
# normal, lognormal, and gamma distribution:
gofTest(EPA.02d.Ex.9.mg.per.L.vec)$p.value
#[1] 2.496781e-12
gofTest(EPA.02d.Ex.9.mg.per.L.vec, dist = "lnorm")$p.value
#[1] 3.349035e-09
gofTest(EPA.02d.Ex.9.mg.per.L.vec, dist = "gamma")$p.value
#[1] 1.564341e-10
# Use Chen's modified t-test to test the null hypothesis that
# the average concentration is at most 30 mg/L versus the
# alternative that it is greater than 30 mg/L.
chenTTest(EPA.02d.Ex.9.mg.per.L.vec, mu = 30)
#Results of Hypothesis Test
#--------------------------
#
#Null Hypothesis: mean = 30
#
#Alternative Hypothesis: True mean is greater than 30
#
#Test Name: One-sample t-Test
# Modified for
# Positively-Skewed Distributions
# (Chen, 1995)
#
#Estimated Parameter(s): mean = 34.566667
# sd = 27.330598
# skew = 2.365778
#
#Data: EPA.02d.Ex.9.mg.per.L.vec
#
#Sample Size: 60
#
#Test Statistic: t = 1.574075
#
#Test Statistic Parameter: df = 59
#
#P-values: z = 0.05773508
# t = 0.06040889
# Avg. of z and t = 0.05907199
#
#Confidence Interval for: mean
#
#Confidence Interval Method: Based on z
#
#Confidence Interval Type: Lower
#
#Confidence Level: 95%
#
#Confidence Interval: LCL = 29.82
# UCL = Inf
# The estimated mean, standard deviation, and skew are 35, 27, and 2.4,
# respectively. The p-value is 0.06, and the lower 95% confidence interval
# is [29.8, Inf). Depending on what you use for your Type I error rate, you
# may or may not want to reject the null hypothesis.