
Usage
twoSamplePermutationTestLocation(x, y, fcn = "mean", alternative = "two.sided",
    mu1.minus.mu2 = 0, paired = FALSE, exact = FALSE, n.permutations = 5000,
    seed = NULL, tol = sqrt(.Machine$double.eps))
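Arguments
x
    numeric vector of observations from group 1. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.
y
    numeric vector of observations from group 2. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. In the case when paired=TRUE, y must have the same length as x.
fcn
    character string indicating which location parameter to compare between the two groups. The possible values are fcn="mean" (the default) and fcn="median". This argument is ignored when paired=TRUE.
alternative
    character string indicating the kind of alternative hypothesis. The possible values are "two.sided" (the default), "less", and "greater".
mu1.minus.mu2
    numeric scalar indicating the hypothesized value of the difference between the location parameters. The default value is mu1.minus.mu2=0.
paired
    logical scalar indicating whether to perform a paired or two-sample permutation test. The possible values are paired=FALSE (the default; indicates a two-sample permutation test) and paired=TRUE (indicates take differences of pairs and perform a one-sample permutation test).
exact
    logical scalar indicating whether to compute all possible permutations or to sample from the permutation distribution. The default value is exact=FALSE.
n.permutations
    integer indicating how many times to sample from the permutation distribution when exact=FALSE. The default value is n.permutations=5000. This argument is ignored when exact=TRUE.
seed
    integer to pass to the R function set.seed. The default is seed=NULL, in which case the current value of .Random.seed is used.
tol
    numeric scalar indicating the tolerance to use when computing the p-value. The default value is tol=sqrt(.Machine$double.eps). See the DETAILS section below for more information.
Value
A list of class "permutationTest" containing the results of the hypothesis
test. See the help file for permutationTest.object for details.
Details
Two-Sample Case (paired=FALSE)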
Let $\underline{x} = x_1, x_2, \ldots, x_{n1}$ be a vector of $n1$
independent and identically distributed (i.i.d.) observations
from some distribution with location parameter (e.g., mean or median) $\theta_1$,
and let $\underline{y} = y_1, y_2, \ldots, y_{n2}$ be a vector of $n2$
i.i.d. observations from the same distribution with possibly different location
parameter $\theta_2$.
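Consider the test of the null hypothesis that the difference in the location
parameters is equal to some specified value:
$H_0: \delta = \delta_0$
where $\delta = \theta_1 - \theta_2$ and $\delta_0$ denotes the hypothesized
difference in location (the argument mu1.minus.mu2).
The three possible alternative hypotheses are the upper one-sided alternative
(alternative="greater")
$H_a: \delta > \delta_0$
the lower one-sided alternative (alternative="less")
$H_a: \delta < \delta_0$
and the two-sided alternative (alternative="two.sided")
$H_a: \delta \ne \delta_0$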
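The p-value is computed from the permutation distribution of the differences in the
means (or medians), which is obtained by permuting the group assignments.
A one-sided upper p-value is computed as the proportion of times that the differences
in the means (or medians) in the permutation distribution are greater than or equal to
[the observed difference in the means (or medians) - a small tolerance value].
The tolerance value is determined by the argument tol. Similarly, a one-sided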
lower p-value is computed as the proportion of times that the differences in the
means (or medians) in the permutation distribution are less than or equal to
[the observed difference in the means (or medians) + a small tolerance value].
Finally, a two-sided p-value is computed as the proportion of times the absolute
values of the differences in the means (or medians) in the permutation distribution
are greater than or equal to
[the absolute value of the observed difference in the means (or medians) - a small tolerance value].
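The three p-value rules can be sketched in base R as follows. This is a minimal illustration, not the function's actual implementation: the helper name perm_pvalues and the data are hypothetical, the statistic is fixed to the difference in means, the hypothesized difference is assumed to be 0, and the permutation distribution is sampled rather than fully enumerated.

```r
# Hypothetical helper (not part of any package): sampled two-sample
# permutation p-values for a difference in means, assuming delta0 = 0.
perm_pvalues <- function(x, y, n.permutations = 5000,
                         tol = sqrt(.Machine$double.eps)) {
  pooled <- c(x, y)
  n1 <- length(x)
  obs <- mean(x) - mean(y)              # observed difference in means

  # Randomly reassign n1 observations to group 1, the rest to group 2,
  # and record the difference in means for each reassignment.
  perm.diffs <- replicate(n.permutations, {
    idx <- sample(length(pooled), n1)
    mean(pooled[idx]) - mean(pooled[-idx])
  })

  c(greater   = mean(perm.diffs >= obs - tol),
    less      = mean(perm.diffs <= obs + tol),
    two.sided = mean(abs(perm.diffs) >= abs(obs) - tol))
}

set.seed(1)                              # made-up illustration data
x <- c(3, 5)
y <- c(4, 6, 7)
p <- perm_pvalues(x, y, n.permutations = 2000)
p
```

The tolerance only matters when a permuted difference equals the observed difference up to floating-point error, so that such ties are counted as "greater than or equal to" (or "less than or equal to") as intended.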
The p-value computation described above assumes that the hypothesized difference in
the means under the null hypothesis is $\delta_0 = 0$. For a different hypothesized
value of $\delta_0$, this value is subtracted from each of
the observations in Group 1 before permuting the group assignments to compute the
permutation distribution of the differences of the means. As in the case of the
one-sample permutation test, if the sample sizes
for the groups become too large to compute all possible permutations of the group
assignments, the permutation test can still be performed by sampling from the
permutation distribution and comparing the observed difference in locations to the
sampled permutation distribution of the difference in locations.
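When the samples are small, every group assignment can be enumerated directly. The sketch below uses the base R function combn for this and applies the $\delta_0$ shift to Group 1 before permuting. The helper name exact_perm_diffs and the data are hypothetical, and comparing against the shifted observed difference is an assumption of this sketch rather than a statement about the function's internals.

```r
# Hypothetical helper (not part of any package): exact permutation
# distribution of the difference in means for small samples.
exact_perm_diffs <- function(x, y, delta0 = 0) {
  x0 <- x - delta0                       # shift Group 1 by the hypothesized difference
  pooled <- c(x0, y)
  n1 <- length(x0)

  # Each column of combn() is one possible set of indices for Group 1.
  groups <- combn(length(pooled), n1)
  diffs <- apply(groups, 2, function(idx) {
    mean(pooled[idx]) - mean(pooled[-idx])
  })

  list(observed = mean(x0) - mean(y), perm.diffs = diffs)
}

# Made-up data: 2 observations in Group 1 and 3 in Group 2 give
# choose(5, 2) = 10 possible group assignments.
res <- exact_perm_diffs(c(3, 5), c(4, 6, 7))
length(res$perm.diffs)
res$observed
```

The number of assignments is choose(n1 + n2, n1), which grows very quickly with the sample sizes; this is why sampling from the permutation distribution is the practical choice for larger groups.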
Unlike the two-sample Student's t-test, we do not have to worry
about the normality assumption when we use a permutation test. The permutation test
still assumes, however, that under the null hypothesis, the distributions of the
observations from each group are exactly the same, and under the alternative
hypothesis there is simply a shift in location (that is, the whole distribution of
group 1 is shifted by some constant relative to the distribution of group 2).
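Mathematically, this can be written as follows:
$F_1(t) = F_2(t - \delta), \;\; -\infty < t < \infty$
where $F_1$ and $F_2$ denote the cumulative distribution functions for group 1
and group 2, respectively, and $\delta$ is the shift in location.
See also the help file for boot in the R package boot.
Paired-Sample Case (paired=TRUE)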
When the argument paired=TRUE, the arguments x and y are
assumed to have the same length, and the $n1 = n2 = n$ differences
$d_i = x_i - y_i$, $i = 1, 2, \ldots, n$ are assumed to be independent
observations from some symmetric distribution with mean $\mu$. The
one-sample permutation test can then be applied
to the differences.
See Also
permutationTest.object, plot.permutationTest, oneSamplePermutationTest,
twoSamplePermutationTestProportion, Hypothesis Tests, boot.
Examples
# Generate 10 observations from a lognormal distribution with parameters
# mean=5 and cv=2, and 20 observations from a lognormal distribution with
# parameters mean=10 and cv=2. Test the null hypothesis that the means of the
# two distributions are the same against the alternative that the mean for
# group 1 is less than the mean for group 2.
# (Note: the call to set.seed allows you to reproduce the same data
# (dat1 and dat2), and setting the argument seed=732 in the call to
# twoSamplePermutationTestLocation() lets you reproduce this example by
# getting the same sample from the permutation distribution).
set.seed(256)
dat1 <- rlnormAlt(10, mean = 5, cv = 2)
dat2 <- rlnormAlt(20, mean = 10, cv = 2)
test.list <- twoSamplePermutationTestLocation(dat1, dat2,
    alternative = "less", seed = 732)
# Print the results of the test
#------------------------------
test.list
#Results of Hypothesis Test
#--------------------------
#
#Null Hypothesis: mu.x-mu.y = 0
#
#Alternative Hypothesis: True mu.x-mu.y is less than 0
#
#Test Name: Two-Sample Permutation Test
# Based on Differences in Means
# (Based on Sampling
# Permutation Distribution
# 5000 Times)
#
#Estimated Parameter(s): mean of x = 2.253439
# mean of y = 11.825430
#
#Data: x = dat1
# y = dat2
#
#Sample Sizes: nx = 10
# ny = 20
#
#Test Statistic: mean.x - mean.y = -9.571991
#
#P-value: 0.001
# Plot the results of the test
#-----------------------------
dev.new()
plot(test.list)
#==========
# The guidance document "Statistical Methods for Evaluating the Attainment of
# Cleanup Standards, Volume 3: Reference-Based Standards for Soils and Solid
# Media" (USEPA, 1994b, pp. 6.22-6.25) contains observations of
# 1,2,3,4-Tetrachlorobenzene (TcCB) in ppb at a Reference Area and a Cleanup Area.
# These data are stored in the data frame EPA.94b.tccb.df. Use the
# two-sample permutation test to test for a difference in means between the
# two areas vs. the alternative that the mean in the Cleanup Area is greater.
# Do the same thing for the medians.
#
# The permutation test based on comparing means shows a significant difference,
# while the one based on comparing medians does not.
# First test for a difference in the means.
#------------------------------------------
mean.list <- with(EPA.94b.tccb.df,
    twoSamplePermutationTestLocation(
        TcCB[Area=="Cleanup"], TcCB[Area=="Reference"],
        alternative = "greater", seed = 47))
mean.list
#Results of Hypothesis Test
#--------------------------
#
#Null Hypothesis: mu.x-mu.y = 0
#
#Alternative Hypothesis: True mu.x-mu.y is greater than 0
#
#Test Name: Two-Sample Permutation Test
# Based on Differences in Means
# (Based on Sampling
# Permutation Distribution
# 5000 Times)
#
#Estimated Parameter(s): mean of x = 3.9151948
# mean of y = 0.5985106
#
#Data: x = TcCB[Area == "Cleanup"]
# y = TcCB[Area == "Reference"]
#
#Sample Sizes: nx = 77
# ny = 47
#
#Test Statistic: mean.x - mean.y = 3.316684
#
#P-value: 0.0206
dev.new()
plot(mean.list)
#----------
# Now test for a difference in the medians.
#------------------------------------------
median.list <- with(EPA.94b.tccb.df,
    twoSamplePermutationTestLocation(
        TcCB[Area=="Cleanup"], TcCB[Area=="Reference"],
        fcn = "median", alternative = "greater", seed = 47))
median.list
#Results of Hypothesis Test
#--------------------------
#
#Null Hypothesis: mu.x-mu.y = 0
#
#Alternative Hypothesis: True mu.x-mu.y is greater than 0
#
#Test Name: Two-Sample Permutation Test
# Based on Differences in Medians
# (Based on Sampling
# Permutation Distribution
# 5000 Times)
#
#Estimated Parameter(s): median of x = 0.43
# median of y = 0.54
#
#Data: x = TcCB[Area == "Cleanup"]
# y = TcCB[Area == "Reference"]
#
#Sample Sizes: nx = 77
# ny = 47
#
#Test Statistic: median.x - median.y = -0.11
#
#P-value: 0.936
dev.new()
plot(median.list)
#==========
# Clean up
#---------
rm(test.list, mean.list, median.list)
graphics.off()