gofGroupTest: Goodness-of-Fit Test for a Specified Probability Distribution for Groups

Description

Perform a goodness-of-fit test to determine whether data in a set of groups appear to all come from the same probability distribution (with possibly different parameters for each group).

Usage

gofGroupTest(object, ...)

## S3 method for class 'formula':
gofGroupTest(object, data = NULL, subset, 
  na.action = na.pass, ...)

## S3 method for class 'default':
gofGroupTest(object, group, test = "sw", 
  distribution = "norm", est.arg.list = NULL, n.classes = NULL, 
  cut.points = NULL, param.list = NULL, 
  estimate.params = ifelse(is.null(param.list), TRUE, FALSE), 
  n.param.est = NULL, correct = NULL, digits = .Options$digits, 
  exact = NULL, ws.method = "normal scores", 
  data.name = NULL, group.name = NULL, parent.of.data = NULL, 
  subset.expression = NULL, ...)

## S3 method for class 'data.frame':
gofGroupTest(object, ...)

## S3 method for class 'matrix':
gofGroupTest(object, ...)

## S3 method for class 'list':
gofGroupTest(object, ...)

Arguments

object

an object containing data for 2 or more groups to be compared to the hypothesized distribution specified by distribution. In the default method, the argument object must be a numeric vector. When object

data

when object is a formula, data specifies an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data

subset

when object is a formula, subset specifies an optional vector specifying 
  a subset of observations to be used.

na.action

when object is a formula, na.action specifies a function which indicates 
  what should happen when the data contain NAs. The default is na.pass.

group

when object is a numeric vector, group is a factor or character vector 
  indicating which group each observation belongs to.  When object is a matrix or data frame
  this argument is ignored and the columns define

test

character string defining which goodness-of-fit test to perform on each group.  
  Possible values are:  
  "sw" (Shapiro-Wilk; the default), "sf" (Shapiro-Francia), 
  "ppcc" (Probability Plot Correlation Coeffic

distribution

a character string denoting the distribution abbreviation.  See the help file for 
  Distribution.df for a list of distributions and their abbreviations.  
  The default value is distributio

est.arg.list

a list of arguments to be passed to the function estimating the distribution parameters 
  for each group of observations.  
  For example, if test="sw" and 
distribution="gamma", setting est.arg.list=list(method="bcmle")

n.classes

for the case when test="chisq", the number of cells into which the observations 
  within each group are to be allocated.  If the argument cut.points is supplied, 
  then n.classes is set to length(cut.points

cut.points

for the case when test="chisq", a vector of cutpoints that defines the cells for each 
  group of observations. 
  The element x[i] is allocated to cell j if 
cut.points[j] < x[i] $\le$

param.list

for the case when test="ks" or test="chisq", 
  a list with values for the parameters of the specified distribution.  See the help file 
  for Distribution.df for the nam

estimate.params

for the case when test="ks" or test="chisq", 
  a logical scalar indicating whether to perform the goodness-of-fit test based on 
  estimating the distribution parameters (estimate.params=TRUE) or using the 
  use

n.param.est

for the case when test="ks" or test="chisq", 
  an integer indicating the number of parameters estimated from the data.  
If estimate.params=TRUE, the default value is the number of parameters associated 
  with th

correct

for the case when test="chisq", a logical scalar indicating whether to use the 
  continuity correction.  The default value is correct=FALSE unless 
n.classes=2.

digits

a scalar indicating how many significant digits to print out for the parameters 
  associated with the hypothesized distribution.  The default value is 
.Options$digits.

exact

for the case when test="ks", exact=NULL by default, but can be set to 
  a logical scalar indicating whether an exact p-value should be computed.  
  See the help file for ks.test

ws.method

character string indicating which method to use when performing the 
  Wilk-Shapiro test for a Uniform [0,1] distribution 
  on the p-values from the goodness-of-fit tests on each group

data.name

character string indicating the name of the data used for the goodness-of-fit tests.  
  The default value is data.name=deparse(substitute(object)).

group.name

character string indicating the name of the data used to create the groups.
  The default value is group.name=deparse(substitute(group)).

parent.of.data

character string indicating the source of the data used for the goodness-of-fit tests.

subset.expression

character string indicating the expression used to subset the data.

...

additional arguments affecting the goodness-of-fit test.
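
As a sketch of how the input forms described above relate, the two calls below should describe the same four groups: one passes a numeric vector together with group, the other passes a data frame whose columns define the groups. It assumes the EnvStats package is installed and that it supplies both gofGroupTest and the EPA.09.Ex.10.1.nickel.df data set used in the Examples; the object names nickel, nickel.wide, gof.by.vector and gof.by.columns are illustrative only.

```r
# Assumption: the EnvStats package is installed; it is taken to provide
# gofGroupTest() and the example data set EPA.09.Ex.10.1.nickel.df.
library(EnvStats)

nickel <- EPA.09.Ex.10.1.nickel.df

# Default method: a numeric vector plus a grouping variable.
gof.by.vector <- gofGroupTest(nickel$Nickel.ppb, group = nickel$Well)

# Data frame method: reshape to one column per well, so that the
# columns (not a group argument) define the groups.
nickel.wide <- unstack(nickel, Nickel.ppb ~ Well)
gof.by.columns <- gofGroupTest(nickel.wide)

class(gof.by.vector)
class(gof.by.columns)
```

The formula interface shown in the Examples is usually the most convenient form for data stored in long format; the column-wise form is convenient when each group already occupies its own column.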

Value

a list of class "gofGroup" containing the results of the group goodness-of-fit test.  
  Objects of class "gofGroup" have special printing and plotting methods.  
  See the help file for gofGroup.object for details.

Details

The function gofGroupTest performs a goodness-of-fit test for each group of 
  data by calling the function gofTest.  If every group comes from the 
  hypothesized distribution, the resulting p-values should behave like a 
  sample from a Uniform [0,1] distribution.  gofGroupTest therefore calls 
  gofTest again, with the argument test="ws", to test whether the p-values 
  appear to come from a Uniform [0,1] distribution.
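
The second step can be illustrated in base R. The sketch below assumes the "normal scores" form of the Wilk-Shapiro statistic, G = sum(qnorm(p)) / sqrt(K) with group p-value pnorm(G). This formula is not stated on this page, but it agrees with the group statistic and p-value printed for the normal-distribution example in the Examples section. The function name ws.normal.scores is hypothetical and is not part of any package.

```r
# Hypothetical helper (not a package function): combine the per-group
# p-values using normal scores, under the assumption stated above.
ws.normal.scores <- function(p) {
  K <- length(p)                 # number of groups
  G <- sum(qnorm(p)) / sqrt(K)   # standardized sum of normal scores
  list(statistic = G, p.value = pnorm(G))
}

# Individual p-values copied from the printed output of the
# normal-distribution example in the Examples section.
p.normal <- c(Well.1 = 0.03510747, Well.2 = 0.02385344,
              Well.3 = 0.01120775, Well.4 = 0.10681461)

ws.normal <- ws.normal.scores(p.normal)
ws.normal$statistic  # the printed output reports z (G) = -3.658696
ws.normal$p.value    # the printed output reports 0.0001267509
```

Under this assumption, small individual p-values push G toward large negative values, so the group p-value is taken from the lower tail of the standard normal distribution.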

References

Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). 
  Statistical Methods for Groundwater Monitoring, Second Edition.  
  John Wiley & Sons, Hoboken.

  USEPA. (2009).  Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance.
  EPA 530/R-09-007, March 2009.  Office of Resource Conservation and Recovery, Program Implementation and Information Division.  
  U.S. Environmental Protection Agency, Washington, D.C. p.17-17.

  USEPA. (2010).  Errata Sheet - March 2009 Unified Guidance.
  EPA 530/R-09-007a, August 9, 2010.  Office of Resource Conservation and Recovery, Program Information and Implementation Division.
  U.S. Environmental Protection Agency, Washington, D.C.

  Wilk, M.B., and S.S. Shapiro. (1968). The Joint Assessment of Normality of Several Independent 
  Samples. Technometrics, 10(4), 825-839.

See Also

gofTest, gofGroup.object, print.gofGroup, 
  plot.gofGroup, qqPlot.

Examples

  # Example 10-4 of USEPA (2009, page 10-20) gives an example of 
  # simultaneously testing the assumption of normality for nickel 
  # concentrations (ppb) in groundwater collected at 4 monitoring 
  # wells over 5 months.  The data for this example are stored in 
  # EPA.09.Ex.10.1.nickel.df.

  EPA.09.Ex.10.1.nickel.df
  #   Month   Well Nickel.ppb
  #1      1 Well.1       58.8
  #2      3 Well.1        1.0
  #3      6 Well.1      262.0
  #4      8 Well.1       56.0
  #5     10 Well.1        8.7
  #6      1 Well.2       19.0
  #7      3 Well.2       81.5
  #8      6 Well.2      331.0
  #9      8 Well.2       14.0
  #10    10 Well.2       64.4
  #11     1 Well.3       39.0
  #12     3 Well.3      151.0
  #13     6 Well.3       27.0
  #14     8 Well.3       21.4
  #15    10 Well.3      578.0
  #16     1 Well.4        3.1
  #17     3 Well.4      942.0
  #18     6 Well.4       85.6
  #19     8 Well.4       10.0
  #20    10 Well.4      637.0


  # Test for a normal distribution at each well:
  #--------------------------------------------

  gofGroup.list <- gofGroupTest(Nickel.ppb ~ Well, 
    data = EPA.09.Ex.10.1.nickel.df)

  gofGroup.list

  #Results of Group Goodness-of-Fit Test
  #-------------------------------------
  #
  #Test Method:                     Wilk-Shapiro GOF (Normal Scores)
  #
  #Hypothesized Distribution:       Normal
  #
  #Data:                            Nickel.ppb
  #
  #Grouping Variable:               Well
  #
  #Data Source:                     EPA.09.Ex.10.1.nickel.df
  #
  #Number of Groups:                4
  #
  #Sample Sizes:                    Well.1 = 5
  #                                 Well.2 = 5
  #                                 Well.3 = 5
  #                                 Well.4 = 5
  #
  #Test Statistic:                  z (G) = -3.658696
  #
  #P-values for
  #Individual Tests:                Well.1 = 0.03510747
  #                                 Well.2 = 0.02385344
  #                                 Well.3 = 0.01120775
  #                                 Well.4 = 0.10681461
  #
  #P-value for
  #Group Test:                      0.0001267509
  #
  #Alternative Hypothesis:          At least one group
  #                                 does not come from a
  #                                 Normal Distribution.

  dev.new()
  plot(gofGroup.list)

  #----------

  # Test for a lognormal distribution at each well:
  #-----------------------------------------------

  gofGroupTest(Nickel.ppb ~ Well, data = EPA.09.Ex.10.1.nickel.df, 
    distribution = "lnorm")

  #Results of Group Goodness-of-Fit Test
  #-------------------------------------
  #
  #Test Method:                     Wilk-Shapiro GOF (Normal Scores)
  #
  #Hypothesized Distribution:       Lognormal
  #
  #Data:                            Nickel.ppb
  #
  #Grouping Variable:               Well
  #
  #Data Source:                     EPA.09.Ex.10.1.nickel.df
  #
  #Number of Groups:                4
  #
  #Sample Sizes:                    Well.1 = 5
  #                                 Well.2 = 5
  #                                 Well.3 = 5
  #                                 Well.4 = 5
  #
  #Test Statistic:                  z (G) = 0.2401720
  #
  #P-values for
  #Individual Tests:                Well.1 = 0.6898164
  #                                 Well.2 = 0.6700394
  #                                 Well.3 = 0.3208299
  #                                 Well.4 = 0.5041375
  #
  #P-value for
  #Group Test:                      0.5949015
  #
  #Alternative Hypothesis:          At least one group
  #                                 does not come from a
  #                                 Lognormal Distribution.

  #----------
  # Clean up
  rm(gofGroup.list)
  graphics.off()