cdfCompare: Plot Two Cumulative Distribution Functions

Description

For one sample, plots the empirical cumulative distribution function (ecdf) along with a theoretical cumulative distribution function (cdf). For two samples, plots the two ecdfs on the same graph. These plots are used to graphically assess goodness of fit.

Usage

cdfCompare(x, y = NULL, discrete = FALSE, 
    prob.method = ifelse(discrete, "emp.probs", "plot.pos"), 
    plot.pos.con = NULL, distribution = "norm", param.list = NULL, 
    estimate.params = is.null(param.list), est.arg.list = NULL, x.col = "blue", 
    y.or.fitted.col = "black", x.lwd = 3 * par("cex"), y.or.fitted.lwd = 3 * par("cex"), 
    x.lty = 1, y.or.fitted.lty = 2, digits = .Options$digits, ..., 
    type = ifelse(discrete, "s", "l"), main = NULL, xlab = NULL, ylab = NULL, 
    xlim = NULL, ylim = NULL)

Arguments

x

a numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.

y

a numeric vector (not necessarily of the same length as x). Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. The default value is

discrete

logical scalar indicating whether the assumed parent distribution of x is discrete (discrete=TRUE) or continuous (discrete=FALSE; the default).

prob.method

character string indicating what method to use to compute the plotting positions (empirical probabilities). Possible values are plot.pos (plotting positions, the default if discrete=FALSE) and emp.probs

plot.pos.con

numeric scalar between 0 and 1 containing the value of the plotting position constant. When y is supplied, the default value is plot.pos.con=0.375. When y is not supplied, for the normal, lognormal, three-p

distribution

when y is not supplied, a character string denoting the distribution abbreviation. The default value is distribution="norm". See the help file for Distribution.df

param.list

when y is not supplied, a list with values for the parameters of the distribution. The default value is param.list=list(mean=0, sd=1). See the help file for Distribution.df

estimate.params

when y is not supplied, a logical scalar indicating whether to compute the cdf for x based on estimating the distribution parameters (estimate.params=TRUE) or using the known distribution parameters speci

est.arg.list

when y is not supplied and estimate.params=TRUE, a list whose components are optional arguments associated with the function used to estimate the parameters of the assumed distribution (see the help file

x.col

a numeric scalar or character string determining the color of the empirical cdf (based on x) line or points. The default value is x.col="blue". See the entry for col in the help file for

y.or.fitted.col

a numeric scalar or character string determining the color of the empirical cdf (based on y) or the theoretical cdf line or points. The default value is y.or.fitted.col="black". See the entry for col in

x.lwd

a numeric scalar determining the width of the empirical cdf (based on x) line. The default value is x.lwd=3*par("cex"). See the entry for lwd in the help file for par

y.or.fitted.lwd

a numeric scalar determining the width of the empirical cdf (based on y) 
  or theoretical cdf line.  
  The default value is y.or.fitted.lwd=3*par("cex").  
  See the entry for lwd in the help file for

x.lty

a numeric scalar determining the line type of the empirical cdf 
  (based on x) line.  The default value is 
  x.lty=1.  See the entry for lty in the help file for par

y.or.fitted.lty

a numeric scalar determining the line type of the empirical cdf 
  (based on y) or theoretical cdf line.  The default value is 
  y.or.fitted.lty=2.  
  See the entry for lty in the help file for

digits

when y is not supplied, 
  a scalar indicating how many significant digits to print for the distribution 
  parameters.  The default value is digits=.Options$digits.

type, main, xlab, ylab, xlim, ylim, ...

additional graphical parameters (see lines and par).  
  In particular, the argument type specifies the kind of line type.  
  By default, the funct
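
As a rough illustration of how several of the arguments above interact, the following sketch compares the same sample against a normal cdf with known parameters and then against a fitted normal cdf with customized colors and line types. It assumes the EnvStats package is installed; the variable names `known` and `fitted` and the particular color choices are illustrative only.

```r
library(EnvStats)  # assumption: the EnvStats package is installed

set.seed(250)
x <- rnorm(20, mean = 10, sd = 2)

# Known parameters: because param.list is supplied, estimate.params
# defaults to is.null(param.list), that is, FALSE.
known <- cdfCompare(x, distribution = "norm",
                    param.list = list(mean = 10, sd = 2))

# Estimated parameters (the default when param.list is NULL), with
# non-default colors and line type for the two curves.
fitted <- cdfCompare(x, x.col = "darkgreen",
                     y.or.fitted.col = "red", y.or.fitted.lty = 3)
```

Supplying param.list is one way to check a sample against a fully specified reference distribution rather than against a distribution fitted to the same data.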

Value

When y is supplied, cdfCompare invisibly returns a list with 
  components x.ecdf.list and y.ecdf.list.  Each of these components 
  is itself a list, with the components Order.Statistics and 
  Cumulative.Probabilities, giving coordinates of the points that have 
  been plotted.

  When y is not supplied, cdfCompare invisibly returns a list with 
  components x.ecdf.list and fitted.cdf.list.  
  The component x.ecdf.list is itself a list with the components 
  Order.Statistics and Cumulative.Probabilities, giving coordinates of 
  the points that have been plotted for the x values.  
  The component fitted.cdf.list is itself a list with the components 
  Quantiles and Cumulative.Probabilities, giving coordinates of the 
  points that have been plotted for the fitted cdf.
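
Because the result is returned invisibly, it has to be assigned to be inspected. The sketch below, which assumes the EnvStats package is installed and uses the hypothetical names `res.one` and `res.two`, shows one way to look at the components described above.

```r
library(EnvStats)  # assumption: the EnvStats package is installed

set.seed(250)
x <- rnorm(20, mean = 10, sd = 2)
set.seed(432)
y <- rexp(30, rate = 0.1)

# One sample: empirical cdf of x plus the fitted theoretical cdf
res.one <- cdfCompare(x)
names(res.one)
str(res.one$x.ecdf.list)
str(res.one$fitted.cdf.list)

# Two samples: empirical cdfs of x and y
res.two <- cdfCompare(x, y)
names(res.two)
head(res.two$y.ecdf.list$Order.Statistics)
head(res.two$y.ecdf.list$Cumulative.Probabilities)
```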

Details

When both x and y are supplied, the function cdfCompare 
  creates the empirical cdf plot of x and y on 
  the same plot by calling the function ecdfPlot.

  When y is not supplied, the function cdfCompare creates the 
  empirical cdf plot of x (by calling ecdfPlot) and the 
  theoretical cdf plot (by calling cdfPlot and using the 
  argument distribution) on the same plot.
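
The one-sample case can be approximated in base R to make the idea concrete. The sketch below is not the EnvStats implementation: it assumes plotting positions of the form (i - a) / (n + 1 - 2a) as computed by ppoints, a plotting position constant of a = 0.375, and the sample mean and standard deviation as the estimated normal parameters. The names `p`, `m`, `s`, and `q` are illustrative.

```r
# Base R sketch only; EnvStats is not used here.
set.seed(250)
x <- rnorm(20, mean = 10, sd = 2)
x <- x[is.finite(x)]            # drop NA, NaN, Inf, and -Inf values

n <- length(x)
a <- 0.375                      # assumed plotting position constant
p <- ppoints(n, a = a)          # (i - a) / (n + 1 - 2 * a), i = 1, ..., n

# Assumed estimates of the normal parameters
m <- mean(x)
s <- sd(x)

# Grid of quantiles on which to evaluate the fitted normal cdf
q <- seq(min(x) - s, max(x) + s, length.out = 200)

# Empirical cdf of x (solid blue) and fitted normal cdf (dashed black)
plot(sort(x), p, type = "l", col = "blue", lwd = 3, lty = 1,
     xlim = range(q), ylim = c(0, 1),
     xlab = "Order Statistics for x", ylab = "Cumulative Probability")
lines(q, pnorm(q, mean = m, sd = s), col = "black", lwd = 3, lty = 2)
```

If the two curves track each other closely over the range of the data, the assumed distribution is a plausible fit; systematic gaps suggest otherwise.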

References

Chambers, J.M., W.S. Cleveland, B. Kleiner, and P.A. Tukey. (1983). 
  Graphical Methods for Data Analysis. Duxbury Press, Boston, MA, pp.11-16.

  Cleveland, W.S. (1993). Visualizing Data. Hobart Press, Summit, New Jersey, 360pp.

  D'Agostino, R.B. (1986a). Graphical Analysis. 
  In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of-Fit Techniques. 
  Marcel Dekker, New York, Chapter 2, pp.7-62.

See Also

cdfPlot, ecdfPlot, qqPlot.

Examples

  # Generate 20 observations from a normal (Gaussian) distribution 
  # with mean=10 and sd=2 and compare the empirical cdf with a 
  # theoretical normal cdf that is based on estimating the parameters. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)

  set.seed(250) 
  x <- rnorm(20, mean = 10, sd = 2) 
  dev.new()
  cdfCompare(x)

  #----------

  # Generate 30 observations from an exponential distribution with parameter 
  # rate=0.1 (see the R help file for Exponential) and compare the empirical 
  # cdf with the empirical cdf of the normal observations generated in the 
  # previous example:

  set.seed(432)
  y <- rexp(30, rate = 0.1) 
  dev.new()
  cdfCompare(x, y)

  #==========

  # Generate 20 observations from a Poisson distribution with parameter lambda=10 
  # (see the R help file for Poisson) and compare the empirical cdf with a 
  # theoretical Poisson cdf based on estimating the distribution parameters. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)

  set.seed(250) 
  x <- rpois(20, lambda = 10) 
  dev.new()
  cdfCompare(x, distribution = "pois")

  #==========

  # Clean up
  #---------
  rm(x, y)
  graphics.off()