ecdfPlotCensored: Empirical Cumulative Distribution Function Plot Based on Type I Censored Data

Description

Produce an empirical cumulative distribution function plot for Type I left-censored or right-censored data.

Usage

ecdfPlotCensored(x, censored, censoring.side = "left", discrete = FALSE, 
    prob.method = "michael-schucany", plot.pos.con = 0.375, plot.it = TRUE, 
    add = FALSE, ecdf.col = 1, ecdf.lwd = 3 * par("cex"), ecdf.lty = 1, 
    include.cen = FALSE, cen.pch = ifelse(censoring.side == "left", 6, 2), 
    cen.cex = par("cex"), cen.col = 4, ..., 
    type = ifelse(discrete, "s", "l"), main = NULL, xlab = NULL, ylab = NULL, 
    xlim = NULL, ylim = NULL)

Arguments

x

numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.

censored

numeric or logical vector indicating which values of x are censored. This must be the same length as x. If the mode of censored is "logical", TRUE values correspond to elements of x that are censored, and FALSE values correspond to elements of x that are not censored.

censoring.side

character string indicating on which side the censoring occurs. The possible values are "left" (the default) and "right".

discrete

logical scalar indicating whether the assumed parent distribution of x is discrete (discrete=TRUE) or continuous (discrete=FALSE; the default).

prob.method

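character string indicating what method to use to compute the plotting positions (empirical probabilities). Possible values are "kaplan-meier" (product-limit method of Kaplan and Meier (1958)), "nelson" (hazard plotting method of Nelson (1972)), "michael-schucany" (method of Michael and Schucany (1986)), and "hirsch-stedinger" (method of Hirsch and Stedinger (1987)). The default value is prob.method="michael-schucany".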

plot.pos.con

numeric scalar between 0 and 1 containing the value of the plotting position constant. The default value is plot.pos.con=0.375. See the DETAILS section for more information. This argument is used only if prob.method is equal to "michael-schucany" or "hirsch-stedinger".

plot.it

logical scalar indicating whether to produce a plot or add to the current plot (see add) on the current graphics device. The default value is plot.it=TRUE.

add

logical scalar indicating whether to add the empirical cdf to the current plot (add=TRUE) or generate a new plot (add=FALSE; the default). This argument is ignored if plot.it=FALSE.

ecdf.col

a numeric scalar or character string determining the color of the empirical cdf line or points. The default value is ecdf.col=1. See the entry for col in the help file for par for more information.

ecdf.lwd

a numeric scalar determining the width of the empirical cdf line. The default value is ecdf.lwd=3*par("cex"). See the entry for lwd in the help file for par for more information.

ecdf.lty

a numeric scalar determining the line type of the empirical cdf line. The default value is ecdf.lty=1. See the entry for lty in the help file for par for more information.

include.cen

logical scalar indicating whether to include censored values in the plot. The default value is include.cen=FALSE. If include.cen=TRUE, censored values are plotted using the plotting character indicated by the argument cen.pch.

cen.pch

numeric scalar or character string indicating the plotting character to use to plot censored values. The default value is cen.pch=2 (hollow triangle pointing up) when censoring.side="right", and cen.pch=6 (hollow triangle pointing down) when censoring.side="left".

cen.cex

numeric scalar that determines the size of the plotting character used to plot censored values. The default value is the current value of the cex graphics parameter. See the entry for cex in the help file for par for more information.

cen.col

numeric scalar or character string that determines the color of the plotting character used to plot censored values. The default value is cen.col=4. See the entry for col in the help file for par for more information.

type, main, xlab, ylab, xlim, ylim, ...

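additional graphical parameters (see lines and par). In particular, the argument type specifies the kind of line to draw. By default, the function ecdfPlotCensored plots a step function when discrete=TRUE, and plots a straight line between points when discrete=FALSE.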
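As a rough illustration of how the censoring and graphics arguments above combine, the following sketch plots simulated right-censored data and marks the censored values. The data, the object names (t.true, t.obs, cen.right, res.right), and the censoring level of 15 are hypothetical choices made for this example, and the code assumes the EnvStats package is installed.

```r
# A minimal sketch; assumes the EnvStats package is installed (skipped otherwise).
# All data and object names here are hypothetical and used only for illustration.
if (requireNamespace("EnvStats", quietly = TRUE)) {
  set.seed(47)
  t.true <- rexp(30, rate = 0.1)   # simulated "true" values
  cen.right <- t.true > 15         # TRUE marks a right-censored observation
  t.obs <- pmin(t.true, 15)        # censored values are recorded at the censoring level

  # Step-function ecdf with censored values drawn as red triangles
  res.right <- EnvStats::ecdfPlotCensored(t.obs, cen.right,
    censoring.side = "right", prob.method = "kaplan-meier",
    include.cen = TRUE, cen.col = "red", type = "s",
    main = "Empirical CDF for Simulated Right-Censored Data")
}
```

Supplying type = "s" here overrides the default line type that would otherwise follow from discrete=FALSE.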

Value

ecdfPlotCensored returns a list with the following components:

Order.Statistics

numeric vector of the ordered observations.

Cumulative.Probabilities

numeric vector of the associated plotting positions.

Censored

logical vector indicating which of the ordered observations are censored.

Censoring.Side

character string indicating whether the data are left- or right-censored. This is the same value as the argument censoring.side.

Prob.Method

character string indicating what method was used to compute the plotting positions. This is the same value as the argument prob.method.

Optional Component (only present when prob.method="michael-schucany" or prob.method="hirsch-stedinger"):

Plot.Pos.Con

numeric scalar containing the value of the plotting position constant that was used. This is the same as the argument plot.pos.con.
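The returned list can be inspected without drawing anything by setting plot.it=FALSE. The sketch below is only an illustration under stated assumptions: it reuses the simulated data from the Examples section, the object name res is hypothetical, and the code assumes the EnvStats package is installed.

```r
# A minimal sketch; assumes the EnvStats package is installed (skipped otherwise).
if (requireNamespace("EnvStats", quietly = TRUE)) {
  set.seed(333)
  x <- rnorm(20, mean = 20, sd = 5)
  censored <- x < 18                 # TRUE marks a left-censored observation
  new.x <- ifelse(censored, 18, x)   # censored values are recorded at the censoring level

  # Compute the plotting positions without producing a plot
  res <- EnvStats::ecdfPlotCensored(new.x, censored, plot.it = FALSE)

  names(res)   # component names of the returned list
  str(res)     # compact view of each component
}
```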

Details

The function ecdfPlotCensored does exactly the same thing as ecdfPlot, except it calls the function ppointsCensored to compute the plotting positions (estimated cumulative probabilities) for the uncensored observations. If plot.it=TRUE, the estimated cumulative probabilities for the uncensored observations are plotted against the uncensored observations.

By default, the function ecdfPlotCensored plots a step function when discrete=TRUE, and plots a straight line between points when discrete=FALSE. The user may override these defaults by supplying the graphics parameter type (type="s" for a step function, type="l" for linear interpolation, type="p" for points only, etc.).

If include.cen=TRUE, censored observations are included on the plot as points. The arguments cen.pch, cen.cex, and cen.col control the appearance of these points.

In cases where x is a random sample, the empirical cdf will change from sample to sample, and the variability in these estimates can be dramatic for small sample sizes. Caution must be used in interpreting the empirical cdf when a large percentage of the observations are censored.
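The sample-to-sample variability mentioned above can be seen in a small simulation. The sketch below uses only base R and, as a simplifying assumption, complete (uncensored) normal samples. The helper ecdf.at and the choice of evaluation point 20 are hypothetical and serve only to illustrate the point.

```r
# A minimal base-R sketch of sampling variability in the empirical cdf.
# Assumption: complete (uncensored) samples from a normal distribution.
set.seed(1)

# Hypothetical helper: draw a sample of size n and evaluate its ecdf at q
ecdf.at <- function(n, q = 20) {
  x <- rnorm(n, mean = 20, sd = 5)
  ecdf(x)(q)
}

est.small <- replicate(1000, ecdf.at(10))    # estimates of F(20) with n = 10
est.large <- replicate(1000, ecdf.at(100))   # estimates of F(20) with n = 100

# Spread of the estimates across repeated samples
sd.small <- sd(est.small)
sd.large <- sd(est.large)
c(n10 = sd.small, n100 = sd.large)
```

The estimates based on 10 observations scatter much more widely around the true value than those based on 100 observations. Censoring reduces the number of uncensored points that anchor the curve, which is one reason for the caution above.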

References

Chambers, J.M., W.S. Cleveland, B. Kleiner, and P.A. Tukey. (1983). Graphical Methods for Data Analysis. Duxbury Press, Boston, MA, pp.11-16.

Cleveland, W.S. (1993). Visualizing Data. Hobart Press, Summit, New Jersey, 360pp.

D'Agostino, R.B. (1986a). Graphical Analysis. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of-Fit Techniques. Marcel Dekker, New York, Chapter 2, pp.7-62.

Gillespie, B.W., Q. Chen, H. Reichert, A. Franzblau, E. Hedgeman, J. Lepkowski, P. Adriaens, A. Demond, W. Luksemburg, and D.H. Garabrant. (2010). Estimating Population Distributions When Some Data Are Below a Limit of Detection by Using a Reverse Kaplan-Meier Estimator. Epidemiology 21(4), S64-S70.

Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R, Second Edition. John Wiley & Sons, Hoboken, New Jersey.

Helsel, D.R., and T.A. Cohn. (1988). Estimation of Descriptive Statistics for Multiply Censored Water Quality Data. Water Resources Research 24(12), 1997-2004.

Hirsch, R.M., and J.R. Stedinger. (1987). Plotting Positions for Historical Floods and Their Precision. Water Resources Research 23(4), 715-727.

Kaplan, E.L., and P. Meier. (1958). Nonparametric Estimation From Incomplete Observations. Journal of the American Statistical Association 53, 457-481.

Lee, E.T., and J.W. Wang. (2003). Statistical Methods for Survival Data Analysis, Third Edition. John Wiley & Sons, Hoboken, New Jersey, 513pp.

Michael, J.R., and W.R. Schucany. (1986). Analysis of Data from Censored Samples. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of-Fit Techniques. Marcel Dekker, New York, 560pp, Chapter 11, pp.461-496.

Nelson, W. (1972). Theory and Applications of Hazard Plotting for Censored Failure Data. Technometrics 14, 945-966.

USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. Chapter 15.

USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.

Examples

  # Generate 20 observations from a normal distribution with mean=20 and sd=5, 
  # censor all observations less than 18, then generate an empirical cdf plot  
  # for the complete data set and the censored data set.  Note that the empirical 
  # cdf plot for the censored data set starts at the first ordered uncensored 
  # observation, and that for values of x > 18 the two empirical cdf plots are 
  # exactly the same.  This is because there is only one censoring level and 
  # no uncensored observations fall below the censored observations. 
  # (Note: the call to set.seed simply allows you to reproduce this example.)

  set.seed(333) 
  x <- rnorm(20, mean=20, sd=5) 
  censored <- x < 18

  sum(censored) 
  #[1] 7 

  new.x <- x 
  new.x[censored] <- 18

  dev.new()
  ecdfPlot(x, xlim = range(pretty(x)), 
    main = "Empirical CDF Plot for\nComplete Data Set") 

  dev.new()
  ecdfPlotCensored(new.x, censored, xlim = range(pretty(x)), 
    main = "Empirical CDF Plot for\nCensored Data Set")

  # Clean up
  #---------
  rm(x, censored, new.x)

  #------------------------------------------------------------------------------------

  # Example 15-1 of USEPA (2009, page 15-10) gives an example of
  # computing plotting positions based on censored manganese 
  # concentrations (ppb) in groundwater collected at 5 monitoring
  # wells.  The data for this example are stored in 
  # EPA.09.Ex.15.1.manganese.df.  Here we will create an empirical 
  # CDF plot based on the Kaplan-Meier method.

  EPA.09.Ex.15.1.manganese.df
  #   Sample   Well Manganese.Orig.ppb Manganese.ppb Censored
  #1       1 Well.1                 <5           5.0     TRUE
  #2       2 Well.1               12.1          12.1    FALSE
  #3       3 Well.1               16.9          16.9    FALSE
  #4       4 Well.1               21.6          21.6    FALSE
  #5       5 Well.1                 <2           2.0     TRUE
  #...
  #21      1 Well.5               17.9          17.9    FALSE
  #22      2 Well.5               22.7          22.7    FALSE
  #23      3 Well.5                3.3           3.3    FALSE
  #24      4 Well.5                8.4           8.4    FALSE
  #25      5 Well.5                 <2           2.0     TRUE

  dev.new()
  with(EPA.09.Ex.15.1.manganese.df, 
    ecdfPlotCensored(Manganese.ppb, Censored, 
      prob.method = "kaplan-meier", ecdf.col = "blue", 
      main = "Empirical CDF of Manganese Data\nBased on Kaplan-Meier"))

  #==========

  # Clean up
  #---------
  graphics.off()