Last chance! 50% off unlimited learning
Sale ends in
cdfCompareCensored(x, censored, censoring.side = "left",
y = NULL, y.censored = NULL, y.censoring.side = censoring.side,
discrete = FALSE, prob.method = "michael-schucany",
plot.pos.con = NULL, distribution = "norm", param.list = NULL,
estimate.params = is.null(param.list), est.arg.list = NULL,
x.col = "blue", y.or.fitted.col = "black", x.lwd = 3 * par("cex"),
y.or.fitted.lwd = 3 * par("cex"), x.lty = 1, y.or.fitted.lty = 2,
include.x.cen = FALSE, x.cen.pch = ifelse(censoring.side == "left", 6, 2),
x.cen.cex = par("cex"), x.cen.col = "red",
include.y.cen = FALSE, y.cen.pch = ifelse(y.censoring.side == "left", 6, 2),
y.cen.cex = par("cex"), y.cen.col = "black", digits = .Options$digits, ...,
type = ifelse(discrete, "s", "l"), main = NULL, xlab = NULL, ylab = NULL,
xlim = NULL, ylim = NULL)
NA
), undefined (NaN
), and
infinite (Inf
, -Inf
) values are allowed but will be removed.x
are censored. This must be the
same length as x
. If the mode of censored
is "logical"
, TRUE
values
correspond to elements of "left"
(the default) and "right"
.x
).
Missing (NA
), undefined (NaN
), and infinite
(Inf
, -Inf
) values are allowed but will be removed.
The default value is y
are censored.
This must be the same length as y
. If the mode of censored
is
"logical"
, TRUE
values correspond to elements of y
. The possible values are "left"
(the default) and "right"
.
This argument is ignored when y
is not supplied. Thx
is discrete (discrete=TRUE
) or continuous (discrete=FALSE
; the default)."kaplan-meier"
(product-limit method of Kaplan and Meier (1958)),
"nelson"
(hazard plotty
is supplied, the default value is plot.pos.con=0.375
.
When y
is not supplied, for the normal, lognormal, three-py
is not supplied,
a character string denoting the distribution abbreviation. The default value is
distribution="norm"
. See the help file for Distribution.df
y
is not supplied,
a list with values for the parameters of the distribution. The default value is
param.list=list(mean=0, sd=1)
. See the help file for Distribution.df<
y
is not supplied,
a logical scalar indicating whether to compute the cdf for x
based on estimating the distribution parameters (estimate.params=TRUE
) or
using the known distribution parameters speciy
is not supplied and estimate.params=TRUE
,
a list whose components are optional arguments associated with the function used to
estimate the parameters of the assumed distribution (see the Section
Estimating Dx
) line or points. The default value is x.col="blue"
.
See the entry for col
in the help file for
y
) or the theoretical cdf line or points.
The default value is y.or.fitted.col="black"
. See the entry for
col
inx
) line.
The default value is x.lwd=3*par("cex")
.
See the entry for lwd
in the help file for par
y
)
or theoretical cdf line.
The default value is y.or.fitted.lwd=3*par("cex")
.
See the entry for lwd
in the help file for
x
) line. The default value is
x.lty=1
. See the entry for lty
in the help file for par
y
) or theoretical cdf line. The default value is
y.or.fitted.lty=2
.
See the entry for lty
in the help file for
x
in the
plot. The default value is include.x.cen=FALSE
. If include.x.cen=TRUE
,
censored values in x
are plotted using the plottix
. The default value is x.cen.pch=2
(hollow triangle pointing up) when x.censoring.side="right"
, and
x
. The default value is the current value of the
cex
graphics parameter. See the entry for cex
in the Rhelp fx
. The default value is
x.cen.col="red"
. See the entry for col
in the Rhelp file for
y
in the plot.
The default value is include.y.cen=FALSE
. If include.y.cen=TRUE
,
censored values in y
are plotted using the plottiy
. The default value is y.cen.pch=2
(hollow triangle pointing up) when y.censoring.side="right"
, and
y
. The default value is the current value of the
cex
graphics parameter. See the entry for cex
in the Rhelp y
. The default value is
y.cen.col="black"
. See the entry for col
in the Rhelp file for y
is not supplied,
a scalar indicating how many significant digits to print for the distribution
parameters. The default value is digits=.Options$digits
.y
is supplied, cdfCompareCensored
invisibly returns a list with
components x.ecdf.list
and y.ecdf.list
. Each of these components
is itself a list, with the components Order.Statistics
and
Cumulative.Probabilities
, giving coordinates of the points that have
been plotted.
When y
is not supplied, cdfCompareCensored
invisibly returns a list with
components x.ecdf.list
and fitted.cdf.list
.
The component x.ecdf.list
is itself a list with the components
Order.Statistics
and Cumulative.Probabilities
, giving coordinates of
the points that have been plotted for the x
values.
The component fitted.cdf.list
is itself a list with the components
Quantiles
and Cumulative.Probabilities
, giving coordinates of the
points that have been plotted for the fitted cdf.x
and y
are supplied, the function cdfCompareCensored
creates the empirical cdf plot of x
and y
on
the same plot by calling the function ecdfPlotCensored
.
When y
is not supplied, the function cdfCompareCensored
creates the
emprical cdf plot of x
(by calling ecdfPlotCensored
) and the
theoretical cdf plot (by calling cdfPlot
and using the
argument distribution
) on the same plot.cdfPlot
, ecdfPlotCensored
, qqPlotCensored
.# Generate 20 observations from a normal distribution with mean=20 and sd=5,
# censor all observations less than 18, then compare the empirical cdf with a
# theoretical normal cdf that is based on estimating the parameters.
# (Note: the call to set.seed simply allows you to reproduce this example.)
set.seed(333)
x <- sort(rnorm(20, mean=20, sd=5))
x
# [1] 9.743551 12.370197 14.375499 15.628482 15.883507 17.080124
# [7] 17.197588 18.097714 18.654182 19.585942 20.219308 20.268505
#[13] 20.552964 21.388695 21.763587 21.823639 23.168039 26.165269
#[19] 26.843362 29.673405
censored <- x < 18
x[censored] <- 18
sum(censored)
#[1] 7
dev.new()
cdfCompareCensored(x, censored)
# Clean up
#---------
rm(x, censored)
#==========
# Example 15-1 of USEPA (2009, page 15-10) gives an example of
# computing plotting positions based on censored manganese
# concentrations (ppb) in groundwater collected at 5 monitoring
# wells. The data for this example are stored in
# EPA.09.Ex.15.1.manganese.df. Here we will compare the empirical
# cdf based on Kaplan-Meier plotting positions or Michael-Schucany
# plotting positions with various assumed distributions
# (based on estimating the parameters of these distributions):
# 1) normal distribution
# 2) lognormal distribution
# 3) gamma distribution
# First look at the data:
#------------------------
EPA.09.Ex.15.1.manganese.df
# Sample Well Manganese.Orig.ppb Manganese.ppb Censored
#1 1 Well.1 <5 5.0 TRUE
#2 2 Well.1 12.1 12.1 FALSE
#3 3 Well.1 16.9 16.9 FALSE
#4 4 Well.1 21.6 21.6 FALSE
#5 5 Well.1 <2 2.0 TRUE
#...
#21 1 Well.5 17.9 17.9 FALSE
#22 2 Well.5 22.7 22.7 FALSE
#23 3 Well.5 3.3 3.3 FALSE
#24 4 Well.5 8.4 8.4 FALSE
#25 5 Well.5 <2 2.0 TRUE
longToWide(EPA.09.Ex.15.1.manganese.df,
"Manganese.Orig.ppb", "Sample", "Well",
paste.row.name = TRUE)
# Well.1 Well.2 Well.3 Well.4 Well.5
#Sample.1 <5 <5 <5 6.3 17.9
#Sample.2 12.1 7.7 5.3 11.9 22.7
#Sample.3 16.9 53.6 12.6 10 3.3
#Sample.4 21.6 9.5 106.3 <2 8.4
#Sample.5 <2 45.9 34.5 77.2 <2
# Assume a normal distribution
#-----------------------------
# Michael-Schucany plotting positions:
dev.new()
with(EPA.09.Ex.15.1.manganese.df,
cdfCompareCensored(Manganese.ppb, Censored))
# Kaplan-Meier plotting positions:
dev.new()
with(EPA.09.Ex.15.1.manganese.df,
cdfCompareCensored(Manganese.ppb, Censored,
prob.method = "kaplan-meier"))
# Assume a lognormal distribution
#--------------------------------
# Michael-Schucany plotting positions:
dev.new()
with(EPA.09.Ex.15.1.manganese.df,
cdfCompareCensored(Manganese.ppb, Censored, dist = "lnorm"))
# Kaplan-Meier plotting positions:
dev.new()
with(EPA.09.Ex.15.1.manganese.df,
cdfCompareCensored(Manganese.ppb, Censored, dist = "lnorm",
prob.method = "kaplan-meier"))
# Assume a gamma distribution
#----------------------------
# Michael-Schucany plotting positions:
dev.new()
with(EPA.09.Ex.15.1.manganese.df,
cdfCompareCensored(Manganese.ppb, Censored, dist = "gamma"))
# Kaplan-Meier plotting positions:
dev.new()
with(EPA.09.Ex.15.1.manganese.df,
cdfCompareCensored(Manganese.ppb, Censored, dist = "gamma",
prob.method = "kaplan-meier"))
# Clean up
#---------
graphics.off()
#==========
# Compare the distributions of copper and zinc between the Alluvial Fan Zone
# and the Basin-Trough Zone using the data of Millard and Deverel (1988).
# The data are stored in Millard.Deverel.88.df.
Millard.Deverel.88.df
# Cu.orig Cu Cu.censored Zn.orig Zn Zn.censored Zone Location
#1 < 1 1 TRUE <10 10 TRUE Alluvial.Fan 1
#2 < 1 1 TRUE 9 9 FALSE Alluvial.Fan 2
#3 3 3 FALSE NA NA FALSE Alluvial.Fan 3
#.
#.
#.
#116 5 5 FALSE 50 50 FALSE Basin.Trough 48
#117 14 14 FALSE 90 90 FALSE Basin.Trough 49
#118 4 4 FALSE 20 20 FALSE Basin.Trough 50
Cu.AF <- with(Millard.Deverel.88.df,
Cu[Zone == "Alluvial.Fan"])
Cu.AF.cen <- with(Millard.Deverel.88.df,
Cu.censored[Zone == "Alluvial.Fan"])
Cu.BT <- with(Millard.Deverel.88.df,
Cu[Zone == "Basin.Trough"])
Cu.BT.cen <- with(Millard.Deverel.88.df,
Cu.censored[Zone == "Basin.Trough"])
Zn.AF <- with(Millard.Deverel.88.df,
Zn[Zone == "Alluvial.Fan"])
Zn.AF.cen <- with(Millard.Deverel.88.df,
Zn.censored[Zone == "Alluvial.Fan"])
Zn.BT <- with(Millard.Deverel.88.df,
Zn[Zone == "Basin.Trough"])
Zn.BT.cen <- with(Millard.Deverel.88.df,
Zn.censored[Zone == "Basin.Trough"])
# First compare the copper concentrations
#----------------------------------------
dev.new()
cdfCompareCensored(x = Cu.AF, censored = Cu.AF.cen,
y = Cu.BT, y.censored = Cu.BT.cen)
# Now compare the zinc concentrations
#------------------------------------
dev.new()
cdfCompareCensored(x = Zn.AF, censored = Zn.AF.cen,
y = Zn.BT, y.censored = Zn.BT.cen)
# Compare the Zinc concentrations again, but delete
# the one "outlier".
#--------------------------------------------------
summaryStats(Zn.AF)
# N Mean SD Median Min Max NA's N.Total
#Zn.AF 67 23.5075 74.4192 10 3 620 1 68
summaryStats(Zn.BT)
# N Mean SD Median Min Max
#Zn.BT 50 21.94 18.7044 18.5 3 90
which(Zn.AF == 620)
#[1] 38
summaryStats(Zn.AF[-38])
# N Mean SD Median Min Max NA's N.Total
#Zn.AF[-38] 66 14.4697 8.1604 10 3 50 1 67
dev.new()
cdfCompareCensored(x = Zn.AF[-38], censored = Zn.AF.cen[-38],
y = Zn.BT, y.censored = Zn.BT.cen)
#----------
# Clean up
#---------
rm(Cu.AF, Cu.AF.cen, Cu.BT, Cu.BT.cen,
Zn.AF, Zn.AF.cen, Zn.BT, Zn.BT.cen)
graphics.off()
Run the code above in your browser using DataLab