Hmisc (version 2.0-9)

ecdf: Empirical Cumulative Distribution Plot

Description

Computes coordinates of the cumulative distribution function of x, and by default plots it as a step function. A grouping variable may be specified so that stratified estimates are computed and (by default) plotted. If there is more than one group, the labcurve function is used (by default) to label the multiple step functions or to draw a legend defining line types, colors, or symbols by linking them with group labels. A weights vector may be specified to get weighted estimates. Specify normwt to make weights sum to the length of x (after removing NAs). Otherwise the total sample size is taken to be the sum of the weights.
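The effect of weights and normwt can be illustrated with a minimal base-R sketch. The helper weighted_ecdf below is hypothetical and is not part of Hmisc; it only mirrors the description above (sum the weights at each distinct value, accumulate, and divide by the total weight) and is not the package's actual implementation.

```r
# Hypothetical helper (not an Hmisc function): weighted ECDF coordinates by hand.
weighted_ecdf <- function(x, weights = rep(1, length(x)), normwt = FALSE) {
  keep <- !is.na(x) & !is.na(weights)      # drop NAs before anything else
  x <- x[keep]
  w <- weights[keep]
  if (normwt) w <- w * length(x) / sum(w)  # rescale so weights sum to length(x)
  ux <- sort(unique(x))
  # total weight sitting at each distinct value of x
  wsum <- vapply(ux, function(v) sum(w[x == v]), numeric(1))
  list(x = ux,
       y = cumsum(wsum) / sum(w),          # weighted fraction of values <= x
       n = sum(w))                         # total sample size implied by the weights
}

r1 <- weighted_ecdf(c(1, 2, 2, 3), weights = c(1, 2, 1, 4))
r2 <- weighted_ecdf(c(1, 2, 2, 3), weights = c(1, 2, 1, 4), normwt = TRUE)
r1$n   # sum of the weights
r2$n   # length of x
```

In this sketch the step heights are the same with and without normwt; only the implied total sample size changes.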

ecdf is actually a method, and ecdf.default is what's called for a vector argument. ecdf.data.frame is called when the first argument is a data frame. This function can automatically set up a matrix of ECDFs and wait for a mouse click if the matrix requires more than one page. Categorical variables, character variables, and variables having fewer than a set number of unique values are ignored. If par(mfrow=..) is not set up before ecdf.data.frame is called, the function will try to figure the best layout depending on the number of variables in the data frame. Upon return the original mfrow is left intact.
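The variable screening described above can be sketched in base R. The helper eligible_vars is hypothetical (it is not an Hmisc function), and the rule it applies is an assumption read off the description: keep numeric columns that have at least n.unique distinct non-missing values.

```r
# Hypothetical helper (not an Hmisc function): which columns of a data frame
# would plausibly get an ECDF under the screening rule described above?
eligible_vars <- function(df, n.unique = 10) {
  ok <- vapply(df, function(v) {
    # factors and character vectors fail is.numeric(); NAs are not counted
    is.numeric(v) && length(unique(v[!is.na(v)])) >= n.unique
  }, logical(1))
  names(df)[ok]
}

d <- data.frame(a = 1:20,            # 20 distinct values: kept
                b = rep(1:2, 10),    # only 2 distinct values: ignored
                c = letters[1:20],   # character: ignored
                stringsAsFactors = FALSE)
eligible_vars(d)
```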

When the first argument to ecdf is a formula, a Trellis/Lattice function ecdf.formula is called. This allows for multi-panel conditioning, superposition using a groups variable, and other Trellis features, along with the ability to easily plot transformed ECDFs using the fun argument. For example, if fun=qnorm, the inverse normal transformation will be used for the y-axis. If the transformed curves are linear this indicates normality. Like the xYplot function, ecdf will create a function Key if the groups variable is used. This function can be invoked by the user to define the keys for the groups.
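The fun=qnorm idea can be checked by hand without any plotting. The following is a minimal base-R sketch, assuming the "i/n" definition of the cumulative proportions; the variable names are illustrative only.

```r
set.seed(1)
x  <- rnorm(200, 200, 40)
xs <- sort(x)
p  <- seq_along(xs) / length(xs)   # "i/n" cumulative proportions
keep <- p < 1                      # qnorm(1) is Inf, so drop the last point
z  <- qnorm(p[keep])               # inverse normal transform of the ECDF
r  <- cor(xs[keep], z)             # near 1 when the transformed curve is close to linear
# plot(xs[keep], z, type = "s")    # roughly a straight line for normal data
```

A correlation close to 1 between the sorted data and the transformed proportions corresponds to the nearly straight transformed curve that suggests normality.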

Usage

ecdf(x, ...)

## S3 method for class 'default'
ecdf(x, what=c('F','1-F','f'), weights, normwt=FALSE, xlab, ylab, q,
     pl=TRUE, add=FALSE, lty=1, col=1, group=rep(1,length(x)),
     label.curves=TRUE, xlim, subtitles=TRUE,
     datadensity=c('none','rug','hist','density'), side=1,
     frac=switch(datadensity,none=NA,rug=.03,hist=.1,density=.1),
     dens.opts=NULL, lwd, ...)

## S3 method for class 'data.frame'
ecdf(x, group=rep(1,nrows), weights, normwt, label.curves=TRUE,
     n.unique=10, na.big=FALSE, subtitles=TRUE,
     vnames=c('labels','names'), ...)

## S3 method for class 'formula'
ecdf(x, data, groups, prepanel=prepanel.ecdf, panel=panel.ecdf, ...,
     xlab, ylab, fun=function(x)x, subset=TRUE)

Arguments

x
a numeric vector, data frame, or Trellis/Lattice formula
what
The default is "F" which results in plotting the fraction of values <= x. Set to "1-F" to plot the fraction > x or "f" to plot the cumulative frequency of values <= x.
weights
numeric vector of weights. Omit or specify a zero-length vector or NULL to get unweighted estimates.
normwt
see above
xlab
x-axis label. Default is label(x) or name of calling argument. For ecdf.formula, xlab defaults to the label attribute of the x-axis variable.
ylab
y-axis label. Default is "Proportion <= x", "Proportion > x", or "Frequency <= x" depending on value of what.
q
a vector for quantiles for which to draw reference lines on the plot. Default is not to draw any.
pl
set to FALSE to omit the plot and just return the estimates.
add
set to TRUE to add the cdf to an existing plot.
lty
integer line type for plot. If group is specified, this can be a vector.
lwd
line width for plot. Can be a vector corresponding to groups.
col
color for step function. Can be a vector.
group
a numeric, character, or factor categorical variable used for stratifying estimates. If group is present, as many ECDFs are drawn as there are non-missing group levels.
label.curves
applies if more than one group exists. Default is TRUE to use labcurve to label curves where they are farthest apart. Set label.curves to a list to specify options to labcurve.
xlim
x-axis limits. Default is entire range of x.
subtitles
set to FALSE to suppress putting a subtitle at the bottom left of each plot. The subtitle indicates the numbers of non-missing and missing observations, which are labeled n, m.
datadensity
If datadensity is not "none", either scat1d or histSpike is called to add a rug plot (datadensity="rug"), spike histogram (datadensity="hist"), or smooth density estimate (datadensity="density").
side
If datadensity is not "none", the default is to place the additional information on top of the x-axis (side=1). Use side=3 to place at the top of the graph.
frac
passed to histSpike
dens.opts
a list of optional arguments for histSpike
...
other parameters passed to plot if add=FALSE. For data frames, other parameters to pass to ecdf.default. For ecdf.formula, if groups is not used, you can also add data density information to each panel's ECDF by specifying the datadensity argument.
n.unique
minimum number of unique values before an ECDF is drawn for a variable in a data frame. Default is 10.
na.big
set to TRUE to draw the number of NAs in larger letters in the middle of the plot for ecdf.data.frame
vnames
By default, variable labels are used to label x-axes. Set vnames="names" to instead use variable names.
method
method for computing the empirical cumulative distribution. See wtd.ecdf. The default is to use the standard "i/n" method as is used by the non-Trellis versions of ecdf.
fun
a function to transform the cumulative proportions, for the Trellis-type usage of ecdf
data, groups, subset, prepanel, panel
the usual Trellis/Lattice parameters, with groups causing ecdf.formula to overlay multiple ECDFs on one panel.

Value

  • for ecdf.default an invisible list with elements x and y giving the coordinates of the cdf. If there is more than one group, a list of such lists is returned. An attribute, N, is in the returned object. It contains the elements n and m, the number of non-missing and missing observations, respectively.

Side Effects

plots

concept

  • trellis
  • lattice

See Also

wtd.ecdf, label, table, cumsum, labcurve, xYplot, histSpike

Examples

set.seed(1)
ch <- rnorm(1000, 200, 40)
ecdf(ch, xlab="Serum Cholesterol")
scat1d(ch)                       # add rug plot
histSpike(ch, add=TRUE, frac=.15)   # add spike histogram
# Better: add a data density display automatically:
ecdf(ch, datadensity='density')


label(ch) <- "Serum Cholesterol"
ecdf(ch)
other.ch <- rnorm(500, 220, 20)
ecdf(other.ch,add=TRUE,lty=2)


sex <- factor(sample(c('female','male'), 1000, TRUE))
ecdf(ch, q=c(.25,.5,.75))  # show quartiles
ecdf(ch, group=sex,
     label.curves=list(method='arrow'))


# Example showing how to draw multiple ECDFs from paired data
pre.test <- rnorm(100,50,10)
post.test <- rnorm(100,55,10)
x <- c(pre.test, post.test)
g <- c(rep('Pre',length(pre.test)),rep('Post',length(post.test)))
ecdf(x, group=g, xlab='Test Results', label.curves=list(keys=1:2))
# keys=1:2 causes symbols to be drawn periodically on top of curves


# Draw a matrix of ECDFs for a data frame
m <- data.frame(pre.test, post.test, 
                sex=sample(c('male','female'),100,TRUE))
ecdf(m, group=m$sex, datadensity='rug')


freqs <- sample(1:10, 1000, TRUE)
ecdf(ch, weights=freqs)  # weighted estimates


# Trellis/Lattice examples:


region <- factor(sample(c('Europe','USA','Australia'),1000,TRUE))  # same length as ch
year <- factor(sample(2001:2002,1000,TRUE))
ecdf(~ch | region*year, groups=sex)
Key()           # draw a key for sex at the default location
# Key(locator(1)) # user-specified positioning of key
age <- rnorm(1000, 50, 10)
ecdf(~ch | equal.count(age), groups=sex)  # use overlapping shingles
ecdf(~ch | sex, datadensity='hist', side=3)  # add spike histogram at top
