# Ecdf

##### Empirical Cumulative Distribution Plot

Computes coordinates of cumulative distribution function of x, and by defaults
plots it as a step function. A grouping variable may be specified so that
stratified estimates are computed and (by default) plotted. If there is
more than one group, the `labcurve`

function is used (by default) to label
the multiple step functions or to draw a legend defining line types, colors,
or symbols by linking them with group labels. A `weights`

vector may
be specified to get weighted estimates. Specify `normwt`

to make
`weights`

sum to the length of `x`

(after removing NAs). Other wise
the total sample size is taken to be the sum of the weights.

`Ecdf`

is actually a method, and `Ecdf.default`

is what's
called for a vector argument. `Ecdf.data.frame`

is called when the
first argument is a data frame. This function can automatically set up
a matrix of ECDFs and wait for a mouse click if the matrix requires more
than one page. Categorical variables, character variables, and
variables having fewer than a set number of unique values are ignored.
If `par(mfrow=..)`

is not set up before `Ecdf.data.frame`

is
called, the function will try to figure the best layout depending on the
number of variables in the data frame. Upon return the original
`mfrow`

is left intact.

When the first argument to `Ecdf`

is a formula, a Trellis/Lattice function
`Ecdf.formula`

is called. This allows for multi-panel
conditioning, superposition using a `groups`

variable, and other
Trellis features, along with the ability to easily plot transformed
ECDFs using the `fun`

argument. For example, if `fun=qnorm`

,
the inverse normal transformation will be used for the y-axis. If the
transformed curves are linear this indicates normality. Like the
`xYplot`

function, `Ecdf`

will create a function `Key`

if
the `groups`

variable is used. This function can be invoked by the
user to define the keys for the groups.

- Keywords
- methods, hplot, distribution, nonparametric

##### Usage

`Ecdf(x, …)`# S3 method for default
Ecdf(x, what=c('F','1-F','f','1-f'),
weights=rep(1, length(x)), normwt=FALSE,
xlab, ylab, q, pl=TRUE, add=FALSE, lty=1,
col=1, group=rep(1,length(x)), label.curves=TRUE, xlim,
subtitles=TRUE, datadensity=c('none','rug','hist','density'),
side=1,
frac=switch(datadensity,none=NA,rug=.03,hist=.1,density=.1),
dens.opts=NULL, lwd=1, log='', …)

# S3 method for data.frame
Ecdf(x, group=rep(1,nrows),
weights=rep(1, nrows), normwt=FALSE,
label.curves=TRUE, n.unique=10, na.big=FALSE, subtitles=TRUE,
vnames=c('labels','names'),…)

# S3 method for formula
Ecdf(x, data=sys.frame(sys.parent()), groups=NULL,
prepanel=prepanel.Ecdf, panel=panel.Ecdf, …, xlab,
ylab, fun=function(x)x, what=c('F','1-F','f','1-f'), subset=TRUE)

##### Arguments

- x
a numeric vector, data frame, or Trellis/Lattice formula

- what
The default is

`"F"`

which results in plotting the fraction of values <= x. Set to`"1-F"`

to plot the fraction > x or`"f"`

to plot the cumulative frequency of values <= x. Use`"1-f"`

to plot the cumulative frequency of values >= x.- weights
numeric vector of weights. Omit or specify a zero-length vector or NULL to get unweighted estimates.

- normwt
see above

- xlab
x-axis label. Default is label(x) or name of calling argument. For

`Ecdf.formula`

,`xlab`

defaults to the`label`

attribute of the x-axis variable.- ylab
y-axis label. Default is

`"Proportion <= x"`

,`"Proportion > x"`

, or "Frequency <= x" depending on value of`what`

.- q
a vector for quantiles for which to draw reference lines on the plot. Default is not to draw any.

- pl
set to F to omit the plot, to just return estimates

- add
set to TRUE to add the cdf to an existing plot. Does not apply if using lattice graphics (i.e., if a formula is given as the first argument).

- lty
integer line type for plot. If

`group`

is specified, this can be a vector.- lwd
line width for plot. Can be a vector corresponding to

`group`

s.- log
see

`plot`

. Set`log='x'`

to use log scale for`x`

-axis.- col
color for step function. Can be a vector.

- group
a numeric, character, or

`factor`

categorical variable used for stratifying estimates. If`group`

is present, as many ECDFs are drawn as there are non--missing group levels.- label.curves
applies if more than one

`group`

exists. Default is`TRUE`

to use`labcurve`

to label curves where they are farthest apart. Set`label.curves`

to a`list`

to specify options to`labcurve`

, e.g.,`label.curves=list(method="arrow", cex=.8)`

. These option names may be abbreviated in the usual way arguments are abbreviated. Use for example`label.curves=list(keys=1:5)`

to draw symbols periodically (as in`pch=1:5`

- see`points`

) on the curves and automatically position a legend in the most empty part of the plot. Set`label.curves=FALSE`

to suppress drawing curve labels. The`col`

,`lty`

, and`type`

parameters are automatically passed to`labcurve`

, although you can override them here. You can set`label.curves=list(keys="lines")`

to have different line types defined in an automatically positioned key.- xlim
x-axis limits. Default is entire range of

`x`

.- subtitles
set to

`FALSE`

to suppress putting a subtitle at the bottom left of each plot. The subtitle indicates the numbers of non-missing and missing observations, which are labeled`n`

,`m`

.- datadensity
If

`datadensity`

is not`"none"`

, either`scat1d`

or`histSpike`

is called to add a rug plot (`datadensity="rug"`

), spike histogram (`datadensity="hist"`

), or smooth density estimate (`"density"`

) to the bottom or top of the ECDF.- side
If

`datadensity`

is not`"none"`

, the default is to place the additional information on top of the x-axis (`side=1`

). Use`side=3`

to place at the top of the graph.- frac
passed to

`histSpike`

- dens.opts
a list of optional arguments for

`histSpike`

- ...
other parameters passed to plot if add=F. For data frames, other parameters to pass to

`Ecdf.default`

. For`Ecdf.formula`

, if`groups`

is not used, you can also add data density information to each panel's ECDF by specifying the`datadensity`

and optional`frac`

,`side`

,`dens.opts`

arguments.- n.unique
minimum number of unique values before an ECDF is drawn for a variable in a data frame. Default is 10.

- na.big
set to

`TRUE`

to draw the number of NAs in larger letters in the middle of the plot for`Ecdf.data.frame`

- vnames
By default, variable labels are used to label x-axes. Set

`vnames="names"`

to instead use variable names.- method
method for computing the empirical cumulative distribution. See

`wtd.Ecdf`

. The default is to use the standard`"i/n"`

method as is used by the non-Trellis versions of`Ecdf`

.- fun
a function to transform the cumulative proportions, for the Trellis-type usage of

`Ecdf`

- data, groups, subset,prepanel, panel
the usual Trellis/Lattice parameters, with

`groups`

causing`Ecdf.formula`

to overlay multiple ECDFs on one panel.

##### Value

for `Ecdf.default`

an invisible list with elements x and y giving the
coordinates of the cdf. If there is more than one `group`

, a list of
such lists is returned. An attribute, `N`

, is in the returned
object. It contains the elements `n`

and `m`

, the number of
non-missing and missing observations, respectively.

##### Side Effects

plots

##### See Also

##### Examples

```
# NOT RUN {
set.seed(1)
ch <- rnorm(1000, 200, 40)
Ecdf(ch, xlab="Serum Cholesterol")
scat1d(ch) # add rug plot
histSpike(ch, add=TRUE, frac=.15) # add spike histogram
# Better: add a data density display automatically:
Ecdf(ch, datadensity='density')
label(ch) <- "Serum Cholesterol"
Ecdf(ch)
other.ch <- rnorm(500, 220, 20)
Ecdf(other.ch,add=TRUE,lty=2)
sex <- factor(sample(c('female','male'), 1000, TRUE))
Ecdf(ch, q=c(.25,.5,.75)) # show quartiles
Ecdf(ch, group=sex,
label.curves=list(method='arrow'))
# Example showing how to draw multiple ECDFs from paired data
pre.test <- rnorm(100,50,10)
post.test <- rnorm(100,55,10)
x <- c(pre.test, post.test)
g <- c(rep('Pre',length(pre.test)),rep('Post',length(post.test)))
Ecdf(x, group=g, xlab='Test Results', label.curves=list(keys=1:2))
# keys=1:2 causes symbols to be drawn periodically on top of curves
# Draw a matrix of ECDFs for a data frame
m <- data.frame(pre.test, post.test,
sex=sample(c('male','female'),100,TRUE))
Ecdf(m, group=m$sex, datadensity='rug')
freqs <- sample(1:10, 1000, TRUE)
Ecdf(ch, weights=freqs) # weighted estimates
# Trellis/Lattice examples:
region <- factor(sample(c('Europe','USA','Australia'),100,TRUE))
year <- factor(sample(2001:2002,1000,TRUE))
Ecdf(~ch | region*year, groups=sex)
Key() # draw a key for sex at the default location
# Key(locator(1)) # user-specified positioning of key
age <- rnorm(1000, 50, 10)
Ecdf(~ch | equal.count(age), groups=sex) # use overlapping shingles
Ecdf(~ch | sex, datadensity='hist', side=3) # add spike histogram at top
# }
```

*Documentation reproduced from package Hmisc, version 4.3-0, License: GPL (>= 2)*