multiecdf: Multiple empirical cumulative distribution functions (ecdf) and densities

Description

Plot multiple empirical cumulative distribution functions (ecdf) and densities with a user interface similar to that of boxplot. The usefulness of multidensity is variable, depending on the data and the smoothing kernel. multiecdf will in many cases be preferable. Please see Details.

Usage

multiecdf(x, ...)
"multiecdf"(formula, data = NULL, xlab, na.action = NULL, ...)
"multiecdf"(x, xlab, ...) 
"multiecdf"(x, xlim, col = brewer.pal(9, "Set1"), main = "ecdf", xlab, do.points = FALSE, subsample = 1000L, legend = list( x = "right", legend = if(is.null(names(x))) paste(seq(along=x)) else names(x), fill = col), ...)
multidensity(x, ...)
"multidensity"(formula, data = NULL, xlab, na.action = NULL, ...)
"multidensity"(x, xlab, ...) 
"multidensity"(x, bw = "nrd0", xlim, ylim, col  = brewer.pal(9, "Set1"), main = if(length(x)==1) "density" else "densities", xlab, lty  = 1L, legend = list( x = "topright", legend = if(is.null(names(x))) paste(seq(along=x)) else names(x), fill = col), density = NULL, ...)

Arguments

formula

a formula, such as y ~ grp, where y is a numeric vector of data values to be split into groups according to the grouping variable grp (usually a factor).

data

a data.frame (or list) from which the variables in formula should be taken.

na.action

a function which indicates what should happen when the data contain NAs. The default is to ignore missing values in either the response or the group.

x

Methods exist for: formula, matrix, data.frame, list of numeric vectors.

bw

The smoothing bandwidth; see the manual page for density. The length of bw needs to be either 1 (in which case the same bandwidth is used for all groups) or the same as the number of groups in x (in which case the corresponding value of bw is used for each group).

xlim

Range of the x axis. If missing, the data range is used.

ylim

Range of the y axis. If missing, the range of the density estimates is used.

col, lty

Line colors and line type.

main

Plot title.

xlab

x-axis label.

do.points

logical; if TRUE, also draw points at the knot locations.

subsample

numeric or logical of length 1. If numeric, and larger than 0, subsamples of that size are used to compute and plot the ecdf for those elements of x with more than that number of observations. If logical and TRUE, a value of 1000 is used for the subsample size.

legend

a list of arguments that is passed to the function legend.

density

a list of arguments that is passed to the function density.

...

Further arguments that get passed to the plot functions.
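The effect of the subsample argument can be imitated in base R. The sketch below is not taken from the geneplotter source: it assumes that subsampling means drawing a simple random sample without replacement, and the names full, sub, grid and max_gap are illustrative only.

```r
# Illustration only: assumes "subsampling" is a simple random sample
# without replacement; multiecdf's internal procedure may differ.
set.seed(1)
full <- rnorm(1e5)                  # one large group of observations
size <- 1000L                       # same value as the documented default
sub  <- if (length(full) > size) sample(full, size) else full

F_full <- ecdf(full)                # ecdf of all observations
F_sub  <- ecdf(sub)                 # ecdf of the subsample

# Largest vertical gap between the two step functions on a grid
grid    <- seq(-4, 4, length.out = 401)
max_gap <- max(abs(F_full(grid) - F_sub(grid)))

plot(F_full, do.points = FALSE, main = "ecdf: full data vs. subsample")
lines(F_sub, do.points = FALSE, col = "red")
```

For a group of this size the two curves are hard to tell apart by eye, which is why plotting a subsample can be a reasonable way to save drawing time.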

Value

For the multidensity functions, a list of density objects.

Details

Density estimates: multidensity uses the function density. If the density of the data-generating process is smooth on the real axis, then this function tends to produce good approximations of the true density. If, however, the true density has steps (this is in particular the case for quantities such as p-values and correlation coefficients, or for distributions that have weight only on the positive numbers, or only on the integers), then the output of this function tends to be misleading. In that case, please either use multiecdf or histograms, or try to improve the density estimate by setting the density argument (from, to, kernel).
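The problem with bounded quantities can be seen with base R alone. The following is a minimal sketch, assuming uniformly distributed p-values; it calls stats::density and stats::ecdf directly rather than multidensity and multiecdf, and the variable names are illustrative.

```r
# Illustration only: base R, not geneplotter internals.
set.seed(2)
p <- runif(500)                     # p-values under the null: uniform on [0, 1]

d_default <- density(p)             # evaluation grid extends beyond the data range
d_clipped <- density(p, from = 0, to = 1)  # evaluate the estimate on [0, 1] only
F_p       <- ecdf(p)                # step function, no smoothing involved

range(d_default$x)                  # reaches below 0 and above 1
range(d_clipped$x)                  # exactly 0 and 1

op <- par(mfrow = c(1, 2))
plot(d_default, main = "density")
lines(d_clipped, col = "red")
abline(v = c(0, 1), lty = 2)        # true support of the data
plot(F_p, do.points = FALSE, main = "ecdf")
par(op)
```

Note that from and to only restrict where the estimate is evaluated; the kernel still smooths across the boundary, so the curve continues to dip near 0 and 1. The ecdf has no such artifact.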

Bandwidths: the choice of the smoothing bandwidths in multidensity can be problematic, in particular if the groups differ in range and/or number of data points. If curves look excessively wiggly or overly smooth, try varying the arguments xlim and bw; note that the argument bw can be a vector, in which case it is expected to align with the groups.
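To illustrate why one bandwidth for all groups can mislead, the sketch below computes a separate rule-of-thumb bandwidth for each group in base R. It is an assumption-laden example, not the package's own procedure: the two simulated groups and the names bws, dens_shared and dens_each are made up for illustration.

```r
# Illustration only: two groups that differ strongly in spread and size.
set.seed(3)
x <- list(narrow = rnorm(2000, sd = 0.2),
          wide   = rnorm(50,   sd = 5))

# One bandwidth per group, from the rule of thumb that bw = "nrd0" refers to
bws <- vapply(x, bw.nrd0, numeric(1))

# A single shared bandwidth (here the larger one) oversmooths the narrow group
dens_shared <- lapply(x, density, bw = max(bws))
# Group-specific bandwidths, aligned with the groups by position
dens_each   <- Map(function(v, b) density(v, bw = b), x, bws)

# Peak height of the narrow group under both choices
peak_shared <- max(dens_shared$narrow$y)
peak_each   <- max(dens_each$narrow$y)
```

The vector bws has one entry per group, which is the shape the bw argument expects when it is not a single value.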

Examples

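  library(geneplotter)  # provides multiecdf and multidensity

  # Use the words of the package description as group labels
  words = strsplit(packageDescription("geneplotter")$Description, " ")[[1]]
  factr = factor(sample(words, 2000, replace = TRUE))
  # One normal draw per observation, with mean given by the group's level index
  x = rnorm(length(factr), mean=as.integer(factr))
  
  multiecdf(x ~ factr)
  multidensity(x ~ factr)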
