DescTools (version 0.99.13)

PlotFdist: Frequency Distribution Plot

Description

This function gives a univariate graphical representation of the frequency distribution of a numeric vector. It combines a histogram, a density curve, a boxplot and a plot of the empirical cumulative distribution function (ecdf) in a single plot, producing a dense and informative picture of the data. The function nevertheless remains flexible: arguments for the individual components (hist, boxplot, etc.) can be passed as lists (see examples).
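The layout idea can be approximated in base R. The sketch below is only an illustration under stated assumptions, not the DescTools implementation: the function name fdist_sketch is hypothetical, the three panels are stacked with layout() using the default heights documented under heights, and NAs are simply dropped before plotting.

```r
# Hypothetical base-R sketch of a histogram/density + boxplot + ecdf layout.
# Not DescTools code; heights default to the values documented for PlotFdist.
fdist_sketch <- function(x, heights = c(2, 0.5, 1.4), main = deparse(substitute(x))) {
  force(main)                              # capture the label before x is modified
  x <- x[!is.na(x)]                        # density() stops on missing values
  xlim <- range(pretty(range(x)))          # common x-range for all three panels
  op <- par(no.readonly = TRUE)            # restore graphics settings on exit
  on.exit(par(op))
  par(oma = c(0, 0, 3, 0))                 # outer top margin for the title
  layout(matrix(1:3, ncol = 1), heights = heights)
  # Panel 1: histogram on the density scale plus a kernel density curve
  par(mar = c(0, 4, 0, 1))
  hist(x, freq = FALSE, xlim = xlim, main = "", xlab = "", axes = FALSE)
  axis(2)
  lines(density(x))
  # Panel 2: horizontal boxplot sharing the same x-range
  boxplot(x, horizontal = TRUE, ylim = xlim, axes = FALSE)
  # Panel 3: empirical cumulative distribution function, with an x-axis
  par(mar = c(4, 4, 0, 1))
  plot(ecdf(x), xlim = xlim, main = "", xlab = "")
  title(main = main, outer = TRUE)
  invisible(NULL)
}

fdist_sketch(faithful$eruptions)
```

Sharing one xlim across all panels is what lets the boxplot and the ecdf line up vertically with the histogram.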

Usage

PlotFdist(x, main = deparse(substitute(x)), xlab = "", xlim = NULL,
          do.hist = !(all(IsWhole(x, na.rm = TRUE)) & length(unique(na.omit(x))) < 13),
          args.hist = NULL, args.rug = NA, args.dens = NULL, args.curve = NA, 
          args.boxplot = NULL, args.ecdf = NULL, heights = NULL, pdist = NULL,
          na.rm = FALSE, cex.axis = NULL, cex.main = NULL, mar = NULL)
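The default for do.hist can be reproduced in plain R to check which branch a given vector will take. In this sketch, is_whole() is a hypothetical stand-in for DescTools' IsWhole() (its tolerance of sqrt(.Machine$double.eps) is an assumption), and default_do_hist() is likewise a hypothetical name.

```r
# Hypothetical stand-in for DescTools::IsWhole(): TRUE where a value is a whole number.
is_whole <- function(x, tol = sqrt(.Machine$double.eps)) abs(x - round(x)) < tol

# The do.hist default from the Usage line, rewritten with the stand-in above:
# FALSE (type = "h" plot) only if all values are whole AND there are fewer than 13 distinct values.
default_do_hist <- function(x) {
  !(all(is_whole(x), na.rm = TRUE) && length(unique(na.omit(x))) < 13)
}

default_do_hist(faithful$eruptions)   # continuous data: histogram
default_do_hist(c(1:7, NA))           # e.g. weekdays coded 1..7: type = "h" plot
```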

Arguments

x
the numeric variable whose distribution is to be plotted.
main
main title of the plot.
xlab
label of the x-axis, defaults to "". (The name of the variable is typically placed in the main title and would be redundant.)
xlim
range of the x-axis, defaults to a pretty range(x, na.rm = TRUE).
do.hist
defines whether a histogram or a plot with type = "h" should be used. Default is TRUE (a histogram is plotted), unless x contains only whole numbers and has fewer than 13 unique values.
args.hist
list of additional arguments to be passed to the histogram hist(), ignored if do.hist = FALSE. The defaults chosen when setting args.hist = NULL are more or less the same as in h
args.rug
list of additional arguments to be passed to the function rug(). Use args.rug = NA if no rug should be added. This is the default. Use args.rug = NULL to add rug with reasonable default values.
args.dens
list of additional arguments to be passed to density. Use args.dens = NA if no density curve should be drawn. The defaults are taken from density.
args.curve
list of additional arguments to be passed to curve. This argument makes it possible to add a fitted distribution curve to the histogram. By default no curve is added (args.curve = NA). If the argument is set to
args.boxplot
list of additional arguments to be passed to boxplot(). The defaults are largely the same as in boxplot.
args.ecdf
list of additional arguments to be passed to ecdf(). Use args.ecdf = NA if no empirical cumulative distribution function should be included in the plot. The defaults are taken from plot.ecdf.
heights
heights of the plot parts; defaults to c(2, 0.5, 1.4) for the histogram, the boxplot and the empirical cumulative distribution function, or to c(2, 1.5) when only a histogram and a boxplot are drawn.
pdist
distances between the plot parts; defaults to c(0, 0), i.e. there is no space between the histogram, the boxplot and the ecdf plot. This can be changed, for instance, when an x-axis is to be added to the histogram.
na.rm
logical, should NAs be omitted? hist() and boxplot() can handle missing values on their own, but density() refuses to work with them. Defaults to FALSE.
cex.axis
character expansion factor for the axes.
cex.main
character expansion factor for the main title. It should be set according to the plot parts in order to obtain a balanced appearance.
mar
A numerical vector of the form c(bottom, left, top, right) which gives the number of lines of outer margin to be specified on the four sides of the plot. The default is c(0, 0, 3, 0).
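The args.* arguments above follow a common convention: NA suppresses a component, NULL draws it with default settings, and a list supplies settings. The helper below, resolve_args(), is hypothetical and only illustrates one way to implement that convention with modifyList(); it assumes, without confirmation from this page, that a supplied list overrides the defaults entry by entry, and the rug defaults in the usage lines are made up for the example.

```r
# Hypothetical helper: resolves an args.* value following the NA / NULL / list convention.
resolve_args <- function(user, defaults) {
  if (is.null(user)) return(defaults)                       # NULL: draw with defaults
  if (!is.list(user) && length(user) == 1L && is.na(user))  # NA: do not draw
    return(NULL)
  stopifnot(is.list(user))                                  # otherwise a list is expected
  modifyList(defaults, user)                                # user entries override defaults
}

# Usage: add a rug to a density plot, overriding one assumed default.
rug_defaults <- list(ticksize = 0.03, col = "grey40")       # assumed values, not DescTools'
x <- faithful$eruptions
plot(density(x), main = "faithful$eruptions")
rug_args <- resolve_args(list(col = "red"), rug_defaults)
if (!is.null(rug_args)) do.call(rug, c(list(x), rug_args))
```

Returning NULL for NA lets the caller skip a component with a single is.null() check before calling do.call().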

Details

If x is large (n > 1e5), performance will suffer. In particular the density curve and the ecdf, but also the boxplot (due to the chosen alpha channel), take time to compute and draw. In such cases consider plotting a sample, e.g. PlotFdist(x[sample(length(x), size=5000)]); the overall shape of the distribution usually won't change much.
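A minimal sketch of the sampling advice, assuming a simulated vector of one million standard normal values; the sample size of 5000 is the one suggested above. Comparing a few quantiles gives a quick check that the sample still represents the full vector.

```r
set.seed(1)                                   # make the sample reproducible
x <- rnorm(1e6)                               # large vector: plotting all of it would be slow
idx <- sample(length(x), size = 5000)         # random positions, not values
xs <- x[idx]
# Compare the quartiles of the full vector and of the sample
probs <- c(0.25, 0.5, 0.75)
rbind(full = quantile(x, probs), sample = quantile(xs, probs))
# The sample can then be plotted instead of x, e.g. DescTools::PlotFdist(xs)
```

Sampling positions with sample(length(x), ...) rather than values with sample(x, ...) avoids sample()'s special case for a single numeric value, where sample(5) permutes 1:5. If x contains NAs, the sample may too, so na.rm = TRUE is still needed.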

See Also

hist, boxplot, ecdf, density, rug, layout

Examples

# create a new window and do the plot 
PlotFdist(x=d.pizza$delivery_min, na.rm=TRUE)

# define additional arguments for hist and dens
PlotFdist(d.pizza$delivery_min, args.hist=list(breaks=50), 
  args.dens=list(col="olivedrab4"), na.rm=TRUE )


# do a "h"-plot instead of a histogram for integers 
PlotFdist(d.pizza$weekday, na.rm=TRUE)

# special arguments for hist, density and ecdf
PlotFdist(x=faithful$eruptions, 
          args.hist=list(breaks=20), args.dens=list(bw=.1),
          args.ecdf=list(verticals=FALSE, do.points=TRUE, 
            cex=1.2, pch=16, lwd=1), args.rug=NULL)

# no density curve, no ecdf but add rug instead, make boxplot a bit higher
PlotFdist(x=d.pizza$delivery_min, na.rm=TRUE, args.dens=NA, args.ecdf=NA, 
  args.hist=list(xaxt="s"),  # display x-axis on the histogram
  args.rug=NULL, heights=c(3, 2.5), pdist=2.5, main="Delivery time")

# an alpha channel on the rug looks good, but takes a while to draw...
PlotFdist(x=d.pizza$temperature, args.rug=list(col=SetAlpha("black", 0.1)), na.rm=TRUE)

# plot a normal density curve
x <- rnorm(1000) 
PlotFdist(x, args.curve = NULL, args.boxplot=NA, args.ecdf=NA)

# compare with a t-distribution
PlotFdist(x, args.curve = list(expr="dt(x, df=2)", col="darkgreen"), 
          args.boxplot=NA, args.ecdf=NA)
legend(x="topright", legend=c("kernel density", "t-distribution (df=2)"), 
       fill=c(getOption("col1", hred), "darkgreen"))
