
lessR (version 1.9.8)

color.hist: Histogram with Color

Description

Calls the standard R function hist to plot a frequency histogram with default colors, including a background color and grid lines, with an option for a relative frequency and/or cumulative histogram. Also displays summary statistics and a table that lists the bins, midpoints, counts, proportions, cumulative counts and cumulative proportions. Bins can be selected in several ways besides the default, including specifying just the bin width. When the specified bins do not contain all of the data, improved error diagnostics tell the user how to correct the problem.

If the object provided is a data frame, then a histogram is calculated for each numeric variable in the data frame and the results are written to a pdf file in the current working directory. The name of this file and its path are specified in the output.
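The data frame behavior described above can be approximated with base R alone. The following is a minimal sketch, not the lessR implementation: hist_all_numeric is a hypothetical helper name, the demo data frame is invented for illustration, and the file is written to a temporary location rather than the working directory.

```r
# Sketch only (base R, not lessR code): one histogram per numeric variable,
# all written to a single PDF file.
# hist_all_numeric() is a hypothetical helper introduced for illustration.
hist_all_numeric <- function(df, file = tempfile(fileext = ".pdf")) {
  is_num <- vapply(df, is.numeric, logical(1))  # flag the numeric columns
  pdf(file)                                     # open the PDF graphics device
  on.exit(dev.off(), add = TRUE)                # close it even if hist() fails
  for (nm in names(df)[is_num]) {
    hist(df[[nm]], main = nm, xlab = nm, col = "lightsteelblue")
  }
  invisible(list(file = file, variables = names(df)[is_num]))
}

# Invented demo data: two numeric variables and one character variable
demo <- data.frame(
  Age    = c(23, 35, 41, 29, 52, 47),
  Salary = c(41, 58, 63, 49, 80, 72),
  Dept   = c("A", "B", "A", "C", "B", "A")
)

out <- hist_all_numeric(demo)
cat("Histograms written to:", out$file, "\n")
```

Non-numeric columns such as Dept are skipped, which mirrors the "each numeric variable" wording of the description.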

Usage

color.hist(x=NULL, ...)

## S3 method for class 'data.frame'
color.hist(x, ...)

## S3 method for class 'default'
color.hist(x, col="lightsteelblue", border="black", col.bg="ghostwhite",
    col.grid="grey90", over.grid=FALSE, breaks="Sturges", bin.start=NULL,
    bin.width=NULL, show.values=FALSE, prop=FALSE, cumul=c("off", "on", "both"),
    col.reg="snow2", digits.d=5, xlab=NULL, ylab=NULL, main=NULL, ...)

Arguments

x
Variable for which to construct the histogram. Can be a data frame. If not specified, then the data frame mydata is assumed.
col
Color of the histogram's bars.
border
Color of the border of the bars.
col.bg
Color of the plot background.
col.grid
Color of the grid lines.
over.grid
If true, plot the grid lines over the histogram.
breaks
The method for calculating the bins, or an explicit specification of the bins, such as with the standard R seq function or other options provided by the hist function.
bin.start
Optional specified starting value of the bins.
bin.width
Optional specified bin width, which can be specified with or without a bin.start value.
show.values
If true, display the frequency of the bin at the top of the corresponding bar.
prop
Specify proportions or relative frequencies on the vertical axis. Default is FALSE.
cumul
Specify a cumulative histogram. The default is "off". The value of "on" displays the cumulative histogram. The value of "both" superimposes the regular histogram on the cumulative histogram.
col.reg
The color of the superimposed, regular histogram when cumul="both".
xlab
Label for x-axis. Defaults to variable name.
ylab
Label for y-axis. Defaults to Frequency or Proportion.
digits.d
Number of significant digits for each of the displayed summary statistics.
main
Title of graph.
...
Other parameter values for graphics as defined and processed by hist and plot, including xlim, ylim, lwd and cex.la

Details

Results are based on the standard hist function for calculating and plotting a histogram, with the additional provided color capabilities and other options including a relative frequency histogram. However, a histogram with densities is not supported.
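To make the relationship to hist concrete, here is a minimal base R sketch of how a table of bins, midpoints, counts, proportions and cumulative values, and a relative frequency histogram, could be derived from the object that hist returns. This is an illustration under assumptions, not the code color.hist runs: the names bin_table and h_prop are hypothetical, and the seed is set only to make the sketch reproducible.

```r
# Sketch only (base R, not lessR code)
set.seed(1)                      # assumption: fixed seed for reproducibility
y <- round(rnorm(100), 3)

h <- hist(y, plot = FALSE)       # compute Sturges bins without plotting
n <- sum(h$counts)

# Hypothetical table of the quantities listed in the Description
bin_table <- data.frame(
  lower     = head(h$breaks, -1),
  upper     = h$breaks[-1],
  midpoint  = h$mids,
  count     = h$counts,
  prop      = h$counts / n,
  cum_count = cumsum(h$counts),
  cum_prop  = cumsum(h$counts) / n
)
print(bin_table)

# Relative frequency histogram: store proportions in place of the counts,
# then plot bar heights from that component
h_prop <- h
h_prop$counts <- h$counts / n
plot(h_prop, freq = TRUE, ylab = "Proportion", main = "",
     col = "lightsteelblue")
```

Proportions differ from densities: the bar heights here sum to 1, whereas density bars have a total area of 1.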

A somewhat common error that beginning users of the base R hist function may encounter is manually specifying a sequence of bins with the seq function that does not fully span the range of the data values. The result is a rather cryptic error message and program termination. Here, color.hist detects this problem before attempting to generate the histogram with hist, and then informs the user with a more detailed, explanatory error message. Moreover, the entire range of bins need not be specified to customize the bins. Instead, only a bin width, bin.width, and/or a value that begins the first bin, bin.start, need be specified. If a starting value is specified without a bin width, the default Sturges method provides the bin width.
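The two ideas in this paragraph, checking that the bins span the data and building bins from a start value and a width, can be sketched in base R as follows. The helper names breaks_span_data and make_breaks are hypothetical, the default start value (the data minimum rounded down to a multiple of the width) is an assumption, and the actual checks and defaults inside color.hist may differ.

```r
# Sketch only (base R, not lessR code); both helpers are hypothetical.

# TRUE when the bins cover the full range of the data
breaks_span_data <- function(x, breaks) {
  min(breaks) <= min(x) && max(breaks) >= max(x)
}

# Build equal-width bins from an optional start value and optional width
make_breaks <- function(x, bin.start = NULL, bin.width = NULL) {
  if (is.null(bin.width)) {
    # assumption: borrow the width of the default Sturges bins from hist
    b <- hist(x, plot = FALSE)$breaks
    bin.width <- b[2] - b[1]
  }
  if (is.null(bin.start)) {
    # assumption: start at the data minimum rounded down to a multiple of width
    bin.start <- floor(min(x) / bin.width) * bin.width
  }
  n_bins <- ceiling((max(x) - bin.start) / bin.width)
  bin.start + bin.width * (0:n_bins)
}

y <- c(-2.2, -1.1, -0.5, 0.1, 0.4, 0.7, 1.3, 2.4)

bad <- seq(-1, 1, 0.25)                 # does not reach -2.2 or 2.4
if (!breaks_span_data(y, bad)) {
  message("Bins run from ", min(bad), " to ", max(bad),
          ", but the data run from ", min(y), " to ", max(y), ".")
}

good <- make_breaks(y, bin.width = 0.5) # only the width is supplied
h <- hist(y, breaks = good, plot = FALSE)
```

Running the check before calling hist is what allows a friendlier message than the error hist itself raises when some values fall outside the bins.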

The freq option from the standard R hist function has no effect as it is always set to FALSE in each internal call to hist. To plot densities, which correspond to setting freq to FALSE, use the color.density function in this package.
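For readers unfamiliar with how base R hist relates counts to densities, the following minimal sketch shows the arithmetic: each density is the bin count divided by the sample size times the bin width. It uses base R only, the data values are invented for illustration, and it says nothing about how color.density is implemented.

```r
# Sketch only (base R): how hist() derives densities from counts
y <- c(-2.2, -1.1, -0.5, 0.1, 0.4, 0.7, 1.3, 2.4)   # invented data

h <- hist(y, breaks = seq(-3, 3, 0.5), plot = FALSE)

width <- diff(h$breaks)                        # width of each bin
dens  <- h$counts / (sum(h$counts) * width)    # count / (n * width)

all.equal(dens, h$density)   # same values that hist() stores as density
sum(h$density * width)       # total area of the density bars
```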

See Also

hist, plot, par.

Examples

# generate 100 random normal data values with three decimal digits
y <- round(rnorm(100),3)


# --------------------
# different histograms
# --------------------

# histogram with all defaults
color.hist(y)
# compare to standard R function hist
hist(y)

# histogram with specified bin width
color.hist(y, bin.width=.25)

# histogram with specified bins and grid lines displayed over the histogram
color.hist(y, breaks=seq(-5,5,.25), xlab="My Variable", over.grid=TRUE)

# histogram with bins calculated with the Scott method and values displayed
color.hist(y, breaks="Scott", show.values=TRUE)

# histogram with the number of suggested bins, with proportions
color.hist(y, breaks=25, prop=TRUE)

# histogram with specified colors, overriding defaults
# col.bg and col.grid are defined in color.hist
# all other parameters are defined in hist, par and plot functions
color.hist(y, col="darkblue", border="lightsteelblue4", col.bg="ivory",
  col.grid="darkgray", density=25, angle=-45, cex.lab=.8, cex.axis=.8,
  col.lab="sienna3", main="My Title", col.main="gray40", xlim=c(-5,5), lwd=2,
  xlab="My Favorite Variable")


# ---------------------
# cumulative histograms
# ---------------------

# cumulative histogram with superimposed regular histogram, all defaults
color.hist(y, cumul="both")

# cumulative histogram plus regular histogram
# present with proportions on vertical axis, override other defaults
color.hist(y, cumul="both", breaks=seq(-4,4,.25), prop=TRUE, 
  col.reg="mistyrose")


# ---------------------------------
# histograms for multiple variables
# ---------------------------------

# read data into data frame called mydata
#rad("http://web.pdx.edu/~gerbing/data/employees2.csv")

# histograms for all numeric variables in data frame called mydata
#color.hist()

# histograms for all numeric variables in data frame called mydata
#  with specified options
#color.hist(col="palegreen1", col.bg="ivory", show.values=TRUE)

# Use the subset function to specify a variable list
#color.hist(subset(mydata, select=c(Age,HealthPlan)))
