hs
Based on the standard R function hist, plots a frequency histogram with default colors, including background color and grid lines, plus an option for a relative frequency and/or cumulative histogram, as well as summary statistics and a table that provides the bins, midpoints, counts, proportions, cumulative counts and cumulative proportions. Bins can be selected in several ways besides the default, including specifying just the bin width and/or the bin start. Also provides improved error diagnostics and feedback on how to correct the problem when the bins do not contain all of the specified data.
If the object provided for the histogram is a data frame, then a histogram is calculated for each numeric variable in the data frame and the results are written to pdf files in the current working directory. The names of these files and their paths are listed in the output.
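The table of bins described above can be approximated with base R alone. The following is a minimal sketch, not the lessR implementation: the helper name bin_table, its column names and the sample data are assumptions made for illustration.

```r
# Hypothetical helper: tabulate the bins computed by base R hist()
bin_table <- function(x, breaks = "Sturges") {
  h <- hist(x, breaks = breaks, plot = FALSE)  # compute the bins without plotting
  n <- length(h$counts)
  data.frame(
    lower     = h$breaks[-(n + 1)],            # left edge of each bin
    upper     = h$breaks[-1],                  # right edge of each bin
    midpoint  = h$mids,
    count     = h$counts,
    prop      = h$counts / sum(h$counts),
    cum.count = cumsum(h$counts),
    cum.prop  = cumsum(h$counts) / sum(h$counts)
  )
}

set.seed(1)
y <- rnorm(50)       # sample data, assumed for illustration
tbl <- bin_table(y)
print(tbl)
```

The counts sum to the number of data values and the last cumulative proportion is 1, which gives a quick check that the bins contain all of the data.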
Histogram(x=NULL, data=mydata, n.cat=getOption("n.cat"), col.fill=getOption("col.fill.bar"),
col.stroke=getOption("col.stroke.bar"),
col.bg=getOption("col.bg"),
col.grid=getOption("col.grid"),
col.reg="snow2", over.grid=FALSE,
cex.axis=.85, col.axis="gray30", col.ticks="gray30",
breaks="Sturges", bin.start=NULL, bin.width=NULL,
prop=FALSE, cumul=c("off", "on", "both"),
digits.d=NULL, xlab=NULL, ylab=NULL, main=NULL,
quiet=getOption("quiet"),
pdf.file=NULL, pdf.width=5, pdf.height=5, ...)
hs(...)
[...] data, that is, no variable specified, then the data frame mydata is assumed.
[...] mydata.
[...] set or use rgb function directly.
[...] cumul="both".
[...] TRUE, plot the grid lines over the histogram.
[...] cex.axis.
[...] bin.start value.
[...] FALSE.
[...] "on" displays the cumulative histogram, with default of "off". The value of "both" superimposes the regular histogram.
[...] TRUE, no text output. Can change system default with set function.
[...] hist function to calculate and plot a histogram, plus the additional provided color capabilities, a relative frequency histogram and summary statistics. However, a histogram with densities is not supported. The freq option from the standard R hist function has no effect as it is always set to FALSE in each internal call to hist. To plot densities, which correspond to setting freq to FALSE, use the lessR function Density.
DATA
The data may either be a vector from the global environment, that is, the user's workspace, as illustrated in the examples below, or a variable in a data frame. The default input data frame is mydata. Specify a different data frame name with the data option. Regardless of its name, the variables in the data frame are referenced directly by their names, that is, there is no need to invoke the standard R mechanisms of the mydata$name notation, the with function or the attach function. If a vector in the global environment and a variable in the input data frame have the same name, the vector is analyzed.
To obtain a histogram of each numerical variable in the mydata data frame, use Histogram(). Or, for a data frame with a different name, insert the name between the parentheses.
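The lookup order described above, in which a vector in the global environment takes precedence over a data frame variable of the same name, can be illustrated in base R. This is only a sketch of the described behavior, not the lessR code: the helper find_variable and the variable names Height and Weight are hypothetical.

```r
# Hypothetical helper: look in the global environment first, then in the data frame
find_variable <- function(name, data) {
  if (exists(name, envir = globalenv(), inherits = FALSE)) {
    get(name, envir = globalenv())       # a vector in the workspace wins
  } else if (name %in% names(data)) {
    data[[name]]                         # otherwise use the data frame column
  } else {
    stop("variable '", name, "' not found in the workspace or the data frame")
  }
}

# Assumed example data
d <- data.frame(Height = c(170, 165, 180), Weight = c(70, 60, 85))
assign("Weight", c(1, 2, 3), envir = globalenv())  # same name as a column of d

find_variable("Height", d)   # taken from the data frame
find_variable("Weight", d)   # taken from the global environment
```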
COLORS
Individual colors in the plot can be manipulated with options such as col.fill for the color of the histogram bars. A color theme for all the colors can be chosen with the colors option of the lessR function set, which then applies to subsequent analyses. The default color theme is "blue", but a gray scale is available with "gray", and other themes are available as explained in set, such as "red" and "green". Use the option ghost=TRUE for a black background, no grid lines and partial transparency of plotted colors.
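Partial transparency can also be expressed directly as a color value. The sketch below builds a semi-transparent color with the base R rgb function and passes it to the base R hist function. Whether the lessR color options accept such a value in the same way is an assumption here, so the example stays with hist and its own col and border arguments.

```r
# Semi-transparent orange: red, green, blue in [0, 1], alpha = 0.5 for 50% opacity
fill <- rgb(1, 0.65, 0, alpha = 0.5)
border <- "black"

set.seed(1)
y <- rnorm(50)   # sample data, assumed for illustration

# Base R call; col and border are arguments of hist(), not of Histogram()
h <- hist(y, col = fill, border = border, main = "Semi-transparent bars")
```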
VARIABLE LABELS
If variable labels exist, then the corresponding variable label is by default listed as the label for the horizontal axis and on the text output. For more information, see Read.
ONLY VARIABLES ARE REFERENCED
The referenced variable in a lessR function can only be a variable name. This referenced variable must exist in either the referenced data frame, mydata by default, or in the user's workspace, more formally called the global environment. That is, expressions cannot be directly evaluated. For example:
> Histogram(rnorm(50))   # does NOT work
Instead, do the following:
> Y <- rnorm(50)   # create vector Y in user workspace
> Histogram(Y)     # directly reference Y
ERROR DETECTION
A relatively common error that beginning users of the base R hist function encounter is to manually specify a sequence of bins with the seq function that does not fully span the range of the specified data values. The result is a rather cryptic error message and program termination. Here, Histogram detects this problem before attempting to generate the histogram with hist, and then informs the user of the problem with a more detailed and explanatory error message. Moreover, the entire range of bins need not be specified to customize the bins. Instead, just a bin width need be specified, bin.width, and/or a value that begins the first bin, bin.start. If a starting value is specified without a bin width, the default Sturges method provides the bin width.
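The check and the bin construction described above can be sketched in base R. This is an illustration under stated assumptions rather than the lessR implementation: the helpers make_breaks and breaks_span_data are hypothetical, and the Sturges-based width is approximated here as the data range divided by nclass.Sturges, whereas hist itself also rounds the breaks to pretty values.

```r
# Hypothetical helper: build breaks from a start and/or a width so they span the data
make_breaks <- function(x, bin.start = NULL, bin.width = NULL) {
  x <- x[!is.na(x)]
  if (is.null(bin.width)) {
    # approximate a Sturges-based width from the suggested number of bins
    bin.width <- diff(range(x)) / nclass.Sturges(x)
  }
  if (is.null(bin.start)) bin.start <- min(x)
  if (bin.start > min(x))
    stop("bin.start = ", bin.start, " is larger than the smallest data value, ", min(x))
  # one bin more than floor() guarantees the last break lies above max(x)
  n.bins <- floor((max(x) - bin.start) / bin.width) + 1
  seq(bin.start, by = bin.width, length.out = n.bins + 1)
}

# Hypothetical check to run before calling hist(): do the breaks cover all the data?
breaks_span_data <- function(x, breaks) {
  min(breaks) <= min(x, na.rm = TRUE) && max(breaks) >= max(x, na.rm = TRUE)
}

set.seed(1)
y <- rnorm(50)                       # sample data, assumed for illustration

bad <- seq(-1, 1, by = 0.25)         # too narrow for these data
breaks_span_data(y, bad)             # not spanned, so hist(y, breaks = bad) would stop
good <- make_breaks(y, bin.start = -4, bin.width = 0.25)
h <- hist(y, breaks = good, plot = FALSE)
```

Running the check first makes it possible to report the smallest and largest data values alongside the range of the supplied bins, instead of relying on the terse message from hist.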
PDF OUTPUT
Because of the customized graphic windowing system that maintains a unique graphic window for the Help function, the standard graphic output functions such as pdf do not work with the lessR graphics functions. Instead, to obtain pdf output, use the pdf.file option, perhaps with the optional pdf.width and pdf.height options. These files are written to the default working directory, which can be explicitly specified with the R setwd function.
[...] R function hist, invisibly returns an object of class "histogram". For details see hist.
# --------------------
# different histograms
# --------------------

# example data, assumed here: 50 random normal values
y <- rnorm(50)

# histogram with all defaults
Histogram(y)
# short form
hs(y)
# compare to standard R function hist
hist(y)
# save the histogram to a pdf file
Histogram(y, pdf.file="MyHistogram.pdf")

# histogram with no grid, red bars, black background, and black border
Histogram(y, col.grid="transparent", col.bg="black",
          col.fill="red", col.stroke="black")
# or set this color scheme for all subsequent analyses
set("red", col.grid="transparent", col.bg="black", col.stroke.bar="black")
Histogram(y)

# histogram with orange color theme, transparent orange bars, no grid lines
set(colors="orange", ghost=TRUE)
Histogram(y)
# back to default of "blue" color theme
set(colors="blue")

# histogram with specified bin width
# can also use bin.start
Histogram(y, bin.width=.25)

# histogram with specified bins and grid lines displayed over the histogram
Histogram(y, breaks=seq(-5,5,.25), xlab="My Variable", over.grid=TRUE)

# histogram with bins calculated with the Scott method and values displayed
Histogram(y, breaks="Scott", labels=TRUE)

# histogram with the number of suggested bins, with proportions
Histogram(y, breaks=15, prop=TRUE)

# histogram with specified colors, overriding defaults
# col.bg and col.grid are defined in Histogram
# all other parameters are defined in hist, par and plot functions
Histogram(y, col.fill="darkblue", col.stroke="lightsteelblue4", col.bg="ivory",
          col.grid="darkgray", density=25, angle=-45, cex.lab=.8, cex.axis=.8,
          col.lab="sienna3", main="My Title", col.main="gray40", xlim=c(-5,5), lwd=2,
          xlab="My Favorite Variable")

# ---------------------
# cumulative histograms
# ---------------------

# cumulative histogram with superimposed regular histogram, all defaults
Histogram(y, cumul="both")

# cumulative histogram plus regular histogram
# present with proportions on vertical axis, override other defaults
Histogram(y, cumul="both", breaks=seq(-4,4,.25), prop=TRUE,
          col.reg="mistyrose")

# -------------------------------------------------
# histograms for data frames and multiple variables
# -------------------------------------------------

# create data frame, mydata, to mimic reading data with Read function
# mydata contains both numeric and non-numeric data
mydata <- data.frame(rnorm(100), rnorm(100), rnorm(100), rep(c("A","B"),50))
names(mydata) <- c("X","Y","Z","C")

# although data not attached, access the variable directly by its name
Histogram(X)

# histograms for all numeric variables in data frame called mydata
# except for numeric variables with unique values < n.cat
# mydata is the default name, so does not need to be specified with data
Histogram()

# variable of interest is in a data frame which is not the default mydata
# access the breaks variable in the R provided warpbreaks data set
# although data not attached, access the variable directly by its name
data(warpbreaks)
Histogram(breaks, data=warpbreaks)
Histogram()

# histograms for all numeric variables in data frame called mydata
# with specified options
Histogram(col.fill="palegreen1", col.bg="ivory", labels=TRUE)

# Use the subset function to specify a variable list
# histograms for all specified numeric variables
mysub <- subset(mydata, select=c(X,Y))
Histogram(data=mysub)