toaster (version 0.5.5)

createHistogram: Create histogram type of plot.

Description

Create a histogram plot from a pre-computed distribution of data. The data parameter is a data frame containing intervals (bins) and counts obtained with computeHistogram or computeBarchart.
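As a rough illustration of what "pre-computed" means here, the sketch below bins synthetic values locally with base R and arranges them in a data frame using the column names that createHistogram expects by default (bin_start and bin_count, per the Usage section). The bin_end column name and the synthetic values are assumptions made for illustration. In normal use the data frame would come from computeHistogram or computeBarchart running against the database.

```r
# Sketch: build a pre-computed histogram locally instead of in the database.
set.seed(42)
era <- rnorm(500, mean = 4.2, sd = 0.9)   # synthetic values, not real ERA data

# hist(plot = FALSE) bins the data without drawing anything
h <- hist(era,
          breaks = seq(floor(min(era)), ceiling(max(era)), by = 0.5),
          plot = FALSE)

binned <- data.frame(
  bin_start = head(h$breaks, -1),   # left edge of each bin
  bin_end   = tail(h$breaks, -1),   # right edge (column name is an assumption)
  bin_count = h$counts              # number of observations in each bin
)

# Build the plot only when toaster is installed; print(p) would draw it
if (requireNamespace("toaster", quietly = TRUE)) {
  p <- toaster::createHistogram(binned, x = "bin_start", y = "bin_count",
                                title = "Synthetic ERA distribution")
}
```

Because the counts are already aggregated, the plotting step only has to draw one bar per row, which is why the function takes column names rather than raw observations.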

Usage

createHistogram(data, x = "bin_start", y = "bin_count", fill = NULL,
  position = "dodge", facet = NULL, ncol = 1, facetScales = "free_y",
  baseSize = 12, baseFamily = "", xlim = NULL, breaks = NULL,
  text = FALSE, percent = FALSE, digits = 0, textVJust = -2,
  mainColour = "black", fillColour = "grey", scaleGradient = NULL,
  paletteValues = NULL, palette = "Set1", trend = FALSE,
  trendLinetype = "solid", trendLinesize = 1, trendLinecolour = "black",
  title = paste("Histgoram by", fill), subtitle = NULL, xlab = x,
  ylab = y, legendPosition = "right", coordFlip = FALSE,
  defaultTheme = theme_tufte(base_size = baseSize, base_family = baseFamily),
  themeExtra = NULL)

Arguments

data
data frame containing the computed histogram
x
name of a column containing bin labels or interval values
y
name of a column containing bin values or counts (bin size)
fill
name of a column with values to colour bars
position
histogram position parameter to use for overlapping bars: stack, dodge (default), fill, identity
facet
vector of 1 or 2 column names used to split the data and plot the subsets as facets. With a single name, the subset plots are placed next to each other, wrapping after ncol columns (uses facet_wrap). With two names, the subset plots vary in both the horizontal and vertical directions (grid) based on the column values (uses facet_grid).
ncol
number of facet columns (applies only when a single facet column is supplied; see parameter facet).
facetScales
whether scales are shared across all subset plots (facets): "fixed" - all are the same, "free_x" - vary across rows (x axis), "free_y" - vary across columns (y axis, default), "free" - both rows and columns (see parameter scales in facet_wrap)
baseSize
theme base font size
baseFamily
theme base font family
xlim
a character vector specifying the data range for the x scale and the default order of their display in the x axis.
breaks
a character vector giving the breaks as they should appear on the x axis.
text
if TRUE then display values above bars (default: FALSE) (this feature is in development)
percent
format text as percent
digits
number of digits to use in text
textVJust
vertical justification of text labels (relative to the top of the bar).
mainColour
Perimeter color of histogram bars
fillColour
Fill color of histogram bars (applies only when fill is NULL)
scaleGradient
control the ggplot2 fill gradient scale manually, e.g. with scale_colour_gradient (if specified then parameter palette is ignored)
paletteValues
actual palette colours for use with scale_fill_manual (if specified then parameter palette is ignored)
palette
Brewer palette name - see display.brewer.all in RColorBrewer package for names
trend
logical indicating whether a trend line is shown.
trendLinetype
trend line type.
trendLinesize
size of trend line.
trendLinecolour
color of trend line.
title
plot title.
subtitle
plot subtitle.
xlab
a label for the x axis, defaults to a description of x.
ylab
a label for the y axis, defaults to a description of y.
legendPosition
the position of the legend: "left", "right", "bottom", "top", or a two-element numeric vector. "none" suppresses the legend.
coordFlip
logical; if TRUE, flips the Cartesian coordinates so that horizontal becomes vertical and vertical becomes horizontal (see coord_flip).
defaultTheme
plot theme settings; defaults to theme_tufte. More themes are available in ggtheme (from ggplot2) and in the ggthemes package.
themeExtra
any additional theme settings that override default theme.
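The facet rule described above (one name wraps, two names form a grid) can be sketched directly with ggplot2. The helpers facet_mode and add_facets below are hypothetical and are not part of toaster. The choice of the first name as rows and the second as columns in the grid is an assumption, and the data are synthetic.

```r
# Hypothetical helper (not part of toaster): classify a facet argument
# following the rule described for the facet parameter.
facet_mode <- function(facet) {
  if (is.null(facet)) return("none")
  if (length(facet) == 1) return("wrap")   # facet_wrap, honours ncol
  if (length(facet) == 2) return("grid")   # facet_grid, ncol does not apply
  stop("facet must name 1 or 2 columns")
}

# Synthetic pre-computed counts: 3 bins x 2 teams x 2 decades
d <- expand.grid(bin_start = c(0, 1, 2), teamid = c("NYA", "TEX"),
                 decadeid = c(1990, 2000), stringsAsFactors = FALSE)
d$bin_count <- seq_len(nrow(d))

# Hypothetical helper: add the matching ggplot2 facet layer to a plot
add_facets <- function(p, facet, ncol = 1, facetScales = "free_y") {
  switch(facet_mode(facet),
         none = p,
         wrap = p + ggplot2::facet_wrap(facet, ncol = ncol,
                                        scales = facetScales),
         grid = p + ggplot2::facet_grid(
           # rows ~ columns ordering is an assumption
           stats::as.formula(paste(facet[1], "~", facet[2])),
           scales = facetScales))
}

# Build the plots only when ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  base <- ggplot2::ggplot(d, ggplot2::aes(x = bin_start, y = bin_count)) +
    ggplot2::geom_col()
  p_wrap <- add_facets(base, "teamid", ncol = 2)          # one panel per team
  p_grid <- add_facets(base, c("teamid", "decadeid"))     # team x decade grid
}
```

With a single facet column, ncol controls how many panels sit side by side before wrapping. With two facet columns the layout is fixed by the distinct values in each column, so ncol has no effect.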

Value

ggplot object

See Also

computeHistogram and computeBarchart to compute the data for a histogram

Examples

library(toaster)
library(RODBC)  # provides odbcDriverConnect

if(interactive()){
# initialize connection to Lahman baseball database in Aster 
conn = odbcDriverConnect(connection="driver={Aster ODBC Driver};
                         server=<dbhost>;port=2406;database=<dbname>;uid=<user>;pwd=<pw>")

# AL teams pitching stats by decade
bc = computeBarchart(channel=conn, tableName="pitching_enh", category="teamid", 
                     aggregates=c("AVG(era) era", "AVG(whip) whip", "AVG(ktobb) ktobb"),
                     where="yearid >= 1990 and lgid='AL'", by="decadeid", withMelt=TRUE)

createHistogram(bc, "teamid", "value", fill="teamid", 
                facet=c("variable", "decadeid"), 
                legendPosition="bottom",
                title = "AL Teams Pitching Stats by decades (1990-2012)",
                themeExtra = guides(fill=guide_legend(nrow=2)))

# AL Teams Average Win-Loss Difference by Decade 
franchwl = computeBarchart(conn, "teams_enh", "franchid",
                           aggregates=c("AVG(w) w", "AVG(l) l", "AVG(w-l) wl"),
                           by="decadeid",
                           where="yearid >=1960 and lgid = 'AL'")

createHistogram(franchwl, "decadeid", "wl", fill="franchid",
                facet="franchid", ncol=5, facetScales="fixed",
                legendPosition="none",
                trend=TRUE,
                title="Average W-L difference by decade per team (AL)",
                ylab="Average W-L")  
                
# Histogram of team ERA distribution: Rangers vs. Yankees in 2000s
h2000s = computeHistogram(channel=conn, tableName='pitching_enh', columnName='era',
                          binsize=0.2, startvalue=0, endvalue=10, by='teamid',
                          where="yearID between 2000 and 2012 and teamid in ('NYA','TEX')")
createHistogram(h2000s, fill='teamid', facet='teamid', 
                title='TEX vs. NYY 2000-2012', xlab='ERA', ylab='count',
                legendPosition='none')                
                
}
