toaster (version 0.5.5)

computeHistogram: Compute histogram distribution of the column.

Description

Compute a histogram of a table column in Aster by mapping its values to bins based on the specified parameters. When the column is of a numeric or temporal data type, it uses the map-reduce histogram function over continuous values. When the column is categorical (character data types), it defers to computeBarchart, which uses the SQL aggregate COUNT(*) with GROUP BY. The result is a data frame to visualize as bar charts (see createHistogram for creating visualizations).
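To make "mapping values to bins" concrete, here is a minimal local sketch in base R of equal-width binning driven by start, end and bin size. It is only an analogue: computeHistogram performs the work in-database on Aster, and the half-open interval convention, the sample values and the output column names (bin, frequency) are assumptions made for illustration, not the layout of the data frame toaster returns.

```r
# Local analogue only; computeHistogram itself runs in-database on Aster.
startvalue <- 0
endvalue   <- 10
binsize    <- 0.5

# Made-up ERA-like values; 11.30 lies outside [startvalue, endvalue).
era <- c(1.95, 3.10, 3.15, 4.42, 4.60, 7.85, 11.30)

# Equal-width bin edges from startvalue to endvalue.
breaks <- seq(startvalue, endvalue, by = binsize)

# Assumption: bins are half-open, [lower, upper); values out of range are dropped.
in_range <- era[era >= startvalue & era < endvalue]
bins <- cut(in_range, breaks = breaks, right = FALSE)

# One row per bin with its count (column names are hypothetical).
counts <- as.data.frame(table(bin = bins), responseName = "frequency")

# Show only the non-empty bins.
counts[counts$frequency > 0, ]
```

Empty bins are kept in counts so that a bar chart drawn from it would preserve the gaps between occupied bins.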

Usage

computeHistogram(channel, tableName, columnName, tableInfo = NULL,
  columnFrequency = FALSE, binMethod = "manual", binsize = NULL,
  startvalue = NULL, endvalue = NULL, numbins = NULL, useIQR = TRUE,
  datepart = NULL, where = NULL, by = NULL, test = FALSE,
  oldStyle = FALSE)

Arguments

channel
connection object as returned by odbcConnect
tableName
Aster table name
columnName
name of the table column to compute the histogram on
tableInfo
pre-built summary of data to use (required when test=TRUE). See getTableSummary.
columnFrequency
logical: indicates whether to build a histogram of the frequencies of the column
binMethod
one of several methods to determine the number and size of bins: 'manual' uses the parameters below; 'Sturges' and 'Scott' use the corresponding methods of computing the number of bins and their width (see http://en.wikipedia.org/wiki/Histogram#Number_of_bins_and_width).
binsize
size (width) of discrete intervals defining histogram (all bins are equal)
startvalue
lower end (bound) of values to include in histogram
endvalue
upper end (bound) of values to include in histogram
numbins
number of bins to use in histogram
useIQR
logical: indicates whether to use the IQR interval to compute the lower and upper cutoff bounds for values included in the histogram: [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR], where IQR = Q3 - Q1
datepart
field to extract from a timestamp/date/time column to build the histogram on
where
specifies criteria that table rows must satisfy before the computation is applied. The criteria are expressed in the form of SQL predicates (inside a WHERE clause).
by
optional grouping by one or more values, for faceting or similar
test
logical: if TRUE, only show what would be done (similar to the parameter test in RODBC functions like sqlQuery and sqlSave).
oldStyle
logical: indicates whether old-style histogram parameters are in use (before Aster AF 5.11)
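
The bin-selection rules named under binMethod and the cutoff interval given under useIQR can be sketched locally in base R. The formulas below are the textbook versions (Sturges: ceiling(log2(n)) + 1 bins; Scott: bin width 3.49 * sd / n^(1/3)); how toaster evaluates them in-database is not shown on this page, so treat the helper names, the sample data and the quantile definition (R's default) as assumptions made for illustration.

```r
# Textbook formulas evaluated locally; not toaster's in-database code.
# All helper names below are hypothetical.

# Sturges: number of bins from the number of observations n.
sturges_numbins <- function(n) ceiling(log2(n)) + 1

# Scott: bin width from the sample standard deviation and n.
scott_binsize <- function(x) 3.49 * stats::sd(x) / length(x)^(1/3)

# useIQR-style cutoffs: [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR], IQR = Q3 - Q1.
# Assumption: R's default quantile definition; Aster may compute
# percentiles differently.
iqr_bounds <- function(x) {
  q <- stats::quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  c(lower = q[1] - 1.5 * iqr, upper = q[2] + 1.5 * iqr)
}

# Made-up ERA-like values with one obvious outlier (12.7).
era <- c(2.5, 3.4, 3.9, 4.2, 4.5, 4.8, 5.1, 5.6, 12.7)

sturges_numbins(length(era))
scott_binsize(era)

# Keep only the values inside the IQR-based bounds; 12.7 is dropped.
b <- iqr_bounds(era)
kept <- era[era >= b[["lower"]] & era <= b[["upper"]]]
kept
```

Trimming by the IQR interval before binning keeps a few extreme values from stretching the histogram range and leaving most bins empty.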

See Also

computeBarchart and createHistogram

Examples

if(interactive()){
library(RODBC)    # provides odbcDriverConnect
library(toaster)  # provides computeHistogram and createHistogram

# initialize connection to Lahman baseball database in Aster
# (paste0 keeps the connection string free of embedded line breaks)
conn = odbcDriverConnect(connection=paste0("driver={Aster ODBC Driver};",
                         "server=<dbhost>;port=2406;database=<dbname>;uid=<user>;pwd=<pw>"))

# Histogram of team ERA distribution: Rangers vs. Yankees in 2000s
h2000s = computeHistogram(channel=conn, tableName='pitching_enh', columnName='era',
                          binsize=0.2, startvalue=0, endvalue=10, by='teamid',
                          where="yearID between 2000 and 2012 and teamid in ('NYA','TEX')")

# Plot one facet per team from the computed histogram data frame
createHistogram(h2000s, fill='teamid', facet='teamid',
                title='TEX vs. NYY 2000-2012', xlab='ERA', ylab='count',
                legendPosition='none')
}
