Learn R Programming

toaster (version 0.5.5)

getTableSummary: Compute columnwise statistics on Aster table.

Description

For table compute column statistics in Aster and augment data frame structure obtained with sqlColumns with columns containing computed statistics.

Usage

getTableSummary(channel, tableName, include = NULL, except = NULL, modeValue = FALSE, percentiles = c(5, 10, 25, 50, 75, 90, 95, 100), where = NULL, mock = FALSE, parallel = FALSE)

Arguments

channel
object as returned by odbcConnect.
tableName
name of the table in Aster.
include
a vector of column names to include. Output never contains attributes other than in the list.
except
a vector of column names to exclude. Output never contains attributes from the list.
modeValue
logical indicates if mode values should be computed. Default is FALSE.
percentiles
list of percentiles (integers between 0 and 100) to collect (always collects 25th and 75th for IQR calculation). There is no penalty in specifying more percentiles as they get calculated in a single call for each column - no matter how many different values are requested. When FALSE then percentiles calculations are skipped and result wil have no percentile and IQR columns.
where
SQL WHERE clause limiting data from the table (use SQL as if in WHERE clause but omit keyword WHERE).
mock
logical: if TRUE returns pre-computed table statistics for tables pitching or batting, only.
parallel
logical: enable parallel calls to Aster database. This option requires parallel backend enabled and registered (see in examples). Parallel execution requires ODBC channel obtained without explicit password: either with odbcConnect(dsn) or odbcDriverConnect calls, but not with odbcConnect(dsn, user, password).

Value

data frame returned by sqlColumns with additional columns:

Details

Computes columns statistics for all or specified table columns and adds them to the data frame with basic ODBC table metadata obtained with sqlColumns. Computed statistics include counts of all, non-null, distinct values; statistical summaries of maximum, minimum, mean, standard deviation, median (50th percentile), mode (optional), interquartile range, and desired percentiles. Each computed statistic adds a column to ODBC metadata data frame.

See Also

sqlColumns

Examples

Run this code
if(interactive()){
# initialize connection to Lahman baseball database in Aster 
conn = odbcDriverConnect(connection="driver={Aster ODBC Driver};
                         server=<dbhost>;port=2406;database=<dbname>;uid=<user>;pwd=<pw>")

pitchingInfo = getTableSummary(channel=conn, 'pitching_enh')
# list all table columns
pitchingInfo$COLUMN_NAME

# compute statistics on subset of baseball data after 1999
battingInfo = getTableSummary(channel=conn, 'batting_enh', 
                              where='yearid between 2000 and 2013')
                              
# compute statistics for certain columns including each percentile from 1 to 99
pitchingInfo = getTableSummary(channel=conn, 'pitching_enh',
                              include=c('h', 'er', 'hr', 'bb', 'so'),
                              percentiles=seq(1,99))
# list data frame column names to see all computed statistics
names(pitchingInfo)
                             
# compute statitics on all numeric columns except certain columns
teamInfo = getTableSummary(channel=conn, 'teams_enh', 
                   include=getNumericColumns(sqlColumns(conn, 'teams_enh')),
                   except=c('lgid', 'teamid', 'playerid', 'yearid', 'decadeid'))
}

Run the code above in your browser using DataLab