Learn R Programming

toaster (version 0.5.5)

computePercentiles: Compute percentiles of column values.

Description

Compute percentiles including boxplot quartiles across values of column columnName. Multiple sets of percentiles achieved with the parameter by. Vector by may contain arbitrary number of column names: the percentiles are computed for each combination of values from these columns. Remember that when using computed quartiles with function createBoxplot it can utilize up to 3 columns by displaying them along the x-axis and inside facets.

Usage

computePercentiles(channel, tableName, columnName = NULL, columns = columnName, temporal = FALSE, percentiles = c(ifelse(temporal, 5, 0), 5, 10, 25, 50, 75, 90, 95, 100), by = NULL, where = NULL, nameInDataFrame = "column", stringsAsFactors = FALSE, test = FALSE, parallel = FALSE)

Arguments

channel
connection object as returned by odbcConnect
tableName
Aster table name
columnName
deprecated. Use vector columns instead.
columns
names of the columns to compute percentiles on
temporal
logical: TRUE indicates all columns are temporal, otherwsie numerical. Temporal percentiles have 2 values: character value representing temporal percentile (date, time, timestamp or datetime) and integer epoch value of the number of seconds since 1970-01-01 00:00:00-00 (can be negative) or for interval values includeing time, the total number of seconds in the interval.
percentiles
integer vector with percentiles to compute. Values 0, 25, 50, 75, 100 will always be added if omitted for numerical types, and 25, 50, 75, 100 for temporal. Percentile 0 (minimum) has to be included explicitly for temporals as its computation affects performance more than others.
by
for optional grouping by one or more values for faceting or alike. Used with createBoxplot in combination with column name for x-axis and wrap or grid faceting.
where
specifies criteria to satisfy by the table rows before applying computation. The creteria are expressed in the form of SQL predicates (inside WHERE clause).
nameInDataFrame
name of the column in returned data frame to store table column name(s) defined by parameter columns. NULL indicates omit this column from the data frame (not recommended when computing percentiles for multiple columns).
stringsAsFactors
logical: should columns returned as character and not excluded by as.is and not converted to anything else be converted to factors?
test
logical: if TRUE show what would be done, only (similar to parameter test in RODBC functions like sqlQuery and sqlSave).
parallel
logical: enable parallel calls to Aster database. This option requires parallel backend enabled and registered (see in examples). Parallel execution requires ODBC channel obtained without explicit password: either with odbcConnect(dsn) or odbcDriverConnect calls, but not with odbcConnect(dsn, user, password).

Value

For numeric data function returns a data frame with percentile values organized into following columns:
  • percentile percentile to compute (from 0 to 100): will contain all valid values from percentiles
  • value computed percentile
  • column table column name. Override name column with parameter nameInDataFrame or omit this column all together if NULL.
  • by[1], by[2], ... in presence of parameter by, contain values of the grouping columns for computed percentiles (optional).
For temporal data function returns a data frame with percentile values organized into following columns:
  • percentile percentile to compute (from 0 to 100): will contain all valid values from percentiles
  • value computed percentile value converted from temporal data type to its character representation.
  • epoch corresponding to temporal percentile value epoch: for date and timestamp values, the number of seconds since 1970-01-01 00:00:00-00 (can be negative); for interval values include time, the total number of seconds in the interval.
  • column table column name. Override name column with parameter nameInDataFrame or omit this column all together if NULL.
  • by[1], by[2], ... in presence of parameter by, contain values of the grouping columns for computed percentiles (optional).

Examples

Run this code
if(interactive()){
# initialize connection to Lahman baseball database in Aster 
conn = odbcDriverConnect(connection="driver={Aster ODBC Driver};
                         server=<dbhost>;port=2406;database=<dbname>;uid=<user>;pwd=<pw>")

# ipouts percentiles for pitching ipouts for AL in 2000s
ipop = computePercentiles(conn, "pitching", "ipouts",
                          where = "lgid = 'AL' and yearid >= 2000")

# ipouts percentiles by league
ipopLg = computePercentiles(conn, "pitching", "ipouts", by="lgid")

# percentiles on temporal columns
playerAllDates = computePercentiles(conn, "master_enh", 
                    columns=c('debut','finalgame','birthdate','deathdate'),
                    temporal=TRUE, percentiles=c(0))
createBoxplot(playerAllDates, x='column', value='epoch', useIQR=TRUE, 
              title="Boxplots for Date columns (epoch values)", 
              legendPosition="none")

}

Run the code above in your browser using DataLab