Learn R Programming

toaster (version 0.5.5)

computeAggregates: Compute aggregate values.

Description

Compute aggregates using SQL SELECT...GROUP BY in Aster. Aggregates may be any valid SQL expressions (including SQL WINDOW functions) in context of group columns (parameter by). Neither SQL ORDER BY nor LIMIT clauses are supported (use computeBarchart when they are required).

Usage

computeAggregates(channel, tableName, aggregates = c("COUNT(*) cnt"), by = vector(), where = NULL, ..., test = FALSE)

Arguments

channel
connection object as returned by odbcConnect
tableName
table name
aggregates
vector of SQL aggregates to compute. Aggregates may have optional aliases like in "AVG(era) avg_era"
by
character vecotr of column names and/or expressions on which grouping is performed (with SQL GROUP BY ...). Each can be a column or a valid SQL non-aggregate expression with otional alias separated by space (e.g. "UPPER(car_make) make").
where
specifies criteria to satisfy by the table rows before applying computation. The creteria are expressed in the form of SQL predicates (inside WHERE clause).
...
additional arguments to be passed to sqlQuery for more control over performance and data type conversion. By default, stringsAsFactors is set to FALSE.
test
logical: if TRUE show what would be done, only (similar to parameter test in RODBC functions like sqlQuery and sqlSave).

Details

No columns returned as character and not excluded by as.is are converted to factors by default, i.e. stringsAsFactors = FALSE when calling sqlQuery if not specified when calling this function.

Examples

Run this code
if(interactive()){
# initialize connection to Lahman baseball database in Aster 
conn = odbcDriverConnect(connection="driver={Aster ODBC Driver};
                         server=<dbhost>;port=2406;database=<dbname>;uid=<user>;pwd=<pw>")

# compute average team rank and attendance by decade
data = computeAggregates(channel = conn, tableName = "teams_enh",
               by = c("name || ', ' || park teamname", "lgid", "teamid", "decadeid"),
               aggregates = c("min(name) name", "min(park) park", "avg(rank) rank", 
                              "avg(attendance) attendance"))
               
# compute total strike-outs for each team in decades starting with 1980
# and also percent (share) of team strikeouts within a decade
data = computeAggregates(channel = conn, "pitching_enh",
               by = c("teamid", "decadeid"), 
               aggregates = c("sum(so) so", 
                              "sum(so)/(sum(sum(so)) over (partition by decadeid)) percent"),
               where = "decadeid >= 1980")
}
  

Run the code above in your browser using DataLab