
caroline (version 0.7.4)

groupBy: Group a data frame by a factor and perform aggregate functions.

Description

The R equivalent of a SQL 'group by' call.
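
For readers less familiar with SQL, the idea of a 'group by' can be sketched in base R with aggregate(), which is listed under See Also. This is only an illustration of the concept, not how groupBy is implemented; the `sales` data frame and its values are hypothetical.

```r
# Toy data (hypothetical values, for illustration only)
sales <- data.frame(
  region = c("east", "west", "east", "west"),
  amount = c(10, 20, 30, 40)
)

# Roughly: SELECT region, SUM(amount) AS amount FROM sales GROUP BY region
totals <- aggregate(amount ~ region, data = sales, FUN = sum)
print(totals)
```

Each distinct value of `region` becomes one row of the result, with `amount` summed within that group.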

Usage

groupBy(df, by, aggregation, clmns=names(df), collapse=',', distinct=FALSE, sql=FALSE, full.names=FALSE, ...)

Arguments

df
a data frame.
by
the factor (or name of a factor in df) used to determine the grouping.
aggregation
the functions to perform on the output (default is to sum). Suggested functions are: 'sum','mean','var','sd','max','min','length','paste',NULL.
clmns
the columns to include in the output.
collapse
string delimiter for columns aggregated via 'paste' concatenation.
distinct
used in conjunction with paste and collapse to return only the unique elements in a delimited, concatenated string.
sql
whether or not to use SQLite to perform the grouping (not yet implemented).
full.names
whether the names of the aggregation functions should be appended to the output column names.
...
additional parameters (such as na.rm) passed to the underlying aggregate functions.
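
The interplay of 'paste', collapse and distinct described above can be sketched in base R. The helper `paste_by` below is hypothetical and is not part of caroline; it assumes that distinct simply drops repeated values within each group before concatenation, which is one reading of the argument description rather than a statement about groupBy's internals.

```r
# Hypothetical helper (not from caroline): concatenate the values of x
# within each level of by, optionally keeping only unique values.
paste_by <- function(x, by, collapse = ",", distinct = FALSE) {
  tapply(x, by, function(v) {
    if (distinct) v <- unique(v)   # assumed meaning of distinct=TRUE
    paste(v, collapse = collapse)  # join the group's values with the delimiter
  })
}

# Toy vectors (hypothetical values)
z <- c("m", "n", "m", "o", "n", "n")
w <- c("t", "t", "t", "u", "u", "u")

all_vals  <- paste_by(z, w)                   # repeated values are kept
uniq_vals <- paste_by(z, w, distinct = TRUE)  # repeated values are dropped
print(all_vals)
print(uniq_vals)
```

With these inputs, group 't' yields "m,n,m" without distinct and "m,n" with it.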

Value

  • a summary/aggregate data frame

See Also

aggregate, bestBy

Examples

df <- data.frame(a=runif(12),b=c(runif(11),NA), z=rep(letters[13:18],2),w=rep(letters[20:23],3))

groupBy(df=df, by='w', clmns=c(rep(c('a','b'),3),'z','w'), aggregation=c('sum','mean','var','sd','min','max','paste','length'), full.names=TRUE, na.rm=TRUE)
# or using SQLite (note: the sql argument is documented above as not yet implemented)
groupBy(df=df, by='w', clmns=c(rep(c('a','b'),2),'z','w'), aggregation=c('sum','mean','min','max','paste','length'), full.names=TRUE, sql=TRUE)


## passing a custom function
meantop <- function(x,n=2, ...)
  mean(x[order(x, decreasing=TRUE)][1:n], ...)
  
groupBy(df, by='w', aggregation=rep(c('mean','max','meantop'),2), clmns=rep(c('a','b'),3), na.rm=TRUE)
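
To see what the custom function computes before it is handed to groupBy, it can be applied to a single vector. This is a small sketch; the input vectors are hypothetical, and meantop is repeated here only so the snippet runs on its own.

```r
# Same definition as in the example above: mean of the n largest values
meantop <- function(x, n=2, ...)
  mean(x[order(x, decreasing=TRUE)][1:n], ...)

# The two largest values of c(1, 5, 3) are 5 and 3
top_mean <- meantop(c(1, 5, 3))
print(top_mean)  # 4

# order() places NA last by default, so the top two here are 5 and 1
top_mean_na <- meantop(c(1, NA, 5), na.rm=TRUE)
print(top_mean_na)  # 3
```

If a group holds fewer than n non-missing values, the selection includes NA, which is why the groupBy call above passes na.rm=TRUE through to mean.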
