by
Apply a Function to a Data Frame Split by Factors
Function by is an object-oriented wrapper for tapply applied to data frames.
Usage
by(data, INDICES, FUN, ..., simplify = TRUE)
Arguments
 data
 an R object, normally a data frame, possibly a matrix.
 INDICES
 a factor or a list of factors, each of length nrow(data).
 FUN
 a function to be applied to (usually data-frame) subsets of data.
 ...
 further arguments to FUN.
 simplify
 logical: see tapply.
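As a minimal sketch of how the arguments fit together, the call below passes a named list of two factors as INDICES and forwards na.rm to FUN through the dots. It uses the built-in warpbreaks data set; the helper name mean_breaks is illustrative and not part of R.

```r
# Hypothetical helper: mean of the 'breaks' column of a data-frame subset.
# Extra arguments (here na.rm) arrive through '...'.
mean_breaks <- function(d, ...) mean(d$breaks, ...)

# INDICES is a list of two factors, each of length nrow(warpbreaks).
res <- by(warpbreaks,
          list(wool = warpbreaks$wool, tension = warpbreaks$tension),
          mean_breaks, na.rm = TRUE)

# One cell per wool x tension combination (2 wool types, 3 tension levels).
dim(res)
```

Naming the list elements only affects the labels shown when the result is printed.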
Details
A data frame is split by row into data frames subsetted by the values of one or more factors, and function FUN is applied to each subset in turn.
For the default method, an object with dimensions (e.g., a matrix) is coerced to a data frame and the data frame method applied. Other objects are also coerced to a data frame, but FUN is applied separately to (subsets of) each column of the data frame.
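The splitting behaviour of the data frame method can be checked with a small sketch. It assumes the built-in warpbreaks data set (54 rows, with tension levels L, M and H); the helper name count_rows is illustrative only.

```r
# Hypothetical helper: verifies that FUN receives a data frame and
# returns the number of rows in that subset.
count_rows <- function(d) {
  stopifnot(is.data.frame(d))  # the data frame method passes whole row subsets
  nrow(d)
}

# One call to count_rows per level of tension.
sizes <- by(warpbreaks, warpbreaks$tension, count_rows)

# Extract the result for a single level by name.
sizes[["L"]]
```

Because each subset keeps all columns, FUN can refer to any variable of the original data frame, as the lm calls in the Examples do.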
Value

An object of class "by", giving the results for each subset. This is always a list if simplify is false, otherwise a list or array (see tapply).
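The effect of simplify on the return value can be seen in a short sketch, again assuming the built-in warpbreaks data set. The helper mean_breaks and the variable names are illustrative.

```r
# Hypothetical helper returning a single number per subset.
mean_breaks <- function(d) mean(d$breaks)

# Default simplify = TRUE: scalar results are simplified to an array.
means <- by(warpbreaks, warpbreaks$tension, mean_breaks)
class(means)         # "by"
is.list(means)       # FALSE

# simplify = FALSE: the result is always a list.
means_list <- by(warpbreaks, warpbreaks$tension, mean_breaks, simplify = FALSE)
is.list(means_list)  # TRUE

# Removing the class and flattening gives a plain vector for further processing.
plain <- unlist(unclass(means_list))
```

When FUN returns something other than a scalar, such as a fitted model, the result is a list even with the default simplify = TRUE.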
See Also
tapply, simplify2array.
ave also applies a function blockwise.
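To illustrate how the related functions differ, the sketch below applies a group mean with tapply and with ave on the built-in warpbreaks data set. The variable names are illustrative.

```r
# tapply works on a vector and returns one value per group.
group_means <- tapply(warpbreaks$breaks, warpbreaks$tension, mean)
length(group_means)  # one entry per tension level

# ave (default FUN = mean) returns a vector as long as its input,
# with each element replaced by the mean of its group.
per_row <- ave(warpbreaks$breaks, warpbreaks$tension)
length(per_row)      # same as nrow(warpbreaks)
```

by is the natural choice when FUN needs several columns of each subset at once, whereas tapply and ave operate on a single vector.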
Examples
library(base)
require(stats)
by(warpbreaks[, 1:2], warpbreaks[,"tension"], summary)
by(warpbreaks[, 1], warpbreaks[, -1], summary)
by(warpbreaks, warpbreaks[,"tension"],
   function(x) lm(breaks ~ wool, data = x))
## now suppose we want to extract the coefficients by group
tmp <- with(warpbreaks,
            by(warpbreaks, tension,
               function(x) lm(breaks ~ wool, data = x)))
sapply(tmp, coef)
Community examples
This example uses the `mtcars` data frame. The following R snippet first displays the structure of `mtcars`, then uses `by` to group the observations by `cyl` (cylinders) and compute the sum of `mpg` (miles per gallon) for each group.

```
str(mtcars)
by(data = mtcars$mpg, INDICES = mtcars$cyl, FUN = sum, na.rm = TRUE)
```

The `by` call prints the following:

```
mtcars$cyl: 4
[1] 293.3
------------------------------------------------------------
mtcars$cyl: 6
[1] 138.2
------------------------------------------------------------
mtcars$cyl: 8
[1] 211.4
```

In addition to built-in functions like `sum`, `by` also works with user-defined functions. In the following snippet, the user-defined function `range_diff` first calculates the `range` of its argument and then the `diff` of that range.

```
range_diff <- function(x) {
  diff(range(x))
}
by(data = mtcars$mpg, INDICES = mtcars$cyl, FUN = range_diff)
```

Calling `range_diff` in `by` returns the following output:

```
mtcars$cyl: 4
[1] 12.5
------------------------------------------------------------
mtcars$cyl: 6
[1] 3.6
------------------------------------------------------------
mtcars$cyl: 8
[1] 8.8
```