# by

##### Apply a Function to a Data Frame Split by Factors

Function `by` is an object-oriented wrapper for `tapply` applied to data frames.

##### Usage

`by(data, INDICES, FUN, ..., simplify = TRUE)`

##### Arguments

- data
an R object, normally a data frame, possibly a matrix.

- INDICES
a factor or a list of factors, each of length `nrow(data)`.

- FUN
a function to be applied to (usually data-frame) subsets of `data`.

- ...
further arguments to `FUN`.

- simplify
logical: see `tapply`.
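As an illustration of the arguments above, the following sketch passes a list of two factors as `INDICES` and forwards an extra argument to `FUN` through `...`. The helper argument name `prob` and the result names `res` and `q90` are made up for this example; `warpbreaks` is the built-in dataset also used in the Examples section.

```r
# INDICES as a named list of two factors: one result per wool x tension cell.
res <- by(warpbreaks,
          list(wool = warpbreaks$wool, tension = warpbreaks$tension),
          function(d) median(d$breaks))
dim(res)  # one dimension per grouping factor

# Arguments after FUN are handed on to FUN; here `prob` (a hypothetical
# name) reaches the anonymous function for every subset.
q90 <- by(warpbreaks, warpbreaks$tension,
          function(d, prob) quantile(d$breaks, probs = prob),
          prob = 0.9)
q90
```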

##### Details

A data frame is split by row into data frames subsetted by the values of one or more factors, and function `FUN` is applied to each subset in turn.
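The split-then-apply behaviour described above can be sketched by comparing `by` with an explicit `split()` followed by `sapply()`. This is only an approximation for illustration, not how `by` is implemented; the names `res_by` and `res_split` are invented for the example.

```r
# Mean number of breaks within each tension level, using by().
res_by <- by(warpbreaks, warpbreaks$tension, function(d) mean(d$breaks))

# Roughly the same computation spelled out by hand:
# split the rows by the factor, then apply the function to each piece.
res_split <- sapply(split(warpbreaks, warpbreaks$tension),
                    function(d) mean(d$breaks))

# The per-group values agree once attributes are dropped.
all.equal(as.vector(res_by), as.vector(res_split))
```

The main practical difference is the return value: `by` returns an object of class `"by"` with its own print method, while `sapply` returns a plain named vector.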

For the default method, an object with dimensions (e.g., a matrix) is coerced to a data frame and the data frame method applied. Other objects are also coerced to a data frame, but `FUN` is applied separately to (subsets of) each column of the data frame.
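A small sketch of the default method with non-data-frame input may help here. The matrix `m` and factor `g` are made-up toy data, and the example checks only what the paragraph above states: that subsets of a matrix reach `FUN` as data frames, and that a plain vector can be grouped as well.

```r
# Toy data (hypothetical): a 6 x 2 numeric matrix and a grouping factor.
m <- cbind(x = 1:6, y = c(2, 4, 6, 8, 10, 12))
g <- factor(rep(c("a", "b", "c"), each = 2))

# Matrix input: check what kind of object FUN receives for each group.
res_m <- by(m, g, function(d) is.data.frame(d))
all(as.vector(res_m))

# Vector input: group sums of the y column.
res_v <- by(m[, "y"], g, sum)
as.vector(res_v)
```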

##### Value

An object of class `"by"`, giving the results for each subset. This is always a list if `simplify` is false, otherwise a list or array (see `tapply`).
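The effect of `simplify` on the return value can be seen in a short sketch. It assumes a `FUN` that returns a single number per subset, which is the case where `tapply` simplifies to an array; the names `res_arr` and `res_list` are invented for the example.

```r
mean_breaks <- function(d) mean(d$breaks)

# Default simplify = TRUE: scalar results are collected into an array.
res_arr <- by(warpbreaks, warpbreaks$tension, mean_breaks)

# simplify = FALSE: the result is always a list, one element per group.
res_list <- by(warpbreaks, warpbreaks$tension, mean_breaks, simplify = FALSE)

class(res_arr)     # both objects carry class "by"
is.list(res_arr)   # FALSE here, because the scalars were simplified
is.list(res_list)  # TRUE

# Elements are in factor-level order, so the first is the "L" group.
res_list[[1]]
```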

##### See Also

`tapply`, `simplify2array`. `ave` also applies a function block-wise.

##### Examples


```
require(stats)  # provides lm() and coef()

# Summarize breaks and wool within each tension level
by(warpbreaks[, 1:2], warpbreaks[, "tension"], summary)

# Summarize breaks for every wool x tension combination
by(warpbreaks[, 1], warpbreaks[, -1], summary)

# Fit a separate linear model within each tension level
by(warpbreaks, warpbreaks[, "tension"],
   function(x) lm(breaks ~ wool, data = x))

## now suppose we want to extract the coefficients by group
tmp <- with(warpbreaks,
            by(warpbreaks, tension,
               function(x) lm(breaks ~ wool, data = x)))
sapply(tmp, coef)
```

*Documentation reproduced from package base, version 3.6.0, License: Part of R 3.6.0*

### Community examples

**imran.cs.uob@gmail.com** at Dec 24, 2018, base v3.5.2

This example uses the `mtcars` data frame. The following R snippet first displays the structure of `mtcars` and then uses `by` to group the `mpg` (miles per gallon) values by `cyl` (cylinders) and compute the sum of `mpg` for each group.

```
str(mtcars)
by(data=mtcars$mpg, INDICES=mtcars$cyl, FUN=sum, na.rm=TRUE)
```

The output will be as follows:

```
mtcars$cyl: 4
[1] 293.3
------------------------------------------------------------
mtcars$cyl: 6
[1] 138.2
------------------------------------------------------------
mtcars$cyl: 8
[1] 211.4
```

In addition to built-in functions like `sum`, `by` also works with user-defined functions. In the following snippet, the user-defined function `range_diff` first calculates the `range` of its argument and then takes the `diff` of that range, i.e. the maximum minus the minimum.

```
range_diff <- function(x){diff(range(x))}
by(data=mtcars$mpg, INDICES=mtcars$cyl, FUN=range_diff)
```

Calling `range_diff` in `by` returns the following output.

```
mtcars$cyl: 4
[1] 12.5
------------------------------------------------------------
mtcars$cyl: 6
[1] 3.6
------------------------------------------------------------
mtcars$cyl: 8
[1] 8.8
```