by

Apply a Function to a Data Frame Split by Factors

Function by is an object-oriented wrapper for tapply applied to data frames.

Keywords
iteration, category
Usage
by(data, INDICES, FUN, …, simplify = TRUE)
Arguments
data

an R object, normally a data frame, possibly a matrix.

INDICES

a factor or a list of factors, each of length nrow(data).

FUN

a function to be applied to (usually data-frame) subsets of data.

…

further arguments to FUN.

simplify

logical: see tapply.
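
As a minimal sketch of how INDICES and … fit together, the call below splits the built-in warpbreaks data by a list of two factors and forwards one extra argument to FUN. The helper name group_quantile and its probs parameter are made up for this illustration and are not part of base R.

```r
require(stats)

# Hypothetical helper (not part of base R): one quantile of 'breaks' per subset.
group_quantile <- function(d, probs) unname(quantile(d$breaks, probs))

res <- by(warpbreaks,
          list(wool = warpbreaks$wool, tension = warpbreaks$tension),
          group_quantile,
          probs = 0.5)   # forwarded through '...' to group_quantile

dim(res)   # one cell per wool x tension combination
```

Naming the elements of the INDICES list gives the result readable dimension names.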

Details

A data frame is split by row into data frames subsetted by the values of one or more factors, and function FUN is applied to each subset in turn.
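
The split-then-apply behaviour described above can be approximated by hand with split and sapply. The following is only a sketch of the idea, not the actual implementation of by; the variable names are illustrative.

```r
# Mean number of breaks per tension level, computed with by().
res_by <- by(warpbreaks, warpbreaks$tension, function(d) mean(d$breaks))

# Hand-rolled approximation: split the rows by the factor,
# then apply the same function to each piece.
pieces     <- split(warpbreaks, warpbreaks$tension)
res_manual <- sapply(pieces, function(d) mean(d$breaks))

# Both approaches give the same per-group values.
all.equal(as.numeric(res_by), unname(res_manual))
```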

For the default method, an object with dimensions (e.g., a matrix) is coerced to a data frame and the data frame method applied. Other objects are also coerced to a data frame, but FUN is applied separately to (subsets of) each column of the data frame.
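
A small sketch of the matrix case, using a made-up matrix m and grouping factor g: the matrix is handed to by, and FUN receives data-frame subsets of its rows.

```r
# Illustrative data: a 4 x 2 numeric matrix and a two-level grouping factor.
m <- cbind(x = c(1, 2, 3, 4), y = c(10, 20, 30, 40))
g <- factor(c("a", "a", "b", "b"))

# colMeans is applied to the rows belonging to each level of g.
res <- by(m, g, colMeans)

res[[1]]   # column means for the first group ("a")
```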

Value

An object of class "by", giving the results for each subset. This is always a list if simplify is false, otherwise a list or array (see tapply).
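
To illustrate the effect of simplify on the return value, the sketch below runs the same scalar-valued function twice. The variable names are illustrative only.

```r
group_mean <- function(d) mean(d$breaks)

# Default (simplify = TRUE): scalar results are collected into an array.
r_simplified <- by(warpbreaks, warpbreaks$tension, group_mean)

# simplify = FALSE: the results are kept as a list.
r_list <- by(warpbreaks, warpbreaks$tension, group_mean, simplify = FALSE)

is.list(r_simplified)
is.list(r_list)
class(r_list)
```

Both results carry class "by", so they print in the same grouped layout.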

See Also

tapply, simplify2array. ave also applies a function block-wise.
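
As a rough comparison of the related functions, the sketch below computes group sums of mpg in mtcars three ways. tapply and by return one value per group, whereas ave returns one value per row.

```r
require(stats)

sums_tapply <- tapply(mtcars$mpg, mtcars$cyl, sum)     # one value per group
sums_by     <- by(mtcars$mpg, mtcars$cyl, sum)         # same values, class "by"
sums_ave    <- ave(mtcars$mpg, mtcars$cyl, FUN = sum)  # group sum repeated for each row

length(sums_tapply)
length(sums_ave)
```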

Aliases
  • by
  • by.default
  • by.data.frame
  • print.by
Examples
library(base)
# NOT RUN {
require(stats)
by(warpbreaks[, 1:2], warpbreaks[,"tension"], summary)
by(warpbreaks[, 1],   warpbreaks[, -1],       summary)
by(warpbreaks, warpbreaks[,"tension"],
   function(x) lm(breaks ~ wool, data = x))

## now suppose we want to extract the coefficients by group
tmp <- with(warpbreaks,
            by(warpbreaks, tension,
               function(x) lm(breaks ~ wool, data = x)))
sapply(tmp, coef)
# }
Documentation reproduced from package base, version 3.4.3, License: Part of R 3.4.3

Community examples

imran.cs.uob@gmail.com at Dec 24, 2018 base v3.5.2

1. This example uses the `mtcars` data frame.
2. The following R snippet first displays the structure of `mtcars` and then uses `by` to group its rows by `cyl` (cylinders) and compute the sum of `mpg` (miles per gallon) for each group.

```
str(mtcars)
by(data=mtcars$mpg, INDICES=mtcars$cyl, FUN=sum, na.rm=TRUE)
```

The output of the `by` call is as follows:

```
mtcars$cyl: 4
[1] 293.3
------------------------------------------------------------
mtcars$cyl: 6
[1] 138.2
------------------------------------------------------------
mtcars$cyl: 8
[1] 211.4
```

In addition to built-in functions like `sum`, `by` also works with user-defined functions. In the following snippet, the user-defined function `range_diff` first calculates the `range` of its argument and then the `diff` of that range.

```
range_diff <- function(x){diff(range(x))}
by(data=mtcars$mpg, INDICES=mtcars$cyl, FUN=range_diff)
```

Calling `range_diff` in `by` returns the following output.

```
mtcars$cyl: 4
[1] 12.5
------------------------------------------------------------
mtcars$cyl: 6
[1] 3.6
------------------------------------------------------------
mtcars$cyl: 8
[1] 8.8
```