# tapply

##### Apply a Function Over a Ragged Array

Apply a function to each cell of a ragged array, that is to each (non-empty) group of values given by a unique combination of the levels of certain factors.

##### Usage

`tapply(X, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE)`

##### Arguments

- X
  an R object for which a `split` method exists. Typically vector-like, allowing subsetting with `[`.
- INDEX
  a `list` of one or more `factor`s, each of same length as `X`. The elements are coerced to factors by `as.factor`.
- FUN
  the function to be applied, or `NULL`. In the case of functions like `+`, `%*%`, etc., the function name must be backquoted or quoted. If `FUN` is `NULL`, `tapply` returns a vector which can be used to subscript the multi-way array `tapply` normally produces.
- ...
  optional arguments to `FUN`: see the Note section.
- default
  (only in the case of simplification to an array) the value with which the array is initialized as `array(default, dim = ..)`. Before R 3.4.0, this was hard coded to `array()`'s default `NA`. If it is `NA` (the default), the missing value of the answer type, e.g. `NA_real_`, is chosen (`as.raw(0)` for `"raw"`). In a numerical case, it may be set, e.g., to `FUN(integer(0))`, e.g., in the case of `FUN = sum` to `0` or `0L`.
- simplify
  logical; if `FALSE`, `tapply` always returns an array of mode `"list"`; in other words, a `list` with a `dim` attribute. If `TRUE` (the default), then if `FUN` always returns a scalar, `tapply` returns an array with the mode of the scalar.

##### Value

If `FUN` is not `NULL`, it is passed to `match.fun`, and hence it can be a function or a symbol or character string naming a function.
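As a small illustration of the `match.fun` lookup, the sketch below passes the same function once as a function object and once as a character string. The vectors `x` and `g` are made up for this example and are not part of the original documentation.

```r
# Hypothetical data: four values in two groups
x <- c(1, 2, 3, 4)
g <- c("a", "a", "b", "b")

by_function <- tapply(x, g, mean)    # FUN given as a function
by_string   <- tapply(x, g, "mean")  # FUN given as a string naming the function

# Both forms go through match.fun(), so the results should agree
identical(by_function, by_string)
```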

When `FUN` is present, `tapply` calls `FUN` for each cell that has any data in it. If `FUN` returns a single atomic value for each such cell (e.g., functions `mean` or `var`) and when `simplify` is `TRUE`, `tapply` returns a multi-way array containing the values, and `NA` for the empty cells. The array has the same number of dimensions as `INDEX` has components; the number of levels in a dimension is the number of levels (`nlevels()`) in the corresponding component of `INDEX`. Note that if the return value has a class (e.g., an object of class `"Date"`) the class is discarded.
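The class-dropping behaviour described above can be seen with dates. This is a minimal sketch for the R version documented here; the data (`d`, `yr`) are invented, and converting back with `as.Date()` is one possible workaround rather than a prescribed one.

```r
# Hypothetical data: four dates grouped by year
d  <- as.Date(c("2020-01-05", "2020-03-01", "2021-07-15", "2021-02-10"))
yr <- c("2020", "2020", "2021", "2021")

# max() returns a Date per cell, but simplification discards the class,
# leaving the underlying numbers (days since 1970-01-01)
latest <- tapply(d, yr, max)
inherits(latest, "Date")

# One way to get dates back: convert the plain numbers explicitly
restored <- as.Date(as.vector(latest), origin = "1970-01-01")
names(restored) <- names(latest)
restored
```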

Note that contrary to S, `simplify = TRUE` always returns an array, possibly 1-dimensional.

If `FUN` does not return a single atomic value, `tapply` returns an array of mode `list` whose components are the values of the individual calls to `FUN`, i.e., the result is a list with a `dim` attribute.
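To make the list-mode result concrete, here is a short sketch with a function that returns two values per cell. The vectors `x` and `g` are illustrative only.

```r
# Hypothetical data: five values in two groups
x <- c(5, 1, 9, 3, 7)
g <- c("a", "a", "b", "b", "b")

# range() returns two values per cell, so the result cannot be simplified
res <- tapply(x, g, range)

is.list(res)  # a list ...
dim(res)      # ... that also carries a dim attribute
res[[1]]      # the value of FUN for the first cell (group "a")
```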

When there is an array answer, its `dimnames` are named by the names of `INDEX` and are based on the levels of the grouping factors (possibly after coercion).
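The naming of `dimnames` can be checked with a named `INDEX` list. In this sketch the names `wool` and `tension` and the values in `breaks` are invented for illustration, loosely echoing the `warpbreaks` data.

```r
# Hypothetical data: one value for each wool/tension combination
breaks <- c(10, 20, 30, 40)
idx <- list(wool    = c("A", "A", "B", "B"),
            tension = c("L", "H", "L", "H"))

tab <- tapply(breaks, idx, sum)

names(dimnames(tab))  # taken from the names of the INDEX list
dimnames(tab)         # levels after coercion by as.factor(), in sorted order
tab["A", "L"]         # cells can be addressed by level
```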

For a list result, the elements corresponding to empty cells are `NULL`.

##### Note

Optional arguments to `FUN` supplied by the `...` argument are not divided into cells. It is therefore inappropriate for `FUN` to expect additional arguments with the same length as `X`.
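A common way around this limitation, sketched here under assumed data, is to apply `tapply` to the positions of `X` and do the subsetting inside `FUN`. The vectors `x`, `w` and `g` are hypothetical, and this is one possible idiom rather than an officially recommended one.

```r
# Hypothetical data: values, per-observation weights, and groups
x <- c(1, 3, 10, 20)
w <- c(1, 3, 1, 1)
g <- c("a", "a", "b", "b")

# Passing w through ... would give every cell the full-length w,
# which does not line up with that cell's values.
# Instead, split the positions and subset x and w inside FUN:
wm <- tapply(seq_along(x), g, function(i) weighted.mean(x[i], w[i]))
wm
```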

##### References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988)
*The New S Language*.
Wadsworth & Brooks/Cole.

##### See Also

The convenience functions `by` and `aggregate` (using `tapply`); `apply`, `lapply` with its versions `sapply` and `mapply`.

##### Examples


```
require(stats)
groups <- as.factor(rbinom(32, n = 5, prob = 0.4))
tapply(groups, groups, length) #- is almost the same as
table(groups)

## contingency table from data.frame : array with named dimnames
tapply(warpbreaks$breaks, warpbreaks[,-1], sum)
tapply(warpbreaks$breaks, warpbreaks[, 3, drop = FALSE], sum)

## factor with unused levels (4 and 5): empty cells
n <- 17; fac <- factor(rep_len(1:3, n), levels = 1:5)
table(fac)
tapply(1:n, fac, sum)
tapply(1:n, fac, sum, default = 0) # maybe more desirable
tapply(1:n, fac, sum, simplify = FALSE)
tapply(1:n, fac, range)
tapply(1:n, fac, quantile)
tapply(1:n, fac, length) ## NA's
tapply(1:n, fac, length, default = 0) # == table(fac)

## example of ... argument: find quarterly means
tapply(presidents, cycle(presidents), mean, na.rm = TRUE)

ind <- list(c(1, 2, 2), c("A", "A", "B"))
table(ind)
tapply(1:3, ind) #-> the split vector
tapply(1:3, ind, sum)

## Some assertions (not held by all patch proposals):
nq <- names(quantile(1:5))
stopifnot(
  identical(tapply(1:3, ind), c(1L, 2L, 4L)),
  identical(tapply(1:3, ind, sum),
            matrix(c(1L, 2L, NA, 3L), 2, dimnames = list(c("1", "2"), c("A", "B")))),
  identical(tapply(1:n, fac, quantile)[-1],
            array(list(`2` = structure(c(2, 5.75, 9.5, 13.25, 17), .Names = nq),
                       `3` = structure(c(3, 6, 9, 12, 15), .Names = nq),
                       `4` = NULL, `5` = NULL), dim=4, dimnames=list(as.character(2:5))))
)
```

*Documentation reproduced from package base, version 3.5.1, License: Part of R 3.5.1*

### Community examples

**mjchen.gene@gmail.com** at May 13, 2019, base v3.6.0

This example is originally given in [An Introduction to R](https://cran.r-project.org/doc/manuals/r-release/R-intro.html).

```{r}
statef <- c("tas", "sa", "qld", "nsw", "nsw", "nt", "wa", "wa",
            "qld", "vic", "nsw", "vic", "qld", "qld", "sa", "tas",
            "sa", "nt", "wa", "vic", "qld", "nsw", "nsw", "wa",
            "sa", "act", "nsw", "vic", "vic", "act")
incomes <- c(60, 49, 40, 61, 64, 60, 59, 54, 62, 69, 70, 42, 56,
             61, 61, 61, 58, 51, 48, 65, 49, 49, 41, 48, 52, 46,
             59, 46, 58, 43)
(incmeans <- tapply(incomes, statef, mean))
```