tapply
Apply a Function Over a Ragged Array
Apply a function to each cell of a ragged array, that is to each (nonempty) group of values given by a unique combination of the levels of certain factors.
Usage
tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)
Arguments
 X
 an atomic object, typically a vector.
 INDEX
 list of one or more factors, each of same length as X. The elements are coerced to factors by as.factor.
 FUN
 the function to be applied, or NULL. In the case of functions like +, %*%, etc., the function name must be backquoted or quoted. If FUN is NULL, tapply returns a vector which can be used to subscript the multi-way array tapply normally produces.
 ...
 optional arguments to FUN: see the Note section.
 simplify
 If FALSE, tapply always returns an array of mode "list". If TRUE (the default), then if FUN always returns a scalar, tapply returns an array with the mode of the scalar.
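The effect of FUN = NULL and of simplify can be seen in a small sketch. The data and the names x, g, cell, sums and sums_list are made up for illustration and are not part of the original page.

```r
# Illustrative data; the names x and g are hypothetical.
x <- c(10, 20, 30, 40, 50)
g <- factor(c("a", "b", "a", "c", "b"))

# FUN = NULL: one integer per element of x, giving the index of the
# cell (here, the factor level) that the element falls into.
cell <- tapply(x, g)
cell

# simplify = TRUE (the default): scalar results are collected in an array.
sums <- tapply(x, g, sum)
sums[cell]   # each element's group total, aligned with x

# simplify = FALSE: the result is always an array of mode "list".
sums_list <- tapply(x, g, sum, simplify = FALSE)
mode(sums_list)
```

Subscripting sums with cell is one way to map per-group results back onto the original observations.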
Value

If FUN is not NULL, it is passed to match.fun, and hence it can be a function or a symbol or character string naming a function.

When FUN is present, tapply calls FUN for each cell that has any data in it. If FUN returns a single atomic value for each such cell (e.g., functions mean or var) and when simplify is TRUE, tapply returns a multi-way array containing the values, and NA for the empty cells. The array has the same number of dimensions as INDEX has components; the number of levels in a dimension is the number of levels (nlevels()) in the corresponding component of INDEX. Note that if the return value has a class (e.g., an object of class "Date") the class is discarded.

Note that contrary to S, simplify = TRUE always returns an array, possibly 1-dimensional.

If FUN does not return a single atomic value, tapply returns an array of mode list whose components are the values of the individual calls to FUN, i.e., the result is a list with a dim attribute.

When there is an array answer, its dimnames are named by the names of INDEX and are based on the levels of the grouping factors (possibly after coercion).

For a list result, the elements corresponding to empty cells are NULL.
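
The return shapes described above can be checked with a short sketch. The data and the names vals, grp, sex, m, r and tab are hypothetical, and the NA for the empty cell assumes the behaviour documented on this page.

```r
# Hypothetical data: level "c" is declared but never observed.
vals <- c(1, 2, 3, 4)
grp  <- factor(c("a", "a", "b", "b"), levels = c("a", "b", "c"))

# Scalar results: a 1-dimensional array, with NA for the empty cell "c".
m <- tapply(vals, grp, mean)
m
dim(m)

# Non-scalar results: a list with a dim attribute; the empty cell is NULL.
r <- tapply(vals, grp, range)
is.list(r)
is.null(r[[3]])

# Two grouping factors give a two-dimensional array whose dimnames
# take their names from the names of INDEX.
sex <- factor(c("f", "m", "f", "m"))
tab <- tapply(vals, list(group = grp, sex = sex), sum)
dim(tab)
names(dimnames(tab))
```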
Note
Optional arguments to FUN supplied by the ... argument are not divided into cells. It is therefore inappropriate for FUN to expect additional arguments with the same length as X.
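
A sketch of what this means in practice, using made-up data (x, w and g are hypothetical names). Grouping the positions instead of the values is one possible workaround, not something this page prescribes.

```r
# Hypothetical data: values, per-element weights, and a grouping factor.
x <- c(1, 2, 3, 4)
w <- c(1, 3, 1, 1)
g <- factor(c("a", "a", "b", "b"))

# This would not do what is wanted: every call to weighted.mean() would
# receive the full-length w, not just the weights belonging to the cell.
# tapply(x, g, weighted.mean, w = w)

# One workaround: group the positions, then index x and w inside FUN.
wm <- tapply(seq_along(x), g, function(i) weighted.mean(x[i], w[i]))
wm
```

Here FUN receives the indices of each cell, so both x and w are subset consistently.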
References
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
See Also
the convenience functions by and aggregate (using tapply); apply, lapply with its versions sapply and mapply.
Examples
library(base)
require(stats)
groups <- as.factor(rbinom(32, n = 5, prob = 0.4))
tapply(groups, groups, length) # is almost the same as
table(groups)

## contingency table from data.frame : array with named dimnames
tapply(warpbreaks$breaks, warpbreaks[,-1], sum)
tapply(warpbreaks$breaks, warpbreaks[, 3, drop = FALSE], sum)

## a factor with unused levels (4 and 5), giving empty cells
n <- 17; fac <- factor(rep(1:3, length = n), levels = 1:5)
table(fac)
tapply(1:n, fac, sum)
tapply(1:n, fac, sum, simplify = FALSE)
tapply(1:n, fac, range)
tapply(1:n, fac, quantile)

## example of ... argument: find quarterly means
tapply(presidents, cycle(presidents), mean, na.rm = TRUE)

ind <- list(c(1, 2, 2), c("A", "A", "B"))
table(ind)
tapply(1:3, ind) #-> the split vector
tapply(1:3, ind, sum)
Community examples
This example was originally given in [An Introduction to R](https://cran.r-project.org/doc/manuals/r-release/R-intro.html).

```{r}
statef <- c("tas", "sa", "qld", "nsw", "nsw", "nt", "wa", "wa",
            "qld", "vic", "nsw", "vic", "qld", "qld", "sa", "tas",
            "sa", "nt", "wa", "vic", "qld", "nsw", "nsw", "wa",
            "sa", "act", "nsw", "vic", "vic", "act")
incomes <- c(60, 49, 40, 61, 64, 60, 59, 54, 62, 69, 70, 42, 56,
             61, 61, 61, 58, 51, 48, 65, 49, 49, 41, 48, 52, 46,
             59, 46, 58, 43)
(incmeans <- tapply(incomes, statef, mean))
```