ltable makes use of data.table capabilities to tabulate frequencies or arbitrary functions of given variables into a long format data.table/data.frame. expr.by.cj is the equivalent for more advanced users.
ltable(data, by.vars = NULL, expr = list(obs = .N), subset = NULL, use.levels = TRUE, na.rm = FALSE, robust = TRUE)
expr.by.cj(data, by.vars = NULL, expr = list(obs = .N), subset = NULL, use.levels = FALSE, na.rm = FALSE, robust = FALSE, .SDcols = NULL, enclos = parent.frame(1L), ...)
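data: a data.table/data.frame
by.vars: names of the variables used for categorization, as a character vector, e.g. c('sex','agegroup')
expr: an object or a list of objects, each a function of a variable, evaluated for each combination of by.vars; e.g. the default list(obs = .N)
subset: a logical condition; data is limited to the subset before computing expr - but the result of expr is also returned as NA for levels not existing in the subset. See Examples.
use.levels: logical; if TRUE, uses factor levels of given variables if present; use this if you want e.g. counts for levels that actually have zero observations but are levels in a factor variable
na.rm: logical; if TRUE, drops rows in table that have NA as values in any of by.vars columns
robust: logical; if TRUE, runs the outputted data's by.vars columns through robust_values before outputting
.SDcols: a character vector of column names passed to inside the data.table's brackets DT[, , ...]; see data.table; if NULL, uses all appropriate columns. See Examples for usage.
enclos: the enclosing environment of the data
...: other arguments passed to inside the data.table's brackets DT[, , ...]; see data.table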
expr.by.cj: Somewhat more streamlined ltable with defaults for speed. Explicit determination of enclosing environment of data.
Returns the result of expr for each unique combination of the given by.vars.
By default makes use of any and all levels present for each variable in by.vars. This is useful, because even if a subset of the data does not contain observations for e.g. a specific age group, those age groups are nevertheless presented in the resulting table; e.g. with the default expr = list(obs = .N) all age group levels are represented by a row and can have obs = 0.
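The behaviour described above can be imitated in base R. The following is only a minimal sketch on made-up data (the toy data frame and its column values are assumptions for illustration), not the implementation used by ltable: it keeps a row for an age group that has no observations, with a count of zero and NA for another expression.

```r
# Toy data: the factor defines three levels, but "75+" has no observations.
toy <- data.frame(
  dg_age   = c(30, 50, 70),
  agegroup = factor(c("0-44", "45-74", "45-74"),
                    levels = c("0-44", "45-74", "75+"))
)

# split() keeps empty factor levels, so every level gets a group.
groups <- split(toy$dg_age, toy$agegroup)

# One row per level: a count, and a minimum that is NA for empty groups.
result <- data.frame(
  agegroup = names(groups),
  obs      = unname(vapply(groups, length, integer(1))),
  minage   = unname(vapply(groups,
                           function(v) if (length(v) > 0) min(v) else NA_real_,
                           numeric(1)))
)
print(result)
```

The "75+" row is present with obs = 0 and minage = NA, which mirrors what the text describes for levels that do not occur in the data.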
The function differs from the vanilla table by giving a long format table of values regardless of the number of by.vars given. Make use of e.g. cast_simple if data needs to be presented in a wide format (e.g. a two-way table).
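To make the long versus wide distinction concrete, here is a small base R sketch. It uses xtabs as a stand-in for the reshaping step; it does not show cast_simple itself, and the long table below is invented for illustration.

```r
# A long-format table: one row per combination of sex and area.
long <- data.frame(
  sex  = c("f", "f", "m", "m"),
  area = c("north", "south", "north", "south"),
  obs  = c(10, 0, 7, 3)
)

# Reshape to a two-way (wide) table: rows are sex, columns are area.
wide <- xtabs(obs ~ sex + area, data = long)
print(wide)
```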
The rows of the long-format table are effectively Cartesian products of the levels of each variable in by.vars, e.g. with by.vars = c("sex", "area") all levels of area are repeated for both levels of sex in the table.
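The Cartesian product of levels can be sketched with base R's expand.grid. The level values below are hypothetical, and this only illustrates the shape of the resulting rows, not how the functions build them internally.

```r
# Hypothetical levels for two by.vars.
toy_levels <- list(
  area = c("north", "south", "west"),
  sex  = c("f", "m")
)

# expand.grid() returns every combination; the first variable varies fastest,
# so all levels of area are repeated for each level of sex.
grid <- expand.grid(toy_levels, stringsAsFactors = TRUE)
print(grid)
```

With three areas and two sexes the grid has 3 * 2 = 6 rows.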
The expr allows the user to apply any function(s) on all levels defined by by.vars. For example, the default list(obs = .N) uses .N, a special symbol of data.table, to calculate counts in each group.
If use.levels = FALSE, no levels information will be used. This means that if e.g. the agegroup variable is a factor and has 18 levels defined, but only 15 levels are present in the data, no rows for the missing levels will be shown in the table.
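The difference between using and ignoring levels information can be imitated in base R with droplevels. This is a sketch on a made-up factor, assuming that dropping unused levels is a fair analogue of use.levels = FALSE; it is not the package's own code.

```r
# Three levels are defined, but only two occur in the data.
agegroup <- factor(c("0-44", "45-74", "45-74"),
                   levels = c("0-44", "45-74", "75+"))

# Using the levels: the unused "75+" level still gets a row.
with_levels <- as.data.frame(table(agegroup = agegroup),
                             responseName = "obs")

# Ignoring the levels: only the values actually present get a row.
without_levels <- as.data.frame(table(agegroup = droplevels(agegroup)),
                                responseName = "obs")

print(with_levels)
print(without_levels)
```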
na.rm simply drops any rows from the resulting table where any of the by.vars values was NA.
See also: table, cast_simple, melt
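## copy() comes from data.table; the sire data set comes from popEpi
library(data.table)
library(popEpi)
sr <- copy(sire)
sr$agegroup <- cut(sr$dg_age, breaks = c(0, 45, 60, 75, 85, Inf))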
## counts by default
ltable(sr, "agegroup")
## any expression can be given
ltable(sr, "agegroup", list(mage = mean(dg_age)))
ltable(sr, "agegroup", list(mage = mean(dg_age), vage = var(dg_age)))
## also returns levels where there are zero rows (expressions as NA)
ltable(sr, "agegroup", list(obs = .N,
                            minage = min(dg_age),
                            maxage = max(dg_age)),
       subset = dg_age < 85)
#### expr.by.cj
expr.by.cj(sr, "agegroup")
## any arbitrary expression can be given
expr.by.cj(sr, "agegroup", list(mage = mean(dg_age)))
expr.by.cj(sr, "agegroup", list(mage = mean(dg_age), vage = var(dg_age)))
## only uses levels of by.vars present in data
expr.by.cj(sr, "agegroup", list(mage = mean(dg_age), vage = var(dg_age)),
           subset = dg_age < 70)
## .SDcols trick
expr.by.cj(sr, "agegroup", lapply(.SD, mean),
           subset = dg_age < 70, .SDcols = c("dg_age", "status"))