
popEpi (version 0.3.1)

ltable: Tabulate Counts and Other Functions by Multiple Variables into a Long-Format Table

Description

ltable makes use of data.table capabilities to tabulate frequencies or arbitrary functions of given variables into a long format data.table/data.frame. expr.by.cj is the equivalent for more advanced users.

Usage

ltable(data, by.vars = NULL, expr = list(obs = .N), subset = NULL, use.levels = TRUE, na.rm = FALSE, robust = TRUE)
expr.by.cj(data, by.vars = NULL, expr = list(obs = .N), subset = NULL, use.levels = FALSE, na.rm = FALSE, robust = FALSE, .SDcols = NULL, enclos = parent.frame(1L), ...)

Arguments

data
a data.table/data.frame
by.vars
names of variables that are used for categorization, as a character vector, e.g. c('sex','agegroup')
expr
an expression, or a list of expressions, where each expression is a function of one or more variables in data (see Details)
subset
a logical condition; data is limited accordingly before expr is evaluated. Levels that do not exist in the subset are still returned, with the result of expr as NA. See Examples.
use.levels
logical; if TRUE, uses factor levels of given variables if present; use this if you want e.g. counts for levels that have zero observations but are defined as levels of a factor variable
na.rm
logical; if TRUE, drops rows of the table that have NA in any of the by.vars columns
robust
logical; if TRUE, runs the by.vars columns of the output through robust_values before returning it
.SDcols
advanced; a character vector of column names passed inside the data.table's brackets DT[, , ...]; see data.table; if NULL, uses all appropriate columns. See Examples for usage.
enclos
advanced; an environment; the enclosing environment of the data.
...
advanced; other arguments passed inside the data.table's brackets DT[, , ...]; see data.table

Functions

  • expr.by.cj: Somewhat more streamlined ltable with defaults for speed. Explicit determination of enclosing environment of data.

Details

Returns expr for each unique combination of given by.vars.

By default makes use of any and all levels present for each variable in by.vars. This is useful, because even if a subset of the data does not contain observations for e.g. a specific age group, those age groups are nevertheless presented in the resulting table; e.g. with the default expr = list(obs = .N) all age group levels are represented by a row and can have obs = 0.

The function differs from the vanilla table by giving a long format table of values regardless of the number of by.vars given. Make use of e.g. cast_simple if data needs to be presented in a wide format (e.g. a two-way table).

The rows of the long-format table are effectively Cartesian products of the levels of each variable in by.vars, e.g. with by.vars = c("sex", "area") all levels of area are repeated for both levels of sex in the table.
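
The long-format, one-row-per-combination layout can be illustrated with base R alone. This is only a sketch of the shape of the result, not of how ltable is implemented; the toy data frame df and its values are assumptions made up for the illustration.

```r
# Hypothetical toy data: there is no observation for sex = "female", area = "north"
df <- data.frame(
  sex  = factor(c("male", "male", "female"), levels = c("male", "female")),
  area = factor(c("north", "south", "south"), levels = c("north", "south"))
)

# table() gives a wide two-way table of counts
wide <- table(df)

# as.data.frame() reshapes it to long format: one row per combination
# of levels, including combinations that have a zero count
long <- as.data.frame(wide, responseName = "obs")
long
```

Here long has one row for each of the 2 x 2 combinations of sex and area, and the unobserved combination appears with obs = 0 rather than being dropped.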

The expr allows the user to apply any function(s) on all levels defined by by.vars. Here are some examples:

  • .N or list(.N) is a function used inside a data.table to calculate counts in each group
  • list(obs = .N), same as above, but with a user-assigned variable name
  • list(sum(obs), sum(pyrs), mean(dg_age)), multiple objects in a list
  • list(obs = sum(obs), pyrs = sum(pyrs)), same as above, but with user-defined variable names
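
The expressions listed above are ordinary data.table j-expressions, so their behaviour can be tried directly on a data.table. The sketch below assumes the data.table package is installed; the toy table dt and its values are hypothetical and only borrow the column names used in the bullets.

```r
library(data.table)

# Hypothetical toy data; column names mirror those in the bullets above
dt <- data.table(
  sex  = c("male", "male", "female", "female"),
  obs  = c(3L, 5L, 2L, 4L),
  pyrs = c(10.5, 20.0, 8.0, 12.5)
)

# .N counts the rows in each group; the list name sets the output column name
counts <- dt[, list(n = .N), by = sex]

# Several named aggregates can be returned in one list
sums <- dt[, list(obs = sum(obs), pyrs = sum(pyrs)), by = sex]
sums
```

Unlike ltable, plain grouping with by only returns groups that occur in the data; it does not add rows for unused factor levels.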

If use.levels = FALSE, no levels information will be used. This means that if e.g. the agegroup variable is a factor and has 18 levels defined, but only 15 levels are present in the data, no rows for the missing levels will be shown in the table.
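
As a rough base R analogy (not the package's own code), the difference between using and ignoring factor levels resembles tabulating a factor before and after dropping its unused levels. The agegroup vector below is a made-up example.

```r
# Hypothetical factor with three defined levels, only two of which occur
agegroup <- factor(c("0-44", "45-59", "45-59"),
                   levels = c("0-44", "45-59", "60+"))

# All defined levels are kept: the unused level gets a row with a zero count
with_levels <- as.data.frame(table(agegroup), responseName = "obs")

# Unused levels are dropped first: only levels present in the data get a row
without_levels <- as.data.frame(table(agegroup = droplevels(agegroup)),
                                responseName = "obs")
```

with_levels has a row for every defined level, while without_levels has rows only for the levels that actually appear.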

na.rm simply drops any rows from the resulting table where any of the by.vars values was NA.

See Also

table, cast_simple, melt

Examples

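library(popEpi)     # provides ltable, expr.by.cj and the sire example data
library(data.table) # provides copy()
## work on a copy so the sire data set itself is not modified
sr <- copy(sire)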
sr$agegroup <- cut(sr$dg_age, breaks=c(0,45,60,75,85,Inf))
## counts by default
ltable(sr, "agegroup")

## any expression can be given
ltable(sr, "agegroup", list(mage = mean(dg_age)))
ltable(sr, "agegroup", list(mage = mean(dg_age), vage = var(dg_age)))

## also returns levels where there are zero rows (expressions as NA)
ltable(sr, "agegroup", list(obs = .N, 
                            minage = min(dg_age), 
                            maxage = max(dg_age)), 
       subset = dg_age < 85)
       
#### expr.by.cj
expr.by.cj(sr, "agegroup")

## any arbitrary expression can be given
expr.by.cj(sr, "agegroup", list(mage = mean(dg_age)))
expr.by.cj(sr, "agegroup", list(mage = mean(dg_age), vage = var(dg_age)))

## only uses levels of by.vars present in data
expr.by.cj(sr, "agegroup", list(mage = mean(dg_age), vage = var(dg_age)), 
           subset = dg_age < 70)
           
## .SDcols trick
expr.by.cj(sr, "agegroup", lapply(.SD, mean), 
           subset = dg_age < 70, .SDcols = c("dg_age", "status"))
