ltable: Tabulate Counts and Other Functions by Multiple Variables into a Long-Format Table

Description

ltable uses data.table to tabulate frequencies or arbitrary functions of the given variables into a long-format data.table/data.frame. expr.by.cj is the equivalent for more advanced users.

Usage

ltable(data, by.vars = NULL, expr = list(obs = .N), subset = NULL, use.levels = TRUE, na.rm = FALSE, robust = TRUE)
expr.by.cj(data, by.vars = NULL, expr = list(obs = .N), subset = NULL, use.levels = FALSE, na.rm = FALSE, robust = FALSE, .SDcols = NULL, enclos = parent.frame(1L), ...)

Arguments

data

a data.table/data.frame

by.vars

names of variables that are used for categorization, as a character vector, e.g. c('sex','agegroup')

expr

an object or a list of objects, where each object is a function of one or more variables in data (see Details)

subset

a logical condition; data is limited accordingly before expr is evaluated. Levels that do not exist in the subset are still returned, with the result of expr as NA. See Examples.

use.levels

logical; if TRUE, uses the factor levels of the given variables if present. Use this if you want e.g. counts for levels that are defined in a factor variable but have zero observations.

na.rm

logical; if TRUE, drops rows in table that have NA as values in any of by.vars columns

robust

logical; if TRUE, passes the by.vars columns of the result through robust_values before returning it

.SDcols

advanced; a character vector of column names passed inside the data.table's brackets, DT[, , ...]; see data.table. If NULL, uses all appropriate columns. See Examples for usage.

enclos

advanced; an environment; the enclosing environment of the data.

...

advanced; other arguments passed inside the data.table's brackets, DT[, , ...]; see data.table

Functions

expr.by.cj: A somewhat more streamlined ltable whose defaults favour speed. The enclosing environment of data is determined explicitly via enclos.

Details

Returns expr for each unique combination of given by.vars.

By default makes use of any and all levels present for each variable in by.vars. This is useful, because even if a subset of the data does not contain observations for e.g. a specific age group, those age groups are nevertheless presented in the resulting table; e.g. with the default expr = list(obs = .N) all age group levels are represented by a row and can have obs = 0.

The function differs from the vanilla table by giving a long format table of values regardless of the number of by.vars given. Make use of e.g. cast_simple if data needs to be presented in a wide format (e.g. a two-way table).
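As a rough illustration of going from long to wide format, the sketch below uses data.table's dcast as a stand-in; the signature of cast_simple is not shown on this page, so it is not called here. The toy table and its column names are hypothetical.

```r
library(data.table)

# Hypothetical long-format table, shaped like ltable output with obs counts
long <- data.table(
  sex  = c("male", "male", "female", "female"),
  area = c("A", "B", "A", "B"),
  obs  = c(3L, 1L, 2L, 0L)
)

# One row per sex, one column per area: a two-way table of counts
wide <- dcast(long, sex ~ area, value.var = "obs")
print(wide)
```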

The rows of the long-format table are effectively the Cartesian product of the levels of each variable in by.vars, e.g. with by.vars = c("sex", "area") all levels of area are repeated for both levels of sex in the table.
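The idea can be sketched in plain data.table with CJ, which builds a cross join of its arguments. This is only an illustration of the concept on hypothetical toy data, not the package's actual implementation.

```r
library(data.table)

# Hypothetical toy data: three observations, several sex/area combinations unobserved
dt <- data.table(
  sex  = factor(c("male", "male", "female"), levels = c("male", "female")),
  area = factor(c("A", "C", "B"), levels = c("A", "B", "C"))
)

# Every combination of the factor levels, one row per combination
grid <- CJ(sex = levels(dt$sex), area = levels(dt$area), sorted = FALSE)

# Counts for the combinations actually observed (keys converted to character
# so that they match the grid columns)
counts <- dt[, .(obs = .N), by = .(sex = as.character(sex), area = as.character(area))]

# Join the counts onto the full grid; unobserved combinations get obs = 0
out <- counts[grid, on = c("sex", "area")]
out[is.na(obs), obs := 0L]
print(out)
```

The result has 2 x 3 = 6 rows, one for each sex and area pair, whether or not that pair occurs in the data.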

The expr allows the user to apply any function(s) on all levels defined by by.vars. Here are some examples:

.N or list(.N), a special symbol used inside a data.table to calculate counts in each group
list(obs = .N), same as above but with a user-assigned variable name
list(sum(obs), sum(pyrs), mean(dg_age)), multiple objects in a list
list(obs = sum(obs), pyrs = sum(pyrs)), same as above with user-defined variable names
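These expressions behave like the j argument of a grouped data.table call. The sketch below evaluates two of them directly in data.table on a hypothetical pre-aggregated toy table (the column names obs and pyrs mirror the examples above); it shows the expression semantics only, not the level handling that ltable adds on top.

```r
library(data.table)

# Hypothetical toy data with pre-aggregated counts and person-years
dt <- data.table(
  sex  = c("male", "male", "female", "female"),
  obs  = c(2L, 3L, 1L, 4L),
  pyrs = c(10.5, 20.0, 5.5, 30.0)
)

# list(obs = .N): number of rows in each group
n_rows <- dt[, list(obs = .N), by = "sex"]

# list(obs = sum(obs), pyrs = sum(pyrs)): several named summaries at once
totals <- dt[, list(obs = sum(obs), pyrs = sum(pyrs)), by = "sex"]

print(n_rows)
print(totals)
```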

If use.levels = FALSE, no levels information will be used. This means that if e.g. the agegroup variable is a factor and has 18 levels defined, but only 15 levels are present in the data, no rows for the missing levels will be shown in the table.
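Ordinary data.table grouping shows the same effect: by only sees the values present in the data, so a defined but unused factor level produces no row. A minimal sketch on hypothetical toy data:

```r
library(data.table)

# Hypothetical factor with three defined levels, only two of which occur
dt <- data.table(
  agegroup = factor(c("0-44", "0-44", "45-59"),
                    levels = c("0-44", "45-59", "60+"))
)

# Grouping ignores level information: one row per value present
present <- dt[, .(obs = .N), by = agegroup]
print(present)

# Levels that are defined but get no row in the grouped result
missing_levels <- setdiff(levels(dt$agegroup), as.character(present$agegroup))
print(missing_levels)
```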

na.rm simply drops any rows from the resulting table where any of the by.vars values was NA.
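The effect is comparable to filtering a finished table with data.table's na.omit restricted to the by.vars columns. The table below is hypothetical, and this is a sketch of the behaviour described above rather than the package's own code.

```r
library(data.table)

# Hypothetical result table in which some by.vars values are NA
tab <- data.table(
  sex  = c("male", "female", NA),
  area = c("A", NA, "B"),
  obs  = c(3L, 2L, 1L)
)
by.vars <- c("sex", "area")

# Keep only rows where none of the by.vars columns is NA
kept <- na.omit(tab, cols = by.vars)
print(kept)
```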

Examples


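library(popEpi)     # provides ltable, expr.by.cj and the sire data
library(data.table) # provides copy()
sr <- copy(sire)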
sr$agegroup <- cut(sr$dg_age, breaks=c(0,45,60,75,85,Inf))
## counts by default
ltable(sr, "agegroup")

## any expression can be given
ltable(sr, "agegroup", list(mage = mean(dg_age)))
ltable(sr, "agegroup", list(mage = mean(dg_age), vage = var(dg_age)))

## also returns levels where there are zero rows (expressions as NA)
ltable(sr, "agegroup", list(obs = .N, 
                            minage = min(dg_age), 
                            maxage = max(dg_age)), 
       subset = dg_age < 85)
       
#### expr.by.cj
expr.by.cj(sr, "agegroup")

## any arbitrary expression can be given
expr.by.cj(sr, "agegroup", list(mage = mean(dg_age)))
expr.by.cj(sr, "agegroup", list(mage = mean(dg_age), vage = var(dg_age)))

## only uses levels of by.vars present in data
expr.by.cj(sr, "agegroup", list(mage = mean(dg_age), vage = var(dg_age)), 
           subset = dg_age < 70)
           
## .SDcols trick
expr.by.cj(sr, "agegroup", lapply(.SD, mean), 
           subset = dg_age < 70, .SDcols = c("dg_age", "status"))
