tableby: Summary Statistics of a Set of Independent Variables by a Categorical Variable

Description

Summarize one or more variables (x) by a categorical variable (y). Variables on the right side of the formula, i.e. independent variables, are summarized by the levels of a categorical variable on the left of the formula. Optionally, an appropriate test is performed to test the distribution of the independent variables across the levels of the categorical variable.

Usage

tableby(formula, data, na.action, subset = NULL, weights = NULL,
  control = NULL, ...)
# S3 method for tableby
print(x, ...)

Arguments

formula

an object of class formula; a symbolic description of the variables to be summarized by the group, or categorical variable, of interest. See "Details" for more information. To only view overall summary statistics, a one-sided formula can be used.

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which tableby is called.

na.action

a function which indicates what should happen when the data contain NAs. The default is na.tableby if there is a by variable, and na.pass if there is not. This schema thus includes observations with NAs in x variables, but removes those with NA in the categorical group variable.

subset

an optional vector specifying a subset of observations (rows of data) to be used in the results. It can be given as a logical vector or as a vector of row indices.

weights

a vector of weights.

control

control parameters to handle optional settings within tableby. Two aspects of tableby are controlled with these: test options of RHS variables across levels of the categorical grouping variable, and x variable summaries within the grouping variable. Arguments for tableby.control can be passed to tableby via the ... argument, but if a control object and ... arguments are both supplied, the latter are used. See tableby.control for more details.

...

additional arguments to be passed to internal tableby functions. See "Details" for information. Currently not implemented in print.tableby.

x

an object of class tableby.

Value

An object with class 'tableby', which is effectively a list with the variables from the right side of the formula in x and the group variable in y (if any). Each item in x has these components:

stats

Summary statistics of the RHS variable within each level of the LHS variable

test

Formal test of the distribution of the RHS variable across the levels of the LHS variable

label

The label attribute of a variable. It is set to the label attribute of a data column if it exists; otherwise it is set to the variable name in data. It can be changed with the labels.tableby function for the tableby object.

The object also contains the original function call and the tableby.control list that is used in tableby.

Details

The group variable (if any) is categorical, which could be an integer, character, factor, or ordered factor. tableby makes a simple summary of the counts within the k-levels of the independent variables on the right side of the formula. Note that unused levels are dropped.

The data argument allows data.frames with label attributes for the columns, and those labels will be used in the summary methods for the tableby class.

The independent variables are a mixture of types: categorical (discrete), numeric (continuous), and time to event (survival). These variables are split by the levels of the group variable (if any), then summarized within those levels, specific to the variable type. A statistical test is performed to compare the distribution of the independent variables across the levels of the grouping variable.

The tests differ by the independent variable type, but can be specified explicitly in the formula statement or in the control function. These tests are accepted:

anova: analysis of variance test; the default test for continuous variables. When the LHS variable has two levels, this is equivalent to a two-sample t-test.
kwt: Kruskal-Wallis rank test; an optional test for continuous variables. When the LHS variable has two levels, this is equivalent to the Wilcoxon test.
chisq: chi-square goodness-of-fit test for equal counts of a categorical variable across categories; the default for categorical or factor variables.
fe: Fisher's exact test for categorical variables.
trend: trend test for equal distribution of an ordered variable across a categorical variable; the default for ordered factor variables.
logrank: log-rank test; the default for time-to-event variables.
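
The two-level equivalences noted for anova and kwt can be checked directly in base R. The sketch below does not call tableby; it uses the built-in mtcars data as a stand-in, and the variable names (dat, p_anova, p_t, p_kw, p_wilcox) are illustrative only. It assumes the pooled-variance form of the t-test and the normal-approximation Wilcoxon test without continuity correction, which are the forms that match one-way ANOVA and Kruskal-Wallis exactly.

```r
# Base R only; mtcars ships with R. 'am' has two levels (0 and 1).
dat <- data.frame(mpg = mtcars$mpg, grp = factor(mtcars$am))

# One-way ANOVA across two groups vs. pooled-variance two-sample t-test
p_anova <- anova(lm(mpg ~ grp, data = dat))[["Pr(>F)"]][1]
p_t     <- t.test(mpg ~ grp, data = dat, var.equal = TRUE)$p.value

# Kruskal-Wallis across two groups vs. Wilcoxon rank-sum test
# (normal approximation, no continuity correction)
p_kw     <- kruskal.test(mpg ~ grp, data = dat)$p.value
p_wilcox <- wilcox.test(mpg ~ grp, data = dat,
                        exact = FALSE, correct = FALSE)$p.value

# Each pair of p-values should agree up to floating-point error
all.equal(p_anova, p_t)
all.equal(p_kw, p_wilcox)
```

With more than two levels in the grouping variable there is no such two-sample counterpart, so the ANOVA or Kruskal-Wallis form applies.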

To perform a mixture of asymptotic and rank-based tests on two different continuous variables, an example formula is: formula = group ~ anova(age) + kwt(height). The test settings in tableby.control apply to all independent variables of a given type.

The summary statistics reported for each independent variable within the group variable can be set in tableby.control.
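
As a minimal sketch of that last point, the settings can be collected in a tableby.control object and passed through the control argument instead of through .... This assumes the arsenal package (which provides tableby and the mockstudy data) is installed; the option values are the same ones used in the Examples section, and the object names ctrl and tab_ctrl are illustrative.

```r
library(arsenal)  # assumed installed; provides tableby(), tableby.control(), mockstudy
data(mockstudy)

# Build the control list once: no tests, no total column,
# and median plus first/third quartiles for numeric variables.
ctrl <- tableby.control(test = FALSE, total = FALSE,
                        numeric.stats = c("median", "q1q3"))

# Pass the control object instead of repeating the options in the call
tab_ctrl <- tableby(arm ~ sex + age, data = mockstudy, control = ctrl)
summary(tab_ctrl, text = TRUE)
```

Keeping the options in one control object makes it easier to reuse the same settings across several tables.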

Examples

# NOT RUN {
library(arsenal)  # provides tableby() and the mockstudy data
data(mockstudy)
tab1 <- tableby(arm ~ sex + age, data=mockstudy)
summary(tab1, text=TRUE)

mylabels <- list(sex = "SEX", age ="Age, yrs")
summary(tab1, labelTranslations = mylabels, text=TRUE)

tab3 <- tableby(arm ~ sex + age, data=mockstudy, test=FALSE, total=FALSE,
                numeric.stats=c("median","q1q3"), numeric.test="kwt")
summary(tab3, text=TRUE)

tab.test <- tableby(arm ~ kwt(age) + anova(bmi) + kwt(ast), data=mockstudy)
tests(tab.test)
# }