Hmisc (version 3.0-1)

bystats: Statistics by Categories

Description

For any number of cross-classification variables, bystats returns a matrix with the sample size, the number of missing y values, and fun(non-missing y), with one row per cross-classification. It uses Harrell's modification of the interaction function to form the cross-classifications. The default fun is mean; if y is binary, the mean is labeled Fraction. There is a print method as well as a latex method for objects created by bystats.

bystats2 handles the special case in which there are 2 classification variables, placing the first in rows and the second in columns. The print method for bystats2 uses the S-Plus print.char.matrix function to organize the statistics for each cell into boxes.
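
To make the layout concrete, here is a minimal base-R sketch of the matrix bystats builds for a single vector y. bystats_sketch is a hypothetical helper, not part of Hmisc: it uses base R's interaction() (with its default "." separator) instead of Harrell's modification, assumes N counts the non-missing values, and skips the Fraction label, nmiss and subset handling.

```r
# Hypothetical helper, not part of Hmisc: a base-R approximation of the
# matrix bystats() returns for a single numeric vector y.
bystats_sketch <- function(y, ..., fun = mean) {
  # Cross-classify the grouping variables; drop = TRUE keeps observed cells only
  g <- interaction(..., drop = TRUE)
  # Assumed layout: N = non-missing count, Missing = NA count, Stat = fun(non-missing y)
  summarize <- function(yy) {
    ok <- !is.na(yy)
    c(N = sum(ok), Missing = sum(!ok), Stat = fun(yy[ok]))
  }
  cells <- lapply(split(y, g), summarize)
  # Last row: the same statistics on all observations combined
  rbind(do.call(rbind, cells), ALL = summarize(y))
}

# Made-up illustration data
sex  <- c("f", "m", "f", "m", "f", "m")
race <- c("a", "a", "b", "b", "b", "a")
y    <- c(1, 2, 3, 4, NA, 6)
res  <- bystats_sketch(y, sex, race)
res
```

Here the f.b cell has one usable value and one missing value, and the ALL row summarizes the 5 non-missing observations together.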

Usage

bystats(y, ..., fun, nmiss, subset)
## S3 method for class 'bystats':
print(x, ...)
## S3 method for class 'bystats':
latex(object, title, caption, rowlabel, ...)
bystats2(y, v, h, fun, nmiss, subset)
## S3 method for class 'bystats2':
print(x, abbreviate.dimnames=FALSE,
   prefix.width=max(nchar(dimnames(x)[[1]])), ...)
## S3 method for class 'bystats2':
latex(object, title, caption, rowlabel, ...)

Arguments

y
a binary, logical, or continuous variable, or a matrix or data frame of such variables. If y is a data frame, it is converted to a matrix. If y is a data frame or matrix, computations are done on subsets of the rows of y.
...
For bystats, one or more classification variables separated by commas. For print.bystats, options passed to print.default such as digits. For latex.bystats and latex.bystats2, options passed to latex.default such as digits.
v
vertical variable for bystats2. Will be converted to factor.
h
horizontal variable for bystats2. Will be converted to factor.
fun
a function to compute on the non-missing y for a given subset. You must specify fun= in front of the function name or definition. fun may return a single number or a vector or matrix of any length. Matrix results are
nmiss
A column containing a count of missing values is included if nmiss=TRUE or if there is at least one missing value.
subset
a vector of subscripts or logical values indicating the subset of data to analyze
abbreviate.dimnames
set to TRUE to abbreviate dimnames in output
prefix.width
see print.char.matrix if using S-Plus
title
title to pass to latex.default. Default is the first word of the character string version of the first calling argument.
caption
caption to pass to latex.default. Default is the heading attribute from the object produced by bystats.
rowlabel
rowlabel to pass to latex.default. Default is the byvarnames attribute from the object produced by bystats. For bystats2 the default is "".
x
an object created by bystats or bystats2
object
an object created by bystats or bystats2
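
When y is a matrix, fun receives a row subset of y rather than a single column. The sketch below uses made-up data and a hypothetical per-cell loop rather than Hmisc's own code: it splits the row indices by group, sets aside rows containing any missing value (an assumption about how "non-missing y" is defined for a matrix), and calls a fun built with apply(w, 2, median), as in the examples further down.

```r
# Made-up illustration data
pressure    <- c(120, 135, 128, 140, NA, 150)
cholesterol <- c(190, 210, 205, 230, 220, 240)
age.group   <- factor(c("young", "young", "young", "old", "old", "old"))
y <- cbind(pressure, cholesterol)   # column names come from the variable names

# A user-supplied fun: receives a whole row subset of y, returns a named vector
fun <- function(w) apply(w, 2, median)

# Hypothetical per-cell loop (not Hmisc's code)
per_cell <- lapply(split(seq_len(nrow(y)), age.group), function(rows) {
  w  <- y[rows, , drop = FALSE]   # rows of y belonging to this cell
  ok <- complete.cases(w)         # assumption: rows with any NA are set aside
  c(N = sum(ok), Missing = sum(!ok), fun(w[ok, , drop = FALSE]))
})
tab <- do.call(rbind, per_cell)
tab
```

Because fun returns a named vector, its names (pressure, cholesterol) become the statistic columns after N and Missing.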

Value

  • For bystats, a matrix with row names equal to the classification labels and column names N, Missing, funlab, where funlab is determined from fun. A row is added to the end with the summary statistics computed on all observations combined. The class of this matrix is bystats.
  • For bystats2, a 3-dimensional array with the last dimension corresponding to the statistics being computed. The class of the array is bystats2.
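
The bystats2 layout can be pictured with a small base-R sketch. bystats2_sketch is hypothetical and is not Hmisc's implementation: it fills only the interior cells (no marginal totals) and stores two illustrative statistics, N and Stat, along the third dimension.

```r
# Hypothetical sketch of the bystats2 layout (not Hmisc's implementation):
# levels of v in rows, levels of h in columns, statistics in the 3rd dimension.
bystats2_sketch <- function(y, v, h, fun = mean) {
  v <- as.factor(v)
  h <- as.factor(h)
  stats <- c("N", "Stat")
  out <- array(NA_real_,
               dim = c(nlevels(v), nlevels(h), length(stats)),
               dimnames = list(levels(v), levels(h), stats))
  for (i in levels(v)) {
    for (j in levels(h)) {
      yy <- y[which(v == i & h == j)]   # observations in cell (i, j)
      yy <- yy[!is.na(yy)]              # keep non-missing y only
      out[i, j, "N"] <- length(yy)
      out[i, j, "Stat"] <- if (length(yy) > 0) fun(yy) else NA
    }
  }
  out
}

# Made-up illustration data
death <- c(0, 1, 1, 0, 1, 0)
race  <- c("b", "w", "b", "w", "w", "b")
sex   <- c("f", "f", "m", "m", "f", "m")
a <- bystats2_sketch(death, race, sex)
a[, , "Stat"]   # fraction dying, race in rows and sex in columns
```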

Side Effects

latex produces a .tex file.

Concept

grouping

See Also

interaction, cut, cut2, latex, print.char.matrix, translate

Examples

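library(Hmisc)   # provides bystats, bystats2, cut2 and latex
# These examples assume that vectors such as sex, county, city, death, race, age,
# cholesterol, d.time, pressure and age.decile already exist in the workspace.
bystats(sex==2, county, city)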
bystats(death, race)
bystats(death, cut2(age,g=5), race)
bystats(cholesterol, cut2(age,g=4), sex, fun=median)
bystats(cholesterol, sex, fun=quantile)
bystats(cholesterol, sex, fun=function(x)c(Mean=mean(x),Median=median(x)))
latex(bystats(death,race,nmiss=FALSE,subset=sex=="female"), digits=2)
f <- function(y) c(Hazard=sum(y[,2])/sum(y[,1]))
# f() computes the hazard estimate for right-censored data under an exponential distribution.
bystats(cbind(d.time, death), race, sex, fun=f)
bystats(cbind(pressure, cholesterol), age.decile, 
        fun=function(y) c(Median.pressure   =median(y[,1]),
                          Median.cholesterol=median(y[,2])))
y <- cbind(pressure, cholesterol)
bystats(y, age.decile,
        fun=function(y) apply(y, 2, median))   # same statistics as the last one, labeled pressure and cholesterol
bystats(y, age.decile, fun=function(y) apply(y, 2, quantile, c(.25,.75)))
# The last one computes separately the 0.25 and 0.75 quantiles of 2 vars.
latex(bystats2(death, race, sex, fun=table))
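
The hazard function f used above relies on a standard result: for right-censored data under an exponential model, the maximum likelihood estimate of the constant hazard is the number of events divided by the total follow-up time. The small check below uses made-up data. Note that f expects follow-up time in the first column and the event indicator in the second, which is why the example passes cbind(d.time, death) in that order.

```r
# f() as defined in the examples: events / total follow-up time
f <- function(y) c(Hazard = sum(y[, 2]) / sum(y[, 1]))

d.time <- c(2, 5, 3, 10)   # made-up follow-up times
death  <- c(1, 0, 1, 0)    # 1 = event observed, 0 = censored
h <- f(cbind(d.time, death))
h   # 2 events over 20 time units
```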
