uwIntroStats (version 0.0.7)

tableStat: Table of Stratified Descriptive Statistics

Description

Produces a table of stratified descriptive statistics for a single variable of class integer, numeric, Surv, Date, or factor. Descriptive statistics are those that can be estimated using the descrip function.

Usage

## S3 class 'tableStat'
tableStat(variable=NULL, ..., stat="count", printer=TRUE, na.rm=TRUE, 
        subset=NULL, probs= c(.25,.50,.75), replaceZeroes=FALSE, 
        restriction=Inf, above=NULL, below=NULL, labove=NULL, rbelow=NULL, lbetween=NULL, 
        rbetween=NULL, interval=NULL, linterval=NULL, rinterval=NULL, lrinterval=NULL, 
        version=FALSE)

Arguments

variable

a vector or Surv object suitable for use as an argument to descrip(). If NULL is supplied for variable, the only valid statistics are the cross-tabulated counts and percentages within strata.

...

an arbitrary number of stratification variables. The arguments can be vectors, matrices, or lists. Individual columns of a matrix or elements of a list may be of class numeric, factor, or character. Stratification variables must all have the same length as each other and, if it is supplied, as variable.

stat

a vector of character strings indicating the descriptive statistic(s) to be tabulated within strata. Possibilities include any statistic returned by descrip() as specified by one or more of "count", "missing", "mean", "geometric mean", "median", "sd", "variance", "minimum", "maximum", "quantiles", "probabilities", "mn(sd)", "range", "iqr", "all", "row%", "col%", or "tot%". Only enough of the string needs to be specified to disambiguate the choice. Alternatively (and more usefully), a single special format character string can be specified as described in the Details below.
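The abbreviation rule can be illustrated with base R's pmatch(). This is only a sketch of the matching idea; resolve_stat() is a hypothetical helper, and the way tableStat() resolves names internally may differ.

```r
# Hypothetical helper (not part of uwIntroStats): resolve abbreviated
# statistic names by unique partial matching with base R's pmatch().
stat_choices <- c("count", "missing", "mean", "geometric mean", "median", "sd",
                  "variance", "minimum", "maximum", "quantiles", "probabilities",
                  "mn(sd)", "range", "iqr", "all", "row%", "col%", "tot%")

resolve_stat <- function(x, choices = stat_choices) {
  idx <- pmatch(x, choices)  # NA when the abbreviation is ambiguous or unknown
  if (anyNA(idx)) {
    stop("ambiguous or unknown statistic: ",
         paste(x[is.na(idx)], collapse = ", "))
  }
  choices[idx]
}

# "med", "var", and "tot" each identify exactly one choice
resolved <- resolve_stat(c("med", "var", "tot"))
print(resolved)
```

Under this sketch an abbreviation such as "m" is rejected, because it is a prefix of several choices ("missing", "mean", "median", and others).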

printer

a logical indicating whether the function should return the values needed to print output in the special format specified in stat.

na.rm

a logical indicating whether missing data are to be removed prior to computation of the descriptive statistics.

subset

a vector indicating the subset to be used for all descriptive statistics. If subset is supplied, all variables must have the same length as subset.

probs

a vector of probabilities between 0 and 1 indicating quantile estimates to be included in the descriptive statistics. Default is to compute 25th, 50th (median) and 75th percentiles.

replaceZeroes

if not FALSE, this indicates a value to be used in place of zeroes when computing a geometric mean. If TRUE, a value equal to one-half the lowest nonzero value is used. If a numeric value is supplied, that value is used for all variables.
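The three settings of replaceZeroes can be sketched in base R as follows. The function geo_mean() is a hypothetical stand-in written from the description above, not the code that descrip() actually runs.

```r
# Hypothetical helper (not part of uwIntroStats): geometric mean with the
# zero-replacement rule described for replaceZeroes.
geo_mean <- function(x, replaceZeroes = FALSE) {
  x <- x[!is.na(x)]
  if (isTRUE(replaceZeroes)) {
    # TRUE: substitute one-half of the lowest nonzero value
    x[x == 0] <- min(x[x > 0]) / 2
  } else if (is.numeric(replaceZeroes)) {
    # numeric: substitute the supplied value
    x[x == 0] <- replaceZeroes
  }
  exp(mean(log(x)))
}

x <- c(0, 2, 8)
gm_none    <- geo_mean(x)                       # zero kept: log(0) is -Inf
gm_half    <- geo_mean(x, replaceZeroes = TRUE) # zero replaced by 2 / 2 = 1
gm_numeric <- geo_mean(x, replaceZeroes = 0.5)  # zero replaced by 0.5
print(c(gm_none, gm_half, gm_numeric))
```

Without replacement, a single zero forces the geometric mean to zero, which is why a replacement value is offered.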

restriction

a value used for computing restricted means, standard deviations, and geometric means with censored time to event data. The default value of Inf will cause restrictions at the highest observation. Note that the same value is used for all variables of class Surv.

above

a vector of values used to dichotomize variables. The descriptive statistics will include an estimate for each variable of the proportion of measurements with values greater than each element of above.

below

a vector of values used to dichotomize variables. The descriptive statistics will include an estimate for each variable of the proportion of measurements with values less than each element of below.

labove

a vector of values used to dichotomize variables. The descriptive statistics will include an estimate for each variable of the proportion of measurements with values greater than or equal to each element of labove.

rbelow

a vector of values used to dichotomize variables. The descriptive statistics will include an estimate for each variable of the proportion of measurements with values less than or equal to each element of rbelow.
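The four dichotomizing arguments differ only in the direction of the comparison and in whether the cutpoint itself is counted. A minimal base R sketch, using made-up ages and a hypothetical helper prop_thresholds():

```r
# Hypothetical helper (not part of uwIntroStats): the proportion each
# argument would report for a single cutpoint.
prop_thresholds <- function(x, cutpoint) {
  c(above  = mean(x >  cutpoint),   # strictly greater
    below  = mean(x <  cutpoint),   # strictly less
    labove = mean(x >= cutpoint),   # greater than or equal
    rbelow = mean(x <= cutpoint))   # less than or equal
}

age <- c(70, 75, 75, 80, 85, 90)    # illustrative data only
props <- prop_thresholds(age, 75)
print(props)
```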

lbetween

a vector of values that, with -Inf and Inf appended, is used as cutpoints to categorize variables. The descriptive statistics will include an estimate for each variable of the proportion of measurements with values between successive elements of lbetween, with the left hand endpoint included in each interval.

rbetween

a vector of values that, with -Inf and Inf appended, is used as cutpoints to categorize variables. The descriptive statistics will include an estimate for each variable of the proportion of measurements with values between successive elements of rbetween, with the right hand endpoint included in each interval.
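The difference between lbetween and rbetween corresponds to the right argument of base R's cut(). The sketch below uses made-up ages and assumes this correspondence for illustration; it is not taken from the package source.

```r
age <- c(70, 75, 75, 80, 85, 90)     # illustrative data only
cutpoints <- c(75, 85)
breaks <- c(-Inf, cutpoints, Inf)    # -Inf and Inf appended, as described

# lbetween-style: left endpoint included, i.e. [-Inf,75), [75,85), [85,Inf)
l_props <- prop.table(table(cut(age, breaks, right = FALSE)))

# rbetween-style: right endpoint included, i.e. (-Inf,75], (75,85], (85,Inf]
r_props <- prop.table(table(cut(age, breaks, right = TRUE)))

print(l_props)
print(r_props)
```

The two observations equal to 75 move from the middle category to the lowest one when the closed endpoint switches from left to right.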

interval

a two column matrix of values in which each row is used to define intervals of interest to categorize variables. The descriptive statistics will include an estimate for each variable of the proportion of measurements with values between two elements in a row, with neither endpoint included in each interval.

linterval

a two column matrix of values in which each row is used to define intervals of interest to categorize variables. The descriptive statistics will include an estimate for each variable of the proportion of measurements with values between two elements in a row, with the left hand endpoint included in each interval.

rinterval

a two column matrix of values in which each row is used to define intervals of interest to categorize variables. The descriptive statistics will include an estimate for each variable of the proportion of measurements with values between two elements in a row, with the right hand endpoint included in each interval.

lrinterval

a two column matrix of values in which each row is used to define intervals of interest to categorize variables. The descriptive statistics will include an estimate for each variable of the proportion of measurements with values between two elements in a row, with both endpoints included in each interval.
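Unlike lbetween and rbetween, the interval arguments take explicit (possibly overlapping) intervals, one per matrix row. The following base R sketch shows the four endpoint conventions; interval_props() is a hypothetical helper and the data are made up.

```r
# Hypothetical helper (not part of uwIntroStats): proportion of x falling in
# each row's interval, with optional inclusion of the left/right endpoints.
interval_props <- function(x, m, left = FALSE, right = FALSE) {
  apply(m, 1, function(b) {
    lo_ok <- if (left)  x >= b[1] else x > b[1]
    hi_ok <- if (right) x <= b[2] else x < b[2]
    mean(lo_ok & hi_ok)
  })
}

age <- c(70, 75, 75, 80, 85, 90)             # illustrative data only
m <- rbind(c(70, 80), c(75, 85))             # two intervals of interest

p_open  <- interval_props(age, m)                             # like interval
p_left  <- interval_props(age, m, left = TRUE)                # like linterval
p_right <- interval_props(age, m, right = TRUE)               # like rinterval
p_both  <- interval_props(age, m, left = TRUE, right = TRUE)  # like lrinterval
print(rbind(p_open, p_left, p_right, p_both))
```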

version

If TRUE, the version of the function will be returned. No other computations will be performed.

Value

An object of class tableStat is returned, which consists of a list of arrays. Each array corresponds to a table of stratified statistics for one of the possible choices of stat. The print method provides the formatted output for the choice specified in stat.

Details

This function uses descrip() to compute the descriptive statistics. In addition to the basic choices specified above for stat, the user can supply a special format character string. Arbitrary text can be specified to label any of the descriptive statistics, each of which is indicated by bracketing its name with "@". All text bracketed by "@" must refer to a descriptive statistic, and all other text is printed verbatim. For instance, a display of the mean, standard deviation, minimum, maximum, and sample size might be specified by "@mean@ (@sd@; @min@ - @max@; n=@count@)". Similarly, a cross tabulation displaying counts, row percentages, column percentages, and percentages of the total might be specified by "@count@ (r @row%@; c @col%@; t @tot%@)". See the examples for more detail. Any call to tableStat() will run tableStat.default(), with user-specified values in place of the appropriate defaults.
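To make the format-string idea concrete, here is a minimal base R sketch of how such a template could be filled in for a single stratum. The helper fill_format() and the rounding to two digits are assumptions for illustration; tableStat() has its own parsing and formatting rules, including abbreviation matching that this sketch omits.

```r
# Hypothetical helper (not part of uwIntroStats): substitute values for the
# "@name@" placeholders and keep all other text verbatim.
fill_format <- function(fmt, values) {
  parts <- strsplit(fmt, "@", fixed = TRUE)[[1]]
  is_stat <- seq_along(parts) %% 2 == 0      # every second piece sits between "@"s
  unknown <- setdiff(parts[is_stat], names(values))
  if (length(unknown) > 0) {
    stop("unknown statistic: ", paste(unknown, collapse = ", "))
  }
  parts[is_stat] <- vapply(parts[is_stat],
                           function(nm) format(round(values[[nm]], 2)),
                           character(1))
  paste(parts, collapse = "")
}

x <- c(2, 4, 4, 6)                           # illustrative data only
vals <- list(mean = mean(x), sd = sd(x), min = min(x), max = max(x),
             count = length(x))
cell <- fill_format("@mean@ (@sd@; @min@ - @max@; n=@count@)", vals)
print(cell)
```

In a stratified table, the same template would be applied once per cell, using the statistics computed within that stratum.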

Examples

# NOT RUN {
# Load required libraries
library(survival)

# Reading in a dataset
mri <- read.table("http://www.emersonstatistics.com/datasets/mri.txt",header=TRUE)

# Creating a Surv object to reflect time to death
mri$ttodth <- Surv(mri$obstime,mri$death)

# Reformatting an integer MMDDYY representation of date to be a Date object
mri$mridate <- as.Date(paste(trunc(mri$mridate/10000),trunc((mri$mridate %% 10000)/100),
mri$mridate %% 100,sep="/"),"%m/%d/%y")

# Cross tabulation of counts with sex and race strata
with (mri, tableStat (NULL, race, male, stat= "@count@ (r @row%@; c @col%@; t @tot%@)"))

# Cross tabulation of counts with sex, race, and coronary disease strata
# (Note row and column percentages are defined within the first two strata, while overall
# percentage considers all strata)
with (mri, tableStat (NULL, race, male, chd,
stat= "@count@ (r @row%@; c @col%@; t @tot%@)"))

# Description of time to death with appropriate quantiles
with (mri, tableStat(ttodth,probs=c(0.05,0.1,0.15,0.2),
stat="mean @mean@ (q05: @q@; q10: @q@; q15: @q@; q20: @q@; max: @max@)"))

# Description of mridate with mean, range stratified by race and sex
with (mri, tableStat(mridate, race, male,
stat="mean @mean@ (range @min@ - @max@)"))

# Stratified descriptive statistics with proportions
# (above and lbetween must be passed to tableStat(), inside its parentheses)
with (mri, tableStat(age,
      stat=">75: @p@; >85: @p@; [-Inf,75): @p@; [75,85): @p@; [85,Inf): @p@",
      above=c(75,85), lbetween=c(75,85)))

# Descriptive statistics on a subset comprised of males
with (mri, tableStat(dsst,age,stroke,subset=male==1,
stat="@mean@ (@sd@; n= @count@/@missing@)"))

# }
