Create an object summarizing all baseline variables (both continuous and categorical), optionally stratifying by one or more stratifying variables and performing statistical tests. The object gives a table that is easy to use in medical research papers.

```
CreateTableOne(vars, strata, data, factorVars, includeNA = FALSE,
test = TRUE, testApprox = chisq.test, argsApprox = list(correct = TRUE),
testExact = fisher.test, argsExact = list(workspace = 2 * 10^5),
testNormal = oneway.test, argsNormal = list(var.equal = TRUE),
testNonNormal = kruskal.test, argsNonNormal = list(NULL), smd = TRUE)
```

vars

Variables to be summarized given as a character vector. Factors are handled as categorical variables, whereas numeric variables are handled as continuous variables. If empty, all variables in the data frame specified in the data argument are used.

strata

Stratifying (grouping) variable name(s) given as a character vector. If omitted, the overall results are returned.

data

A data frame in which these variables exist. All variables (both vars and strata) must be in this data frame.

factorVars

Numerically coded variables that should be handled as categorical variables, given as a character vector. Do not include factors, unless you need to relevel them by removing empty levels. If omitted, only factors are considered categorical variables. The variables specified here must also be specified in the `vars` argument.

includeNA

If TRUE, NA is handled as a regular factor level rather than missing. NA is shown as the last factor level in the table. Only effective for categorical variables.
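The idea of treating NA as a regular, last factor level can be illustrated with base R alone. The sketch below uses `addNA()` on a made-up factor; it is only an analogy for the behavior described above, and it is an assumption that the package works along similar lines internally.

```r
## Hypothetical categorical variable with one missing value
x <- factor(c("a", "b", NA, "a"))

## By default the missing value is not counted as a level
table(x)

## addNA() turns NA into an explicit level, placed after the other levels
x_na <- addNA(x)
levels(x_na)
table(x_na)
```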

test

If TRUE (the default) and there are two or more groups, groupwise comparisons are performed.

testApprox

A function used to perform the large-sample approximation-based tests. The default is `chisq.test`. This is not recommended when some of the cells have small counts, such as fewer than 5.

argsApprox

A named list of arguments passed to the function specified in `testApprox`. The default is `list(correct = TRUE)`, which turns on the continuity correction for `chisq.test`.

testExact

A function used to perform the exact tests. The default is `fisher.test`. If the cells have large counts, it will fail because of memory limitations. In this situation, the large-sample approximation-based test should suffice.
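To make the choice between the approximate and the exact test concrete, the sketch below runs both default functions on a small 2x2 table using base R only. The counts are made up for illustration, and applying the default argument lists through `do.call()` is an assumption about how such lists can be used, not a description of the package internals.

```r
## Hypothetical 2x2 table with small cell counts (illustrative data only)
tab <- matrix(c(3, 1, 2, 6), nrow = 2,
              dimnames = list(group = c("A", "B"),
                              outcome = c("yes", "no")))

## Large-sample approximation with the default argument list.
## chisq.test() warns when expected counts are small, so the
## warning is suppressed here.
res_approx <- suppressWarnings(
    do.call(chisq.test, c(list(tab), list(correct = TRUE))))

## Exact test with the default argument list
res_exact <- do.call(fisher.test, c(list(tab), list(workspace = 2 * 10^5)))

## Compare the two p-values side by side
c(approx = res_approx$p.value, exact = res_exact$p.value)
```

With counts this small, the exact p-value is generally the one to report.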

argsExact

A named list of arguments passed to the function specified in `testExact`. The default is `list(workspace = 2*10^5)`, which specifies the memory space allocated for `fisher.test`.

testNormal

A function used to perform the normal assumption-based tests. The default is `oneway.test`. This is equivalent to the t-test when there are only two groups.
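The two-group equivalence can be checked directly in base R. The sketch below uses simulated data (not from the package) and compares the p-value of `oneway.test()` with that of `t.test()`, both assuming equal variances.

```r
## Simulated two-group data (illustrative only)
set.seed(1)
dat <- data.frame(
    group = factor(rep(c("A", "B"), each = 20)),
    value = c(rnorm(20, mean = 0), rnorm(20, mean = 0.5)))

## One-way ANOVA assuming equal variances
p_anova <- oneway.test(value ~ group, data = dat, var.equal = TRUE)$p.value

## Two-sample t-test assuming equal variances
p_ttest <- t.test(value ~ group, data = dat, var.equal = TRUE)$p.value

## The two p-values agree up to numerical tolerance
all.equal(p_anova, p_ttest)
```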

argsNormal

A named list of arguments passed to the function specified in `testNormal`. The default is `list(var.equal = TRUE)`, which makes it the ordinary ANOVA that assumes equal variance across groups.

testNonNormal

A function used to perform the nonparametric tests. The default is `kruskal.test` (Kruskal-Wallis rank sum test). This is equivalent to `wilcox.test` (Mann-Whitney U test) when there are only two groups.
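The two-group equivalence holds for the normal approximation without continuity correction. The sketch below, on simulated data, shows this with base R; note that `wilcox.test()` with its default settings may use an exact p-value or a continuity correction, in which case the p-values can differ slightly.

```r
## Simulated skewed two-group data (illustrative only)
set.seed(2)
dat <- data.frame(
    group = factor(rep(c("A", "B"), each = 15)),
    value = c(rexp(15, rate = 1), rexp(15, rate = 0.5)))

## Kruskal-Wallis test (chi-squared approximation)
p_kw <- kruskal.test(value ~ group, data = dat)$p.value

## Wilcoxon rank sum test using the normal approximation,
## with the exact method and continuity correction turned off
p_mw <- wilcox.test(value ~ group, data = dat,
                    exact = FALSE, correct = FALSE)$p.value

## The two p-values agree up to numerical tolerance
all.equal(p_kw, p_mw)
```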

argsNonNormal

A named list of arguments passed to the function specified in `testNonNormal`. The default is `list(NULL)`, which is just a placeholder.

smd

If TRUE (the default) and there are two or more groups, standardized mean differences for all pairwise comparisons are calculated.
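For orientation, a standardized mean difference for a single pair of groups can be computed by hand in base R. The sketch below uses made-up numbers and the usual pooled-variance form; reporting the absolute value is an assumption made here for illustration, and this is not the package's own code.

```r
## Continuous variable: difference in means over the pooled standard deviation
x1 <- c(1, 2, 3)   # hypothetical group 1
x2 <- c(2, 3, 4)   # hypothetical group 2
smd_cont <- abs(mean(x1) - mean(x2)) / sqrt((var(x1) + var(x2)) / 2)
smd_cont  # 1

## Binary variable: same form, using the binomial variance p * (1 - p)
p1 <- 0.2          # hypothetical prevalence in group 1
p2 <- 0.4          # hypothetical prevalence in group 2
smd_bin <- abs(p1 - p2) / sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)
smd_bin
```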

An object of class `TableOne`, which is a list of three objects.

An object of class `ContTable`, containing continuous variables only.

An object of class `CatTable`, containing categorical variables only.

A list of metadata regarding the variables.

The definitions of the standardized mean difference (SMD) are available in Flury *et al* 1986 for the univariate case and the multivariate case (essentially the square root of the Mahalanobis distance). Extension to binary variables is discussed in Austin 2009, and extension to multinomial variables is suggested in Yang *et al* 2012. This multinomial extension treats a single multinomial variable as multiple non-redundant dichotomous variables and uses the Mahalanobis distance. The off-diagonal elements of the covariance matrix on page 3 of that paper have an error and need negation. In weighted data, the same definitions can be used except that the mean and standard deviation estimates are weighted estimates (Li *et al* 2013 and Austin *et al* 2015). In tableone, all weighted estimates are calculated by weighted estimation functions in the `survey` package.
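As a reading aid, the definitions referred to above can be written out as follows. The notation is a paraphrase chosen here, not a quotation from the cited papers: subscripts 1 and 2 denote the two groups, and for a multinomial variable with $K$ levels, $T$ and $C$ are the vectors of proportions for levels $2, \dots, K$ in the two groups.

$$
d_{\text{continuous}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\left(s_1^2 + s_2^2\right)/2}},
\qquad
d_{\text{binary}} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\left[\hat{p}_1(1-\hat{p}_1) + \hat{p}_2(1-\hat{p}_2)\right]/2}}
$$

$$
d_{\text{multinomial}} = \sqrt{(T - C)^{\top} S^{-1} (T - C)},
\qquad
S_{kl} =
\begin{cases}
\dfrac{T_k(1-T_k) + C_k(1-C_k)}{2}, & k = l \\[2ex]
-\dfrac{T_k T_l + C_k C_l}{2}, & k \neq l
\end{cases}
$$

The negative sign on the off-diagonal elements of $S$ reflects the correction mentioned above.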

Flury, BK. and Riedwyl, H. (1986). Standard distance in univariate and multivariate analysis. *The American Statistician*, **40**, 249-251.

Austin, PC. (2009). Using the Standardized Difference to Compare the Prevalence of a Binary Variable Between Two Groups in Observational Research. *Communications in Statistics - Simulation and Computation*, **38**, 1228-1234.

Yang, D. and Dalton, JE. (2012). A unified approach to measuring the effect size between two groups using SAS. SAS Global Forum 2012, Paper 335-2012.

Li, L. and Greene, T. (2013). A weighting analogue to pair matching in propensity score analysis. *International Journal of Biostatistics*, **9**, 215-234.

Austin, PC. and Stuart, EA. (2015). Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. *Statistics in Medicine*, Online on August 3, 2015.

```
## Load
library(tableone)

## Load Mayo Clinic Primary Biliary Cirrhosis Data
library(survival)
data(pbc)
## Check variables
head(pbc)

## Make categorical variables factors
varsToFactor <- c("status","trt","ascites","hepato","spiders","edema","stage")
pbc[varsToFactor] <- lapply(pbc[varsToFactor], factor)

## Create a variable list
dput(names(pbc))
vars <- c("time","status","age","sex","ascites","hepato",
          "spiders","edema","bili","chol","albumin",
          "copper","alk.phos","ast","trig","platelet",
          "protime","stage")

## Create Table 1 stratified by trt
tableOne <- CreateTableOne(vars = vars, strata = c("trt"), data = pbc)

## Just typing the object name will invoke the print.TableOne method
tableOne

## Specifying nonnormal variables will show the variables appropriately,
## and show nonparametric test p-values. Specify variables in the exact
## argument to obtain the exact test p-values. cramVars can be used to
## show both levels for 2-level categorical variables.
print(tableOne, nonnormal = c("bili","chol","copper","alk.phos","trig"),
      exact = c("status","stage"), cramVars = "hepato", smd = TRUE)

## Use the summary.TableOne method for detailed summary
summary(tableOne)

## See the categorical part only using $ operator
tableOne$CatTable
summary(tableOne$CatTable)

## See the continuous part only using $ operator
tableOne$ContTable
summary(tableOne$ContTable)

## If your workflow includes copying to Excel and Word when writing manuscripts,
## you may benefit from the quote argument. This will quote everything so that
## Excel does not mess up the cells.
print(tableOne, nonnormal = c("bili","chol","copper","alk.phos","trig"),
      exact = c("status","stage"), quote = TRUE)

## If you want to center-align values in Word, use the noSpaces option.
print(tableOne, nonnormal = c("bili","chol","copper","alk.phos","trig"),
      exact = c("status","stage"), quote = TRUE, noSpaces = TRUE)

## If SMDs are needed as numeric values, use ExtractSmd()
ExtractSmd(tableOne)
```