buildDescrTbl: buildDescrTbl

Description

Calculate and Present Descriptive Values in a Printable Table

Usage

buildDescrTbl(
  df,
  tests,
  prmnames,
  prmunits,
  addFactorLevelsToNames = TRUE,
  excel_style = TRUE,
  groupby,
  addungrouped = FALSE,
  dopvals = FALSE,
  ignore_test_errors = FALSE,
  p.adjust.method = "holm",
  orderedAsUnordered = FALSE,
  factorlevellimit = 14,
  show.minmax = TRUE,
  show.IQR = FALSE,
  report_tests = FALSE,
  report_testmessages = FALSE,
  pvals_formatting = TRUE,
  pvals_digits = 3,
  pvals_signiflev = 0.05,
  extraLevels = NULL,
  missingName = "missing",
  nonNAsName = "N",
  removeZeroNAs = TRUE,
  removeZeroExtraLevels = TRUE,
  includeNAs = FALSE,
  includeNonNAs = FALSE,
  printOrgAlignment = FALSE,
  useutf8 = "latex",
  verbose = 0,
  without_attrs = FALSE,
  sd_digits = "by_mean",
  descr_digits = 2,
  significant_digits = TRUE,
  percentages = "columnwise"
)

Value

formatted data.frame with descriptive values

Arguments

df: data.frame containing the variables of which to calc the descriptive values
tests: character vector, list of characters, list of functions, or list of lists. In each case the i-th element gives the test to perform on the i-th variable in df (excluding stratification variables). A test can be given as a character (the name of a test function), as a function, or as a list whose first element is a character or function and whose following elements are *named* additional arguments to that test function. Each test function has to accept (at least) the arguments 'values' and 'grouping', which are vectors of equal length. For convenience, this package ships with some example functions; have a look at those if you want to supply your own. These convenience functions include w.chisq.test, w.cor.test, w.fisher.test, w.kruskal.test, and w.wilcox.test. The whole list/vector is recycled if too short.
prmnames: names of the variables in df (if needed to be overwritten)
prmunits: units of the variables in df
addFactorLevelsToNames: logical. if TRUE expand 'sex' to 'sex [m/w]'. Defaults to TRUE.
excel_style: logical. if TRUE remove subsequent duplicates from the parameter column (as common in Excel). Default: TRUE
groupby: column of df used for grouping. The result contains one column for each group. If the df$column is an ordered factor, the order will be respected in the resulting table
addungrouped: logical. if TRUE add a column 'total' with the ungrouped summary statistics. Default: FALSE
dopvals: boolean. if TRUE add an additional column containing the p-values comparing the two strata in groupby. Only implemented for a two-level stratum until now.
ignore_test_errors: logical. If TRUE, a test that raises an error returns an empty test result (as list).
p.adjust.method: character. if not NULL include an additional column with adjusted p values. see p.adjust.methods for possible values and explanations. Defaults to "holm"
orderedAsUnordered: logical. treat ordered factors as unordered factors?
factorlevellimit: integer. for factors with more than factorlevellimit levels, not all levels are printed
show.minmax: logical. if TRUE show minimum and maximum for numeric variables. Defaults to TRUE.
show.IQR: logical. if TRUE show 25% and 75% quantiles for numeric variables. Defaults to FALSE.
report_tests: boolean. if TRUE one additional column in the result table will contain the test, that was performed to calculate the p value. Ignored if dopvals=FALSE
report_testmessages: boolean. if TRUE one additional column in the result table will contain any warnings that appeared while the test was performed. Ignored if dopvals=FALSE
pvals_formatting: boolean. If FALSE report numbers, else report formatted strings (via prettyPvalue)
pvals_digits: integer. Number of digits for p value formatting. Ignored when pvals_formatting==FALSE. Defaults to 3
pvals_signiflev: double. The significance level for bold p value formatting. Ignored when pvals_formatting==FALSE. Defaults to 0.05
extraLevels: named list of lists. Names have to be variable names. Elements have to be named lists of this form: `some label` = list(idxvec = idxvec, display = logical). Here idxvec needs to be a logical vector of length nrow(df) that specifies the affected rows. If display is TRUE the number of affected rows will be shown under some label.
missingName: character. name of missing values (default: missing)
nonNAsName: character. name of not missing values (default: N)
removeZeroNAs: boolean. if TRUE, rows for missing values containing only 0s are removed from the result.
removeZeroExtraLevels: boolean. if TRUE, rows for ExtraLevels containing only 0s are removed from the result.
includeNAs: boolean. Include number of NAs in the output? Currently only one of either includeNonNAs or includeNAs can be set to TRUE
includeNonNAs: boolean. Include number of not missing values (Non-NAs) in the output? Currently only one of either includeNonNAs or includeNAs can be set to TRUE
printOrgAlignment: boolean. If TRUE, then a row like "<l> <r> <r>" will be included in the result df
useutf8: character. one of c("latex", "utf8", "replace"). if 'latex' (the default) use \pm in the output; if 'replace' use +- in the output, if 'utf8' use the unicode character
verbose: numeric. level of verbosity (0 : silent)
without_attrs: logical. If TRUE return the descriptive table without attrs. Otherwise add df, groupby, and a 'full' (closer to tidy) version of the table as attributes. Defaults to FALSE.
sd_digits: character. one of c("by_mean", "fixed"). If 'by_mean', the number of decimal places of the standard deviation is limited by the number of decimal places of the mean.
descr_digits: integer. Number of digits for formatting of descriptive values. Defaults to 2.
significant_digits: boolean. if TRUE, the number of significant digits is given by descr_digits. Otherwise the number of decimal places is fixed.
percentages: character. one of c("columnwise", "rowwise"). If 'rowwise', percentages are computed by row. Defaults to "columnwise"
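
The tests argument accepts user-supplied functions that take 'values' and 'grouping'. The following is a minimal sketch of such a function and of a mixed tests specification. The name my.t.test is hypothetical, and the return value (the "htest" object from stats::t.test) is an assumption: this page does not state what a test function must return, so compare with the shipped w.* functions before relying on it.

```r
# Hypothetical custom test: Welch two-sample t-test.
# Accepts the two required arguments 'values' and 'grouping'
# (vectors of equal length); further named arguments go to t.test.
my.t.test <- function(values, grouping, ...) {
  grouping <- droplevels(factor(grouping))
  # ASSUMPTION: returning the htest object is acceptable to buildDescrTbl
  stats::t.test(values ~ grouping, ...)
}

# Stand-alone check of the function on simulated data
set.seed(1)
values   <- c(rnorm(20, mean = 0), rnorm(20, mean = 1))
grouping <- rep(c("a", "b"), each = 20)
res <- my.t.test(values, grouping)
res$p.value

# A mixed specification as described for 'tests': a function name as
# character, another name, and a list holding a function plus a *named*
# extra argument. The i-th entry applies to the i-th variable.
tests_spec <- list("w.wilcox.test",
                   "w.fisher.test",
                   list(my.t.test, var.equal = TRUE))
```

Such a list would be passed as tests = tests_spec together with dopvals = TRUE; because the specification is recycled, a single entry is enough when every variable should use the same test.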

Author

Andreas Leha

Details

Builds a table containing descriptive values.

Examples

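# Simulated training set: 100 subjects with age, sex, and an ordered 5-level score
ttt <- data.frame(data="training set",
                  age=runif(100, 0, 100),
                  sex=sample(c("m","f"), 100, replace=TRUE, prob=c(0.3, 0.7)),
                  score=factor(sample(1:5, 100, replace=TRUE),
                    ordered=TRUE,
                    levels=1:5))
# Simulated test set with the same columns but a balanced sex ratio
ttt2 <- data.frame(data="test set",
                   age=runif(100, 0, 100),
                   sex=sample(c("m","f"), 100, replace=TRUE, prob=c(0.5,0.5)),
                   score=factor(sample(1:5, 100, replace=TRUE),
                     ordered=TRUE,
                     levels=1:5))

# One unit per described variable (age, sex, score); 'data' is the grouping column
units <- c("years", "", "")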
buildDescrTbl(rbind(ttt, ttt2),
              prmunits=units,
              groupby="data",
              includeNAs=TRUE)
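
The nested structure expected by extraLevels can be easy to get wrong, so here is a small sketch of how such a list might be built. The data frame, the label "over 65", and the threshold are made up for illustration; the final call is shown only as a comment because it is an assumed usage that requires the package providing buildDescrTbl.

```r
set.seed(1)  # reproducible illustration data
df <- data.frame(grp = rep(c("a", "b"), each = 10),
                 age = runif(20, 0, 100))

# Logical index vector of length nrow(df) marking the affected rows
over65 <- df$age > 65

# Named list of lists: the outer name is the variable name,
# the inner name is the label shown in the table
extra <- list(
  age = list(`over 65` = list(idxvec = over65, display = TRUE))
)

str(extra)

# Assumed usage (not run here):
# buildDescrTbl(df, groupby = "grp", extraLevels = extra)
```

With removeZeroExtraLevels = TRUE (the default), a row for such a level is dropped from the result if it contains only 0s.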
