descsuppR (version 1.2)

buildDescrTbl: buildDescrTbl

Description

Calculate and present descriptive values in a printable table

Usage

buildDescrTbl(
  df,
  tests,
  prmnames,
  prmunits,
  addFactorLevelsToNames = TRUE,
  excel_style = TRUE,
  groupby,
  addungrouped = FALSE,
  dopvals = FALSE,
  ignore_test_errors = FALSE,
  p.adjust.method = "holm",
  orderedAsUnordered = FALSE,
  factorlevellimit = 14,
  show.minmax = TRUE,
  show.IQR = FALSE,
  report_tests = FALSE,
  report_testmessages = FALSE,
  pvals_formatting = TRUE,
  pvals_digits = 3,
  pvals_signiflev = 0.05,
  extraLevels = NULL,
  missingName = "missing",
  nonNAsName = "N",
  removeZeroNAs = TRUE,
  removeZeroExtraLevels = TRUE,
  includeNAs = FALSE,
  includeNonNAs = FALSE,
  printOrgAlignment = FALSE,
  useutf8 = "latex",
  verbose = 0,
  without_attrs = FALSE,
  sd_digits = "by_mean",
  descr_digits = 2,
  significant_digits = TRUE,
  percentages = "columnwise"
)

Value

formatted data.frame with descriptive values

Arguments

df

data.frame containing the variables for which to calculate the descriptive values

tests

character vector, list of characters, list of functions, or list of lists. In each case the i-th element gives the test to perform on the i-th variable in df (excluding stratification variables). A test can be given as a character (name of the test function), as a function, or as a list whose first element is again either a character or a function and whose following elements are *named* additional arguments to that test function. Each test function has to accept (at least) the arguments 'values' and 'grouping', which are vectors of equal length. For convenience, this package ships with some example functions; have a look at those if you want to supply your own. These convenience functions include w.chisq.test, w.cor.test, w.fisher.test, w.kruskal.test, and w.wilcox.test. The whole list/vector is recycled if it is too short.
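
As a rough illustration of the interface described above, a user-supplied test could look like the following sketch. The name my.t.test is hypothetical, and returning the htest object from t.test is an assumption about what the package expects; the shipped w.* functions remain the authoritative template.

```r
# Hypothetical custom test: Welch two-sample t-test on two groups.
# 'values' and 'grouping' are the two arguments the documentation requires;
# returning the htest object is an ASSUMPTION about the expected result.
my.t.test <- function(values, grouping, ...) {
  grouping <- droplevels(factor(grouping))
  stopifnot(length(values) == length(grouping),
            nlevels(grouping) == 2)
  stats::t.test(values ~ grouping, ...)
}

# Stand-alone check on simulated data (no package needed)
set.seed(1)
values   <- c(rnorm(20, mean = 0), rnorm(20, mean = 1))
grouping <- rep(c("a", "b"), each = 20)
res <- my.t.test(values, grouping)
res$p.value
```

Following the forms listed for the tests argument, such a function might then be passed as tests = list(my.t.test), or with named extra arguments as tests = list(list(my.t.test, var.equal = TRUE)).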

prmnames

names of the variables in df (if needed to be overwritten)

prmunits

units of the variables in df

addFactorLevelsToNames

logical. if TRUE expand 'sex' to 'sex [m/w]'. Defaults to TRUE.

excel_style

logical. if TRUE remove subsequent duplicates from the parameter column (as common in Excel). Default: TRUE

groupby

column of df to group by (given by name, as in groupby="data" in the Examples). The resulting table gets one column of descriptive values for each group. If df$column is an ordered factor, the order will be respected in the resulting table
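
To control the order of the group columns, the grouping column can be turned into an ordered factor before the call. The sketch below uses base R only; the toy data and level names are made up for illustration.

```r
# Toy data; 'data' plays the role of the grouping column
df <- data.frame(data = rep(c("training set", "test set"), each = 3),
                 age  = c(23, 35, 41, 52, 60, 18))

# Default factor levels are sorted alphabetically: "test set" comes first
levels(factor(df$data))

# An ordered factor fixes the order explicitly
df$data <- factor(df$data,
                  levels  = c("training set", "test set"),
                  ordered = TRUE)
levels(df$data)
```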

addungrouped

logical. if TRUE add a column 'total' with the ungrouped summary statistics. Default: FALSE

dopvals

boolean. if TRUE, add an additional column containing the p-values comparing the two strata in groupby. Only implemented for a two-level stratum so far. Defaults to FALSE.

ignore_test_errors

logical. If TRUE, errors raised by a test are ignored and an empty test result (as list) is returned. Defaults to FALSE.

p.adjust.method

character. if not NULL include an additional column with adjusted p values. see p.adjust.methods for possible values and explanations. Defaults to "holm"
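
For orientation, this is what the Holm adjustment does to a small vector of raw p values when calling stats::p.adjust directly. The p values are invented; how the package applies the adjustment internally is not shown here.

```r
# Invented raw p values from three tests
raw_p <- c(0.01, 0.04, 0.03)

# Holm step-down adjustment (the default method named above)
adj_p <- p.adjust(raw_p, method = "holm")
adj_p

# All method names accepted by p.adjust
p.adjust.methods
```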

orderedAsUnordered

logical. treat ordered factors as unordered factors?

factorlevellimit

integer. for factors with more than factorlevellimit levels, not all levels are printed

show.minmax

logical. if TRUE show minimum and maximum for numeric variables. Defaults to TRUE.

show.IQR

logical. if TRUE show 25% and 75% quantiles for numeric variables. Defaults to FALSE.

report_tests

boolean. if TRUE one additional column in the result table will contain the test that was performed to calculate the p value. Ignored if dopvals=FALSE

report_testmessages

boolean. if TRUE one additional column in the result table will contain any warnings that appeared while the test was performed. Ignored if dopvals=FALSE

pvals_formatting

boolean. If FALSE report numbers, else report formatted strings (via prettyPvalue)

pvals_digits

integer. Number of digits for p value formatting. Ignored when pvals_formatting==FALSE. Defaults to 3

pvals_signiflev

double. The significance level for bold p value formatting. Ignored when pvals_formatting==FALSE. Defaults to 0.05

extraLevels

named list of lists. Names have to be variable names. Each element has to be a named list of this form: `some label` = list(idxvec = idxvec, display = logical). Here idxvec needs to be a logical vector of length nrow(df) that specifies the affected rows. If display is TRUE, the number of affected rows will be shown under some label.
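
A minimal sketch of how such a list could be built, following the structure described above. The data, the label `below 18`, and the cut-off are hypothetical; only the list construction is shown, not the call to buildDescrTbl.

```r
set.seed(42)
df <- data.frame(age = round(runif(10, 0, 100)),
                 sex = sample(c("m", "f"), 10, replace = TRUE))

# Logical index vector of length nrow(df) marking the affected rows
minors <- df$age < 18

# Outer names are variable names; inner names are the labels to display
extraLevels <- list(
  age = list(
    `below 18` = list(idxvec = minors, display = TRUE)
  )
)
str(extraLevels)
```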

missingName

character. name of missing values (default: missing)

nonNAsName

character. name of not missing values (default: N)

removeZeroNAs

boolean. if TRUE, rows for missing values containing only 0s are removed from the result.

removeZeroExtraLevels

boolean. if TRUE, rows for ExtraLevels containing only 0s are removed from the result.

includeNAs

boolean. Include number of NAs in the output? Currently only one of either includeNonNAs or includeNAs can be set to TRUE

includeNonNAs

boolean. Include number of not missing values (Non-NAs) in the output? Currently only one of either includeNonNAs or includeNAs can be set to TRUE

printOrgAlignment

boolean. If TRUE, then a row like "<l> <r> <r>" will be included in the result df

useutf8

character. one of c("latex", "utf8", "replace"). If 'latex' (the default), use \pm in the output; if 'replace', use +- in the output; if 'utf8', use the Unicode character ±.

verbose

numeric. level of verbosity (0 : silent)

without_attrs

logical. If TRUE return the descriptive table without attributes. Otherwise add df, groupby, and a 'full' (closer to tidy) version of the table as attributes. Defaults to FALSE.

sd_digits

character. one of c("by_mean", "fixed"). If 'by_mean', the number of decimal places of the standard deviation is limited by the number of decimal places of the mean.

descr_digits

integer. Number of digits for formatting of descriptive values. Defaults to 2.

significant_digits

boolean. if TRUE, the number of significant digits is given by descr_digits. Otherwise the number of decimal places is fixed to descr_digits.
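
The difference between significant digits and fixed decimal places can be seen with base R's signif and round. This only illustrates the distinction; the package's own formatting code may differ in detail.

```r
x <- 123.456
y <- 0.012345

# Two significant digits (compare significant_digits = TRUE, descr_digits = 2)
sig_x <- signif(x, 2)   # 120
sig_y <- signif(y, 2)   # 0.012

# Two decimal places (compare significant_digits = FALSE, descr_digits = 2)
dec_x <- round(x, 2)    # 123.46
dec_y <- round(y, 2)    # 0.01
```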

percentages

character. one of c("columnwise", "rowwise"). If 'rowwise', percentages are computed by row. Defaults to "columnwise"
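
One way to picture the two options is base R's prop.table on a small cross table. It is an assumption here that "columnwise" means percentages adding up to 100 within each group column and "rowwise" within each factor level; the toy data are invented.

```r
sex   <- c("m", "f", "f", "m", "f", "f")
group <- c("a", "a", "a", "b", "b", "b")
tab <- table(sex, group)

# Columnwise: percentages sum to 100 within each group
col_pct <- 100 * prop.table(tab, margin = 2)

# Rowwise: percentages sum to 100 within each level of sex
row_pct <- 100 * prop.table(tab, margin = 1)

col_pct
row_pct
```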

Author

Andreas Leha

Details

Builds a table containing descriptive values

Examples

library(descsuppR)

# Two simulated data sets, stacked and compared via groupby="data"
ttt <- data.frame(data="training set",
                  age=runif(100, 0, 100),
                  sex=sample(c("m","f"), 100, replace=TRUE, prob=c(0.3, 0.7)),
                  score=factor(sample(1:5, 100, replace=TRUE),
                    ordered=TRUE,
                    levels=1:5))
ttt2 <- data.frame(data="test set",
                   age=runif(100, 0, 100),
                   sex=sample(c("m","f"), 100, replace=TRUE, prob=c(0.5,0.5)),
                   score=factor(sample(1:5, 100, replace=TRUE),
                     ordered=TRUE,
                     levels=1:5))

units <- c("years", "", "")
buildDescrTbl(rbind(ttt, ttt2),
              prmunits=units,
              groupby="data",
              includeNAs=TRUE)
