Calculate and Present Descriptive Values in a Printable Table
buildDescrTbl(
  df,
  tests,
  prmnames,
  prmunits,
  addFactorLevelsToNames = TRUE,
  excel_style = TRUE,
  groupby,
  addungrouped = FALSE,
  dopvals = FALSE,
  ignore_test_errors = FALSE,
  p.adjust.method = "holm",
  orderedAsUnordered = FALSE,
  factorlevellimit = 14,
  show.minmax = TRUE,
  show.IQR = FALSE,
  report_tests = FALSE,
  report_testmessages = FALSE,
  pvals_formatting = TRUE,
  pvals_digits = 3,
  pvals_signiflev = 0.05,
  extraLevels = NULL,
  missingName = "missing",
  nonNAsName = "N",
  removeZeroNAs = TRUE,
  removeZeroExtraLevels = TRUE,
  includeNAs = FALSE,
  includeNonNAs = FALSE,
  printOrgAlignment = FALSE,
  useutf8 = "latex",
  verbose = 0,
  without_attrs = FALSE,
  sd_digits = "by_mean",
  descr_digits = 2,
  significant_digits = TRUE,
  percentages = "columnwise"
)
formatted data.frame with descriptive values
df: data.frame containing the variables for which to calculate the descriptive values
tests: character vector, list of characters, list of functions, or list of lists. In each case the i-th element gives the test to perform on the i-th variable in df (excluding stratification variables). A test can be given as a character (the name of a test function), as a function, or as a list whose first element is again either a character or a function and whose following elements are *named* additional arguments to that test function. Each test function has to accept (at least) the arguments 'values' and 'grouping', which are vectors of equal length. For convenience, this package ships with some example functions; have a look at those if you want to supply your own. These convenience functions include w.chisq.test, w.cor.test, w.fisher.test, w.kruskal.test, and w.wilcox.test. The whole list/vector is recycled if too short.
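As a minimal sketch of the three ways to specify a test described above, the list below mixes a function name, a function, and a list with a named additional argument. The function my_test and its argument exact are hypothetical and only illustrate the required 'values'/'grouping' signature. What a test function has to return is not stated on this page, so returning the htest object of wilcox.test is an assumption that should be checked against the shipped w.* functions.

```r
# Hypothetical user-supplied test: accepts 'values' and 'grouping'
# (vectors of equal length) plus one optional named argument.
# ASSUMPTION: returning the 'htest' object is acceptable to buildDescrTbl;
# compare with the shipped w.* functions before relying on this.
my_test <- function(values, grouping, exact = FALSE) {
  stats::wilcox.test(values ~ grouping, exact = exact)
}

# One entry per variable in df (stratification variables excluded)
tests <- list(
  "w.wilcox.test",              # test given by name
  my_test,                      # test given as a function
  list(my_test, exact = FALSE)  # function plus a *named* additional argument
)

# The hypothetical test function can be tried on its own
set.seed(1)
res <- my_test(values = rnorm(20),
               grouping = rep(c("a", "b"), each = 10))
print(res$p.value)
```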
prmnames: names of the variables in df (if they need to be overwritten)
prmunits: units of the variables in df
addFactorLevelsToNames: logical. If TRUE, expand 'sex' to 'sex [m/w]'. Defaults to TRUE.
excel_style: logical. If TRUE, remove subsequent duplicates from the parameter column (as is common in Excel). Defaults to TRUE.
groupby: column of df used for grouping. The table gets one column for each group. If the df$column is an ordered factor, its order is respected in the resulting table.
addungrouped: logical. If TRUE, add a column 'total' with the ungrouped summary statistics. Defaults to FALSE.
dopvals: boolean. If TRUE, add a column containing the p-values comparing the two strata in groupby. Only implemented for a two-level stratum until now.
ignore_test_errors: logical. If TRUE, a test that raises an error yields an empty test result (as a list).
p.adjust.method: character. If not NULL, include an additional column with adjusted p values. See p.adjust.methods for possible values and explanations. Defaults to "holm".
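The adjustment methods named above can be tried in isolation with base R's p.adjust(). The sketch below uses made-up p-values rather than output of buildDescrTbl, and it assumes that the adjusted column corresponds to a plain p.adjust() call over the reported p-values.

```r
# Made-up raw p-values, for illustration only
p_raw <- c(0.01, 0.02, 0.03)

# All method names accepted by stats::p.adjust
print(p.adjust.methods)

# Holm adjustment, the default of p.adjust.method
p_holm <- p.adjust(p_raw, method = "holm")
print(p_holm)
```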
orderedAsUnordered: logical. Treat ordered factors as unordered factors?
factorlevellimit: integer. For factors with more than factorlevellimit levels, not all levels are printed.
show.minmax: logical. If TRUE, show minimum and maximum for numeric variables. Defaults to TRUE.
show.IQR: logical. If TRUE, show 25% and 75% quantiles for numeric variables. Defaults to FALSE.
report_tests: boolean. If TRUE, one additional column in the result table will contain the test that was performed to calculate the p value. Ignored if dopvals=FALSE.
report_testmessages: boolean. If TRUE, one additional column in the result table will contain any warnings that appeared while the test was performed. Ignored if dopvals=FALSE.
pvals_formatting: boolean. If FALSE, report numbers; otherwise report formatted strings (via prettyPvalue).
pvals_digits: integer. Number of digits for p value formatting. Ignored when pvals_formatting==FALSE. Defaults to 3.
pvals_signiflev: double. The significance level for bold p value formatting. Ignored when pvals_formatting==FALSE. Defaults to 0.05.
extraLevels: named list of lists. Names have to be variable names. Elements have to be named lists of this form: `some label` = list(idxvec = idxvec, display = logical). Here idxvec needs to be a logical vector of length nrow(df) that specifies the affected rows. If display is TRUE, the number of affected rows will be shown under 'some label'.
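The nested structure described for extraLevels can be sketched as follows. The data frame, the label `over 65`, and the cutoff are hypothetical and chosen only to show the shape of the list.

```r
# Hypothetical data, only to have something to index
df <- data.frame(age = c(23, 70, 41, 88),
                 sex = c("m", "f", "f", "m"))

# Outer names are variable names; inner names are the labels to display
extraLevels <- list(
  age = list(
    `over 65` = list(
      idxvec  = df$age > 65,  # logical vector of length nrow(df)
      display = TRUE          # show the count under the label 'over 65'
    )
  )
)

# Sanity check on the shape described above
stopifnot(length(extraLevels$age$`over 65`$idxvec) == nrow(df))

# Number of affected rows
n_affected <- sum(extraLevels$age$`over 65`$idxvec)
print(n_affected)
```

A list built this way would then be handed over through the extraLevels argument of buildDescrTbl.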
missingName: character. Name of missing values (default: missing).
nonNAsName: character. Name of non-missing values (default: N).
removeZeroNAs: boolean. If TRUE, rows for missing values containing only 0s are removed from the result.
removeZeroExtraLevels: boolean. If TRUE, rows for extraLevels containing only 0s are removed from the result.
includeNAs: boolean. Include the number of NAs in the output? Currently only one of includeNonNAs and includeNAs can be set to TRUE.
includeNonNAs: boolean. Include the number of non-missing values (non-NAs) in the output? Currently only one of includeNonNAs and includeNAs can be set to TRUE.
printOrgAlignment: boolean. If TRUE, then a row like "<l> <r> <r>" will be included in the result df.
useutf8: character. One of c("latex", "utf8", "replace"). If 'latex' (the default), use \pm in the output; if 'replace', use +- in the output; if 'utf8', use the Unicode character.
verbose: numeric. Level of verbosity (0: silent).
without_attrs: logical. If TRUE, return the descriptive table without attributes. Otherwise add df, groupby, and a 'full' (closer to tidy) version of the table as attributes. Defaults to FALSE.
sd_digits: character. One of c("by_mean", "fixed"). If 'by_mean', the number of decimal places of the standard deviation is limited by the number of decimal places of the mean.
descr_digits: integer. Number of digits for formatting of descriptive values. Defaults to 2.
significant_digits: boolean. If TRUE, the number of significant digits is given by descr_digits. Otherwise the number of decimal places is fixed.
percentages: character. One of c("columnwise", "rowwise"). If 'rowwise', percentages are computed by row. Defaults to "columnwise".
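The distinction that significant_digits draws between significant digits and fixed decimal places can be illustrated with base R. This sketch does not reproduce the package's internal formatting; it only contrasts the two notions for a digits value of 2, using made-up numbers.

```r
# Made-up values of different magnitude
x <- c(123.456, 0.012345)
digits <- 2

# Two significant digits: precision follows the magnitude of each value
by_significant <- signif(x, digits)
print(by_significant)

# Two fixed decimal places: same number of decimals for every value
by_decimals <- round(x, digits)
print(by_decimals)
```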
Andreas Leha
Build a table containing descriptive values
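# simulated training set: uniform age, 30/70 sex split, ordered score
ttt <- data.frame(data = "training set",
                  age = runif(100, 0, 100),
                  sex = sample(c("m", "f"), 100, replace = TRUE, prob = c(0.3, 0.7)),
                  score = factor(sample(1:5, 100, replace = TRUE),
                                 ordered = TRUE,
                                 levels = 1:5))
# simulated test set: same layout, 50/50 sex split
ttt2 <- data.frame(data = "test set",
                   age = runif(100, 0, 100),
                   sex = sample(c("m", "f"), 100, replace = TRUE, prob = c(0.5, 0.5)),
                   score = factor(sample(1:5, 100, replace = TRUE),
                                  ordered = TRUE,
                                  levels = 1:5))
# one unit per variable (age, sex, score); the grouping column is excluded
units <- c("years", "", "")
# describe both sets side by side, grouped by the 'data' column
buildDescrTbl(rbind(ttt, ttt2),
              prmunits = units,
              groupby = "data",
              includeNAs = TRUE)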