Additionally, this function runs extra checks on the data:
an error (by default, see checkVarDiffBySubj) is triggered if any subject (identified by subjectVar)
has different values for a continuous var
an informative message is triggered if a subject has multiple but identical records
for a continuous var
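The two checks above can be illustrated with a minimal base R sketch. The toy dataset and the object names (dataToy, subjDiff, subjDupl) are hypothetical and only mimic the logic described; this is not the package's internal code.

```r
# Hypothetical toy data: subject "01" has two identical AGE records,
# subject "02" has two conflicting AGE values.
dataToy <- data.frame(
  USUBJID = c("01", "01", "02", "02", "03"),
  AGE = c(34, 34, 51, 52, 40),
  stringsAsFactors = FALSE
)

# Values of the continuous variable, grouped by subject
valuesBySubj <- split(dataToy$AGE, dataToy$USUBJID)

# Number of distinct values and number of records for each subject
nValues <- vapply(valuesBySubj, function(x) length(unique(x)), integer(1))
nRecords <- lengths(valuesBySubj)

# Subjects with different values: the case reported as an error by default
subjDiff <- names(nValues)[nValues > 1]

# Subjects with multiple but identical records: the case reported as a message
subjDupl <- names(nValues)[nValues == 1 & nRecords > 1]
```

In this sketch, subject "02" falls in the first category and subject "01" in the second.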
computeSummaryStatistics(
data,
var = NULL,
varTotalInclude = FALSE,
statsExtra = NULL,
subjectVar = "USUBJID",
filterEmptyVar = TRUE,
type = "auto",
checkVarDiffBySubj = c("error", "warning", "none"),
msgLabel = NULL,
msgVars = NULL
)
Data.frame with summary statistics in columns,
depending on whether type is:
'summaryTable':
'statN': number of subjects
'statm': number of records
'statMean': mean of var
'statSD': standard deviation of var
'statSE': standard error of the mean of var
'statMedian': median of var
'statMin': minimum of var
'statMax': maximum of var
'countTable':
'variableGroup': factor with groups of var for which counts are reported
'statN': number of subjects
'statm': number of records
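The difference between 'statN' (subjects) and 'statm' (records) in a count table can be sketched in base R as follows. The toy data and the AEDECOD column are assumptions made for illustration; the code reproduces the column layout described above, not the package's implementation.

```r
# Hypothetical toy data: subject "01" reports "Headache" twice.
dataToy <- data.frame(
  USUBJID = c("01", "01", "02", "03"),
  AEDECOD = factor(c("Headache", "Headache", "Headache", "Nausea")),
  stringsAsFactors = FALSE
)

# One row per category of the variable
countToy <- do.call(rbind, lapply(levels(dataToy$AEDECOD), function(group) {
  dataGroup <- dataToy[which(dataToy$AEDECOD == group), ]
  data.frame(
    variableGroup = factor(group, levels = levels(dataToy$AEDECOD)),
    statN = length(unique(dataGroup$USUBJID)), # number of distinct subjects
    statm = nrow(dataGroup)                    # number of records
  )
}))
```

Here "Headache" is counted for 2 subjects but 3 records, whereas "Nausea" has 1 subject and 1 record.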
data: Data.frame with dataset to consider for the summary table.
var: Character vector with variable(s) of data
to compute statistics on.
If NULL (by default), counts by row/column variable(s) are computed.
To also return counts of the rowVar when other var
are specified, include 'all' in var.
Missing values, if present, are filtered out
(also when reporting the number of subjects/records).
varTotalInclude: Logical (FALSE by default).
Should the total across all categories of var
be included in the count table?
Only used if var is a categorical variable.
statsExtra: (optional) Named list with functions for additional custom
statistics to be computed.
Each function:
takes as parameter either 'x', the variable (var) to compute
the summary statistic on, or 'data', the entire dataset
returns the corresponding summary statistic as a numeric vector
For example, to additionally compute the coefficient of variation (in percent), this can be set to:
list(statCVPerc = function(x) sd(x)/mean(x)*100) (or cv).
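As a minimal sketch of the two accepted function signatures, the base R code below evaluates one custom statistic declared with 'x' and one declared with 'data'. The helper applyStat, the statistic statRange and the toy data are hypothetical; how the package itself dispatches on the parameter name is an assumption here.

```r
# Hypothetical toy data
dataToy <- data.frame(
  USUBJID = c("01", "02", "03"),
  AGE = c(30, 40, 50),
  stringsAsFactors = FALSE
)

statsExtra <- list(
  # uses 'x': the variable to summarize
  statCVPerc = function(x) sd(x) / mean(x) * 100,
  # uses 'data': the entire dataset (statRange is an illustrative name)
  statRange = function(data) diff(range(data$AGE))
)

# Hypothetical dispatcher: pass the dataset or the variable,
# depending on the name of the function's parameter
applyStat <- function(fct, data, var) {
  if ("data" %in% names(formals(fct))) {
    fct(data = data)
  } else {
    fct(x = data[[var]])
  }
}

statsExtraToy <- vapply(
  statsExtra, applyStat, numeric(1),
  data = dataToy, var = "AGE"
)
```

The result is a named numeric vector with one element per custom statistic, named after the list elements.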
subjectVar: String, variable of data with subject ID,
'USUBJID' by default.
filterEmptyVar: Logical, if TRUE no results are returned if the variable is empty; otherwise 0 is returned for the counts and NA for the summary statistics. The criteria for considering a variable empty are:
for a continuous variable: all values are missing (NA)
for a categorical variable: all values are missing, or **the category is included in the
factor levels but not available in data**
By default, empty variables are filtered.
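The emptiness criteria can be expressed in base R roughly as follows. The vectors and object names are hypothetical and serve only to illustrate the two rules above.

```r
# Continuous variable: empty if all values are missing
xCont <- c(NA_real_, NA_real_, NA_real_)
isEmptyCont <- all(is.na(xCont))

# Categorical variable: level "B" is declared in the factor levels
# but never observed in the data
xCat <- factor(c("A", "A", NA), levels = c("A", "B"))
countsCat <- table(xCat) # missing values are not counted
emptyCategories <- names(countsCat)[as.vector(countsCat) == 0]
```

With filtering enabled, such a continuous variable and the unobserved category "B" would be dropped from the output; with filtering disabled, they would be reported with NA statistics and a count of 0 respectively.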
type: String with type of table:
'summaryTable': summary table with statistics for a numeric variable
'countTable': count table
'auto' (by default): 'summaryTable' if the variable is numeric, 'countTable' otherwise
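The 'auto' rule can be sketched with a small helper. resolveType is a hypothetical name introduced for illustration and is not part of the package.

```r
# Hypothetical helper mimicking the 'auto' rule described above
resolveType <- function(x, type = "auto") {
  type <- match.arg(type, choices = c("auto", "summaryTable", "countTable"))
  if (type == "auto") {
    type <- if (is.numeric(x)) "summaryTable" else "countTable"
  }
  type
}

typeNumeric <- resolveType(c(1.5, 2.3))             # numeric variable
typeFactor <- resolveType(factor(c("A", "B")))      # categorical variable
typeForced <- resolveType(c(1, 2, 2), "countTable") # explicit choice is kept
```

An explicit type overrides the automatic choice, which is useful for numeric codes that should be counted rather than summarized.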
checkVarDiffBySubj: String, 'error' (default), 'warning',
or 'none'.
Should an error, a warning, or nothing be produced
if a continuous variable (var) contains
different values for the same subject?
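A common base R pattern for this kind of three-way option combines match.arg with switch. The function reportDiff below is a hypothetical sketch of that pattern and not the package's code.

```r
# Hypothetical reporter: raise an error, a warning or nothing,
# depending on the option (the first choice, 'error', is the default)
reportDiff <- function(msg, checkVarDiffBySubj = c("error", "warning", "none")) {
  checkVarDiffBySubj <- match.arg(checkVarDiffBySubj)
  switch(checkVarDiffBySubj,
    error = stop(msg, call. = FALSE),
    warning = warning(msg, call. = FALSE),
    none = invisible(NULL)
  )
}

msg <- "Different values of 'AGE' for subject: 02"
resError <- tryCatch(reportDiff(msg), error = function(e) "error raised")
resWarning <- tryCatch(
  reportDiff(msg, "warning"),
  warning = function(w) "warning raised"
)
resNone <- reportDiff(msg, "none")
```

Setting the option to 'warning' or 'none' lets the computation continue when repeated measurements per subject are expected.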
msgLabel: (optional) String with label for the data (NULL by default), included in the message/warning for checks.
msgVars: (optional) Character vector with columns of data
containing extra variables (besides var and subjectVar)
that should be included in the message/warning for checks.
Laure Cougnaud