Additionally, this function runs extra checks on the data:
an error message is triggered if any subject (identified by subjectVar) has different values in a continuous var
an informative message is triggered if multiple but identical records are available for subjectVar and a continuous var
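The two checks above can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper name checkSubjectRecords and the example data are hypothetical, and the sketch only identifies the affected subjects without raising the error or message.

```r
# Hypothetical helper: classify subjects that have repeated records
# for a continuous variable.
checkSubjectRecords <- function(data, var, subjectVar = "USUBJID") {
  # number of records and number of distinct values per subject
  nRecords <- tapply(data[[var]], data[[subjectVar]], length)
  nValues <- tapply(
    data[[var]], data[[subjectVar]],
    function(x) length(unique(x))
  )
  list(
    # conflicting values for the same subject: case reported as an error
    different = names(nValues)[nValues > 1],
    # repeated but identical values: case reported as a message
    identical = names(nValues)[nRecords > 1 & nValues == 1]
  )
}

# Example data (made up for illustration)
dataEx <- data.frame(
  USUBJID = c("01", "01", "02", "02", "03"),
  AVAL = c(5.1, 5.1, 6.0, 7.2, 4.4)
)
res <- checkSubjectRecords(dataEx, var = "AVAL")
```

Here subject "02" has two different values, whereas subject "01" has the same value recorded twice.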
computeSummaryStatistics(
data,
var = NULL,
varTotalInclude = FALSE,
statsExtra = NULL,
subjectVar = "USUBJID",
filterEmptyVar = TRUE,
type = "auto",
checkVarDiffBySubj = c("error", "warning", "none"),
msgLabel = NULL,
msgVars = NULL
)
Data.frame with summary statistics in columns, depending on type:
'summary':
'statN': number of subjects
'statm': number of records
'statMean': mean of var
'statSD': standard deviation of var
'statSE': standard error of the mean of var
'statMedian': median of var
'statMin': minimum of var
'statMax': maximum of var
'count':
'variableGroup': factor with groups of var for which counts are reported
'statN': number of subjects
'statm': number of records
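To make the returned columns concrete, the following base R sketch builds both kinds of table by hand. It is only an illustration under assumptions: the data are made up, the column AESEV is hypothetical, and the standard error uses the usual definition sd/sqrt(n), which may differ in detail from what the package computes.

```r
# Summary table for a numeric variable (hypothetical data)
dataEx <- data.frame(
  USUBJID = c("01", "02", "03", "04"),
  AVAL = c(2, 4, 4, 6)
)
x <- dataEx$AVAL
statsSummary <- data.frame(
  statN = length(unique(dataEx$USUBJID)), # number of subjects
  statm = nrow(dataEx),                   # number of records
  statMean = mean(x),
  statSD = sd(x),
  statSE = sd(x) / sqrt(length(x)),       # assumed definition of the SE
  statMedian = median(x),
  statMin = min(x),
  statMax = max(x)
)

# Count table for a categorical variable (hypothetical data)
dataCat <- data.frame(
  USUBJID = c("01", "01", "02", "03"),
  AESEV = factor(c("MILD", "MILD", "MILD", "SEVERE"))
)
statsCount <- data.frame(
  variableGroup = factor(levels(dataCat$AESEV), levels = levels(dataCat$AESEV)),
  # distinct subjects per category
  statN = as.vector(tapply(
    dataCat$USUBJID, dataCat$AESEV,
    function(id) length(unique(id))
  )),
  # records per category
  statm = as.vector(table(dataCat$AESEV))
)
```

In the count table, statN and statm differ for "MILD" because subject "01" contributes two records to that category.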
data: Data.frame with dataset to consider for the summary table.
var: Character vector with variable(s) of data to compute statistics on.
If NULL (by default), counts by row/column variable(s) are computed.
To also return counts of the rowVar when other var are specified, include 'all' in var.
Missing values, if present, are filtered (also for the reported number of subjects/records).
varTotalInclude: Logical (FALSE by default).
Should the total across all categories of var be included in the count table?
Only used if var is a categorical variable.
statsExtra: (optional) Named list with functions for additional custom statistics to be computed.
Each function:
takes as parameter either 'x', the variable (var) to compute the summary statistic on, or 'data', the entire dataset
returns the corresponding summary statistic as a numeric vector
For example, to additionally compute the coefficient of variation, this can be set to:
list(statCVPerc = function(x) sd(x)/mean(x)*100)
(or cv).
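As a rough sketch of how such a named list behaves, the functions can be applied to a variable in base R, with the list names becoming the names of the resulting columns. The statRange entry and the example values are hypothetical, and this does not reproduce how the package applies the list internally (in particular, functions taking 'data' are not covered).

```r
# Named list of custom statistics, each taking the variable 'x'
statsExtra <- list(
  statCVPerc = function(x) sd(x) / mean(x) * 100,
  statRange = function(x) max(x) - min(x) # hypothetical extra statistic
)

x <- c(2, 4, 4, 6) # made-up values

# Apply each function to x; list names become column names
statsExtraRes <- as.data.frame(lapply(statsExtra, function(fct) fct(x)))
```

The same list would then be passed to the function through its statsExtra argument.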
subjectVar: String, variable of data with subject ID, 'USUBJID' by default.
filterEmptyVar: Logical. If TRUE, no results are returned if the variable is empty; otherwise 0 is returned for the counts and NA for the summary statistics.
The criteria to consider a variable empty are:
for a continuous variable: all values are missing (NA)
for a categorical variable: all values are missing, or a category is included in the factor levels but not available in data
By default, empty variables are filtered.
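The emptiness criteria can be expressed in a few lines of base R. The helper names isEmptyContinuous and emptyCategories are hypothetical and the sketch is only one possible reading of the criteria, not the package's code.

```r
# Continuous variable: empty if every value is missing
isEmptyContinuous <- function(x) all(is.na(x))

# Categorical variable: categories declared in the factor levels
# but with no (non-missing) record in the data
emptyCategories <- function(x) {
  counts <- table(x) # NA excluded, unused levels kept with count 0
  names(counts)[counts == 0]
}

# Made-up examples
xCont <- c(NA_real_, NA_real_)
xCat <- factor(c("A", "A", NA), levels = c("A", "B"))

emptyCont <- isEmptyContinuous(xCont)
emptyCat <- emptyCategories(xCat)
```

In this example the continuous variable is entirely missing, and level "B" of the categorical variable is declared but never observed.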
type: String with type of table:
'summaryTable': summary table with statistics for numeric variable
'countTable': count table
'auto' (by default): 'summaryTable' if the variable is numeric, 'countTable' otherwise
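The 'auto' rule amounts to a check on the class of the variable. A minimal sketch, assuming a hypothetical helper named resolveType:

```r
# Hypothetical helper: resolve the table type for a variable x
resolveType <- function(x, type = c("auto", "summaryTable", "countTable")) {
  type <- match.arg(type)
  if (type == "auto") {
    # numeric variable: summary table; anything else: count table
    type <- if (is.numeric(x)) "summaryTable" else "countTable"
  }
  type
}

typeNum <- resolveType(c(1.5, 2.3))             # numeric variable
typeCat <- resolveType(factor(c("F", "M")))     # categorical variable
typeForced <- resolveType(1:3, "countTable")    # explicit type is kept
```

Setting type explicitly is useful, for instance, to count the records of a numeric variable that actually encodes categories.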
checkVarDiffBySubj: String, 'error' (default), 'warning', or 'none'.
Should an error, a warning, or nothing be produced if a continuous variable (var) contains different values for the same subject?
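A default of the form c("error", "warning", "none") is a common R idiom: combined with match.arg(), the first element is used when the argument is not specified. The sketch below shows how such an option could be dispatched; the helper reportVarDiff and its message are hypothetical and not taken from the package.

```r
# Hypothetical helper: report a problem with the requested severity
reportVarDiff <- function(msg,
                          checkVarDiffBySubj = c("error", "warning", "none")) {
  # picks "error" if the argument is left at its default
  checkVarDiffBySubj <- match.arg(checkVarDiffBySubj)
  switch(checkVarDiffBySubj,
    error = stop(msg, call. = FALSE),
    warning = warning(msg, call. = FALSE),
    none = invisible(NULL)
  )
}

msg <- "Different values of AVAL for the same subject."

# Capture each outcome instead of interrupting the script
resError <- tryCatch(reportVarDiff(msg), error = function(e) "error raised")
resWarning <- tryCatch(
  reportVarDiff(msg, "warning"),
  warning = function(w) "warning raised"
)
resNone <- reportVarDiff(msg, "none")
```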
msgLabel: (optional) String with label for the data (NULL by default), included in the message/warning for checks.
msgVars: (optional) Character vector with columns of data containing extra variables (besides var and subjectVar) that should be included in the message/warning for checks.
Laure Cougnaud