computeSummaryStatistics: Compute summary statistics for a single variable of interest.

Description

Additionally, this function runs extra checks on the data:

an error message is triggered if any subject (identified by subjectVar) has different values for a continuous var
an informative message is triggered if a subject has multiple but identical records for a continuous var
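
The two situations above can be told apart with a few lines of base R. This is only a sketch on hypothetical toy data (the column AVAL and the subject IDs are made up for illustration), not the package's internal implementation:

```r
# Hypothetical toy data: subject "01" has duplicated identical records,
# subject "02" has two conflicting values of the continuous variable AVAL
data <- data.frame(
  USUBJID = c("01", "01", "02", "02", "03"),
  AVAL = c(5.1, 5.1, 6.0, 6.4, 7.2),
  stringsAsFactors = FALSE
)

# Values of the continuous variable, grouped by subject
valuesBySubj <- split(data$AVAL, data$USUBJID)

# Number of distinct values and number of records per subject
nValues <- vapply(valuesBySubj, function(x) length(unique(x)), integer(1))
nRecords <- lengths(valuesBySubj)

# Subjects with different values: the case reported as an error
subjDiff <- names(nValues)[nValues > 1]

# Subjects with multiple but identical records: the case reported as a message
subjDupl <- names(nValues)[nValues == 1 & nRecords > 1]
```

Here subjDiff contains subject "02" and subjDupl contains subject "01".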

Usage

computeSummaryStatistics(
  data,
  var = NULL,
  varTotalInclude = FALSE,
  statsExtra = NULL,
  subjectVar = "USUBJID",
  filterEmptyVar = TRUE,
  type = "auto",
  checkVarDiffBySubj = c("error", "warning", "none"),
  msgLabel = NULL,
  msgVars = NULL
)

Value

Data.frame with summary statistics in columns, depending on type:

'summaryTable':
- 'statN': number of subjects
- 'statm': number of records
- 'statMean': mean of var
- 'statSD': standard deviation of var
- 'statSE': standard error of the mean of var
- 'statMedian': median of var
- 'statMin': minimum of var
- 'statMax': maximum of var
'countTable':
- 'variableGroup': factor with groups of var for which counts are reported
- 'statN': number of subjects
- 'statm': number of records
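
To make the meaning of these columns concrete, the following base R sketch builds both kinds of table by hand on hypothetical toy data (AVAL and SEX are invented column names). It assumes the standard error is the standard deviation divided by the square root of the number of records; the function itself may compute it differently.

```r
# Hypothetical toy data: one record per subject
data <- data.frame(
  USUBJID = c("01", "02", "03", "04"),
  AVAL = c(2, 4, 6, 8),
  SEX = factor(c("F", "M", "F", "F"))
)

# Summary statistics for a continuous variable
x <- data$AVAL
summaryStats <- data.frame(
  statN = length(unique(data$USUBJID)),  # number of subjects
  statm = length(x),                     # number of records
  statMean = mean(x),
  statSD = sd(x),
  statSE = sd(x) / sqrt(length(x)),      # assumption: SE based on number of records
  statMedian = median(x),
  statMin = min(x),
  statMax = max(x)
)

# Counts for a categorical variable, one row per category
countStats <- data.frame(
  variableGroup = factor(levels(data$SEX), levels = levels(data$SEX)),
  statN = as.vector(tapply(data$USUBJID, data$SEX, function(id) length(unique(id)))),
  statm = as.vector(table(data$SEX))
)
```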

Arguments

data

Data.frame with dataset to consider for the summary table.

var

Character vector with variable(s) of data, to compute statistics on.
If NULL (by default), counts by row/column variable(s) are computed.
To also return counts of the rowVar when other variables are specified, include 'all' in var.
Missing values, if present, are filtered out (they are also excluded from the reported number of subjects/records).

varTotalInclude

Logical (FALSE by default): should the total across all categories of var be included in the count table? Only used if var is a categorical variable.

statsExtra

(optional) Named list with functions for additional custom statistics to be computed.
Each function:

takes as parameter either 'x': the variable (var) to compute the summary statistic on, or 'data': the entire dataset
returns the corresponding summary statistic as a numeric vector

For example, to additionally compute the coefficient of variation, this can be set to: list(statCVPerc = function(x) sd(x)/mean(x)*100) (or the cv function).
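
As an illustration of the two accepted signatures, the sketch below defines one function of 'x' and one function of 'data', then evaluates each by hand on hypothetical toy data. The dispatch on the argument name is an assumption about how such a list could be processed, not a description of the package internals; statNHigh and AVAL are made-up names.

```r
# Named list of custom statistics: one takes the variable, one the whole dataset
statsExtra <- list(
  statCVPerc = function(x) sd(x) / mean(x) * 100,  # coefficient of variation (%)
  statNHigh = function(data) sum(data$AVAL > 5)    # hypothetical: records above 5
)

# Hypothetical toy data
data <- data.frame(
  USUBJID = c("01", "02", "03", "04"),
  AVAL = c(2, 4, 6, 8)
)

# Assumed dispatch: pass the variable if the function has an 'x' argument,
# the entire dataset otherwise
extraStats <- lapply(statsExtra, function(fct) {
  if ("x" %in% names(formals(fct))) fct(data$AVAL) else fct(data)
})
```

With the package loaded, such a list would be supplied through the statsExtra argument, for instance computeSummaryStatistics(data = data, var = "AVAL", statsExtra = statsExtra).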

subjectVar

String, variable of data with subject ID, 'USUBJID' by default.

filterEmptyVar

Logical. If TRUE, no results are returned when the variable is empty; otherwise, 0 is returned for the counts and NA for the summary statistics. The criteria for considering a variable empty are:

for a continuous variable: all values are missing (NA)
for a categorical variable: all values are missing, or a category is included in the factor levels but not present in data

By default, empty variables are filtered.
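
The emptiness criteria can be expressed in base R as follows. Both helper functions (isEmptyContinuous, emptyCategories) are hypothetical names introduced for this sketch and are not part of the package:

```r
# A continuous variable is empty if all its values are missing
isEmptyContinuous <- function(x) all(is.na(x))

# For a categorical variable, return the factor levels with no record in the data
emptyCategories <- function(x) {
  counts <- table(x)  # table() ignores NA by default and keeps unused levels
  names(counts)[as.vector(counts == 0)]
}

# Hypothetical examples
xCont <- c(NA_real_, NA_real_)
xCat <- factor(c("A", "A", NA), levels = c("A", "B"))

emptyCont <- isEmptyContinuous(xCont)  # all values missing
emptyCat <- emptyCategories(xCat)      # level "B" is declared but never observed
```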

type

String with type of table:

'summaryTable': summary table with statistics for a numeric variable
'countTable': count table
'auto' (by default): 'summaryTable' if the variable is numeric, 'countTable' otherwise
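
The 'auto' rule amounts to a check on whether the variable is numeric. A minimal sketch, where resolveType is a hypothetical helper and not a function of the package:

```r
# Resolve the table type from the variable, following the rule described above
resolveType <- function(x, type = "auto") {
  type <- match.arg(type, choices = c("auto", "summaryTable", "countTable"))
  if (type == "auto") {
    type <- if (is.numeric(x)) "summaryTable" else "countTable"
  }
  type
}

typeNumeric <- resolveType(c(1.5, 2.3))              # numeric variable
typeFactor <- resolveType(factor(c("a", "b")))       # categorical variable
typeForced <- resolveType(1:3, type = "countTable")  # explicit type is kept
```

Note that a factor is not numeric in R, so categorical variables stored as factors resolve to 'countTable'.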

checkVarDiffBySubj

String, 'error' (default), 'warning', or 'none'. Should an error, a warning, or nothing be produced if a continuous variable (var) contains different values for the same subject?
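
The vector default shown in Usage, c("error", "warning", "none"), is the usual R idiom in which the first element is the default. The sketch below shows how such an option is commonly handled with match.arg and switch. That the package does it this way is an assumption, and reportDiff is a hypothetical helper:

```r
# Report a problem as an error, a warning, or not at all
reportDiff <- function(msg, checkVarDiffBySubj = c("error", "warning", "none")) {
  # match.arg() selects the first choice when the argument is left at its default
  checkVarDiffBySubj <- match.arg(checkVarDiffBySubj)
  switch(checkVarDiffBySubj,
    error = stop(msg, call. = FALSE),
    warning = warning(msg, call. = FALSE),
    none = invisible(NULL)
  )
}

msg <- "Different values for the same subject."

# Default behaviour: an error is raised
resError <- tryCatch(reportDiff(msg), error = function(e) "error caught")

# Downgraded to a warning
resWarning <- tryCatch(reportDiff(msg, "warning"), warning = function(w) "warning caught")

# Silenced
resNone <- reportDiff(msg, "none")
```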

msgLabel

(optional) String with label for the data (NULL by default), included in the message/warning for checks.

msgVars

(optional) Character vector with columns of data containing extra variables (besides var and subjectVar) that should be included in the message/warning for checks.

Author

Laure Cougnaud