computeSummaryStatisticsByRowColVar: Compute summary statistics by specified `rowVar` and `colVar`

Description

Compute summary statistics by the specified rowVar and colVar.

Usage

computeSummaryStatisticsByRowColVar(
  data,
  var = NULL,
  varLab = getLabelVar(var = var, data = data, labelVars = labelVars),
  varInclude0 = FALSE,
  varLabInclude = length(var) > 1,
  varTotalInclude = FALSE,
  type = "auto",
  rowVar = NULL,
  rowInclude0 = FALSE,
  rowVarDataLevels = NULL,
  colVar = NULL,
  colInclude0 = FALSE,
  colVarDataLevels = NULL,
  subjectVar = "USUBJID",
  labelVars = NULL,
  statsExtra = NULL,
  msgLabel = NULL,
  checkVarDiffBySubj = "error"
)

Value

A data.frame of class 'countTable' or 'summaryTable', depending on the type parameter, with statistics in columns. If type is:

'summaryTable':
- 'N': number of subjects
- 'Mean': mean of var
- 'SD': standard deviation of var
- 'SE': standard error of var
- 'Median': median of var
- 'Min': minimum of var
- 'Max': maximum of var
- 'm': number of records

'countTable':
- 'N': number of subjects
- 'm': number of records
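To make the 'summaryTable' columns listed above concrete, the base R sketch below computes them for a single group of records. It is not the package implementation: summaryStatsSketch is a hypothetical helper, and it assumes that SE is the SD divided by the square root of the number of non-missing values, and that all non-missing records are used as given (no deduplication per subject).

```r
# Hypothetical helper: summary statistics for one numeric variable in one group.
# Assumption: SE = SD / sqrt(number of non-missing values).
summaryStatsSketch <- function(data, var, subjectVar = "USUBJID") {
  # drop records with a missing value, as described for `var`
  data <- data[!is.na(data[[var]]), , drop = FALSE]
  x <- data[[var]]
  data.frame(
    N = length(unique(data[[subjectVar]])),  # number of subjects
    Mean = mean(x),
    SD = sd(x),
    SE = sd(x) / sqrt(length(x)),
    Median = median(x),
    Min = min(x),
    Max = max(x),
    m = nrow(data)                           # number of records
  )
}

dataEx <- data.frame(
  USUBJID = c("S1", "S2", "S3", "S4"),
  AVAL = c(10, 12, NA, 14)
)
summaryStatsSketch(dataEx, var = "AVAL")
# N = 3, Mean = 12, SD = 2, Median = 12, Min = 10, Max = 14, m = 3
```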

Arguments

data

Data.frame with dataset to consider for the summary table.

var

Character vector with the variable(s) of data to compute statistics on.
If NULL (by default), counts by row/column variable(s) are computed.
To also return counts by rowVar when other variables are specified, include 'all' in var.
Missing values, if present, are filtered out, including for the reported number of subjects/records.
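The difference between the number of subjects (N) and the number of records (m), and the filtering of missing values, can be illustrated with a hypothetical base R helper, countByRowSketch. It assumes a single rowVar and a single var; the package itself supports nested row and column variables, so this is only a sketch of the counting logic.

```r
# Hypothetical sketch: number of subjects (N) and records (m) per row group,
# after removing records with a missing value in the summarized variable.
countByRowSketch <- function(data, rowVar, var, subjectVar = "USUBJID") {
  data <- data[!is.na(data[[var]]), , drop = FALSE]
  # one data.frame per level of the row variable (empty levels dropped)
  groups <- split(data, data[[rowVar]], drop = TRUE)
  res <- data.frame(
    level = names(groups),
    N = unname(vapply(groups, function(d) length(unique(d[[subjectVar]])), integer(1))),
    m = unname(vapply(groups, nrow, integer(1))),
    stringsAsFactors = FALSE
  )
  names(res)[1] <- rowVar
  res
}

dataAE <- data.frame(
  USUBJID = c("S1", "S1", "S2", "S3"),
  AEDECOD = c("Headache", "Headache", "Nausea", "Headache"),
  AESEV = c("MILD", "SEVERE", "MILD", NA),
  stringsAsFactors = FALSE
)
# S3 is dropped (missing AESEV); S1 contributes 1 subject but 2 records
countByRowSketch(dataAE, rowVar = "AEDECOD", var = "AESEV")
```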

varLab

Named character vector with a label for each variable specified in var. By default, extracted from labelVars. If not available, var is used.

varInclude0

Should rows with no counts for the categorical var be included in the table? Either:

- a logical of length 1: if TRUE (FALSE by default), rows with no counts are included for all var
- a character vector with the categorical var for which rows with zero counts should be included

varLabInclude

Logical, if TRUE, the name(s) of the variable(s) to summarize (var) are included in the table. By default, TRUE if more than one variable is specified in var, and FALSE if only one variable is specified.

varTotalInclude

Should the total across all categories of var be included in the count table? Only used for categorical variables (and when var is not 'all'). Either:

- a logical of length 1: if TRUE (FALSE by default), the total is included for all categorical var
- a character vector with the categorical var for which the total should be included

type

String with type of table:

- 'summaryTable': summary table with statistics for a numeric variable
- 'countTable': count table
- 'auto' (by default): 'summaryTable' if the variable is numeric, 'countTable' otherwise
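The 'auto' rule amounts to a single decision on the class of var. resolveTypeSketch is a hypothetical helper following the documented rule; the package may perform additional checks on var before choosing the table type.

```r
# Hypothetical helper following the documented rule for type = "auto"
resolveTypeSketch <- function(x, type = "auto") {
  if (type != "auto") return(type)  # an explicit type is kept as is
  if (is.numeric(x)) "summaryTable" else "countTable"
}

resolveTypeSketch(c(1.5, 2.3))                       # "summaryTable"
resolveTypeSketch(c("MILD", "SEVERE"))               # "countTable"
resolveTypeSketch(c(1.5, 2.3), type = "countTable")  # "countTable"
```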

rowVar

Character vector with variable(s) to be included in the rows. If multiple variables are specified, the variables should be sorted in hierarchical order (e.g. body system class before adverse event term) and are nested in the table.

rowInclude0

Logical, if TRUE (FALSE by default), include rows with no records, based on all combinations of the rowVar (assuming nested variable(s)).
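One way to picture rowInclude0 = TRUE is to count subjects for the observed combinations of the row variables, then complete the table with zero counts for every other combination. The base R sketch below does this with expand.grid and merge; the variable names (AESOC, AESEV) and the data are illustrative, and the sketch does not reproduce how the package handles nested variables.

```r
# Illustrative data: one adverse event record per subject
dataAE <- data.frame(
  USUBJID = c("S1", "S2", "S3"),
  AESOC = c("Nervous system", "Gastrointestinal", "Nervous system"),
  AESEV = c("MILD", "MILD", "SEVERE"),
  stringsAsFactors = FALSE
)
rowVar <- c("AESOC", "AESEV")

# Observed number of distinct subjects per combination of the row variables
observed <- aggregate(
  dataAE["USUBJID"], by = dataAE[rowVar],
  FUN = function(s) length(unique(s))
)
names(observed)[names(observed) == "USUBJID"] <- "N"

# All combinations of the row variable levels, observed or not
allCombos <- expand.grid(lapply(dataAE[rowVar], unique), stringsAsFactors = FALSE)

# Keep every combination; combinations without records get a zero count
counts <- merge(allCombos, observed, by = rowVar, all.x = TRUE)
counts$N[is.na(counts$N)] <- 0L
counts
```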

rowVarDataLevels

Data.frame with the unique combinations of rowVar to be included in rows. Each column should correspond to one of the rowVar, stored as a factor if its elements should be ordered in the final table.

colVar

Character vector with variable(s) to be included in columns. If multiple variables are specified, they should be sorted in hierarchical order, and are displayed in a multi-column layout.
Use 'variable' to display each variable to summarize (var, if multiple) in a separate column.

colInclude0

Logical, if TRUE (FALSE by default), include columns with no records, based on all combinations of the colVar (assuming nested variable(s)). If the variables are not nested, the possible combinations can be specified via colVarDataLevels.

colVarDataLevels

Data.frame with the unique combinations of colVar to be included in columns. Each column should correspond to one of the colVar, stored as a factor if its elements should be ordered in the final table.
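As an illustration of the expected input, the data.frame below lists explicit combinations of two non-nested column variables, with factor levels fixing their order in the table. TRT and VISIT are hypothetical variable names.

```r
# Hypothetical column layout: treatment and visit, every combination listed
# explicitly, with the display order set by the factor levels
trtLevels <- c("Placebo", "Drug")
visitLevels <- c("Week 1", "Week 2")
colVarDataLevels <- data.frame(
  TRT = factor(rep(trtLevels, each = length(visitLevels)), levels = trtLevels),
  VISIT = factor(rep(visitLevels, times = length(trtLevels)), levels = visitLevels)
)
colVarDataLevels
```

This object would be passed as colVarDataLevels, with colVar = c("TRT", "VISIT"); rowVarDataLevels follows the same structure for rowVar.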

subjectVar

String, variable of data with subject ID, 'USUBJID' by default.

labelVars

(optional) Named character vector with labels for the row variable(s), column variable(s) or variable(s) to summarize.
Labels specified via a dedicated parameter (e.g. varLab) take priority over this parameter.
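The label priority described above (dedicated parameter first, then labelVars, then the variable name, as stated for varLab) can be sketched with a hypothetical helper, getLabelSketch. This is not the package's getLabelVar function, only an illustration of the lookup order.

```r
# Hypothetical helper: pick a label for each variable, giving priority to a
# dedicated label (e.g. varLab), then labelVars, then the variable name itself
getLabelSketch <- function(vars, dedicated = NULL, labelVars = NULL) {
  vapply(vars, function(v) {
    if (!is.null(dedicated) && v %in% names(dedicated)) {
      dedicated[[v]]
    } else if (!is.null(labelVars) && v %in% names(labelVars)) {
      labelVars[[v]]
    } else {
      v
    }
  }, character(1))
}

labelVars <- c(AVAL = "Analysis value", AGE = "Age")
getLabelSketch(c("AVAL", "AGE", "WEIGHT"),
               dedicated = c(AGE = "Age (years)"),
               labelVars = labelVars)
# returns c(AVAL = "Analysis value", AGE = "Age (years)", WEIGHT = "WEIGHT")
```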

statsExtra

(optional) Named list with functions for additional custom statistics to be computed.
Each function:

- takes as parameter either 'x': the variable (var) to compute the summary statistic on, or 'data': the entire dataset
- returns the corresponding summary statistic as a numeric vector

For example, to additionally compute the coefficient of variation, this can be set to: list(statCVPerc = function(x) sd(x)/mean(x)*100) (or cv).
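A statsExtra list can be evaluated by calling each function on either the variable values or the full dataset, depending on its parameter name. applyStatsExtraSketch is a hypothetical helper illustrating this dispatch with the coefficient of variation example above and an assumed extra statistic (statNRecords); the package's own evaluation may differ.

```r
# Hypothetical helper: evaluate each statsExtra function on either the values of
# `var` (function with an `x` parameter) or the whole dataset (`data` parameter)
applyStatsExtraSketch <- function(statsExtra, data, var) {
  vapply(statsExtra, function(f) {
    argNames <- names(formals(f))
    if ("x" %in% argNames) f(x = data[[var]]) else f(data = data)
  }, numeric(1))
}

statsExtra <- list(
  statCVPerc = function(x) sd(x) / mean(x) * 100,  # coefficient of variation (%)
  statNRecords = function(data) nrow(data)          # illustrative data-based statistic
)
dataEx <- data.frame(AVAL = c(2, 4, 4, 4, 5, 5, 7, 9))
applyStatsExtraSketch(statsExtra, dataEx, var = "AVAL")
# statCVPerc is about 42.76, statNRecords is 8
```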

checkVarDiffBySubj

String, 'error' (default), 'warning', or 'none'. Should an error, a warning, or nothing be produced if a continuous variable (var) contains different values for the same subject (by row/column)?
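The check controlled by checkVarDiffBySubj can be pictured as counting the distinct non-missing values of var per subject. checkVarDiffSketch is a hypothetical helper that assumes it is called separately within each row/column group; the package's actual messages and grouping may differ.

```r
# Hypothetical sketch: flag subjects with more than one distinct value of a
# continuous variable, then raise an error, a warning, or nothing
checkVarDiffSketch <- function(data, var, subjectVar = "USUBJID",
                               check = c("error", "warning", "none")) {
  check <- match.arg(check)
  # number of distinct non-missing values of `var` for each subject
  nValues <- tapply(data[[var]], data[[subjectVar]],
                    function(v) length(unique(v[!is.na(v)])))
  subjDiff <- names(nValues)[nValues > 1]
  if (length(subjDiff) > 0 && check != "none") {
    msg <- paste0("Different values of ", var, " for subject(s): ",
                  paste(subjDiff, collapse = ", "))
    if (check == "error") stop(msg, call. = FALSE)
    warning(msg, call. = FALSE)
  }
  # subjects with conflicting values, returned invisibly
  invisible(subjDiff)
}

dataEx <- data.frame(USUBJID = c("S1", "S1", "S2"), AVAL = c(1, 2, 3))
print(checkVarDiffSketch(dataEx, var = "AVAL", check = "none"))  # "S1"
try(checkVarDiffSketch(dataEx, var = "AVAL"))  # error: S1 has two values
```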

Author

Laure Cougnaud