getSummaryStatisticsTable: Get summary statistics table

Description

Get summary statistics table

Usage

getSummaryStatisticsTable(
  data,
  var = NULL,
  varFlag = NULL,
  varLab = NULL,
  varLabInclude = length(var) > 1,
  varInclude0 = FALSE,
  varIgnore = NULL,
  varGeneralLab = "Variable",
  varSubgroupLab = "Variable group",
  varIncludeTotal = FALSE,
  varTotalInclude = FALSE,
  varTotalInSepRow = FALSE,
  rowVar = NULL,
  rowVarLab = NULL,
  rowVarDataLevels = NULL,
  rowOrder = "auto",
  rowOrderTotalFilterFct = NULL,
  rowOrderCatLast = NULL,
  rowVarInSepCol = NULL,
  rowVarFormat = NULL,
  rowVarTotalInclude = NULL,
  rowVarTotalByVar = NULL,
  rowVarTotalInSepRow = NULL,
  rowTotalLab = NULL,
  rowInclude0 = FALSE,
  rowAutoMerge = TRUE,
  emptyValue = "-",
  rowVarTotalPerc = NULL,
  colVar = NULL,
  colVarTotal = colVar,
  colVarTotalPerc = colVarTotal,
  colInclude0 = FALSE,
  colVarDataLevels = NULL,
  colTotalInclude = FALSE,
  colTotalLab = "Total",
  stats = NULL,
  statsExtra = NULL,
  statsVarBy = NULL,
  statsPerc = c("statN", "statm"),
  statsGeneralLab = "Statistic",
  statsValueLab = "StatisticValue",
  statsLabInclude = NULL,
  subjectVar = "USUBJID",
  filterFct = NULL,
  dataTotal = NULL,
  dataTotalPerc = dataTotal,
  dataTotalRow = NULL,
  dataTotalCol = NULL,
  type = "auto",
  byVar = NULL,
  byVarLab = NULL,
  checkVarDiffBySubj = "error",
  labelVars = NULL,
  outputType = "flextable",
  statsLayout = ifelse("DT" %in% outputType, "col", "row"),
  landscape = (style == "presentation"),
  margin = 1,
  rowPadBase = 14.4,
  title = NULL,
  footer = NULL,
  file = NULL,
  style = "report",
  colorTable = getColorPaletteTable(style = style),
  colHeaderTotalInclude = TRUE,
  colHeaderMerge = TRUE,
  fontsize = switch(style, report = 8, presentation = 10),
  fontname = switch(style, report = "Times", presentation = "Tahoma"),
  vline = "none",
  hline = "auto",
  pageDim = NULL,
  columnsWidth = NULL,
  expandVar = NULL,
  noEscapeVar = NULL,
  barVar = NULL,
  ...
)

Value

Depending on the outputType:

'data.frame-base': input summary table in a long format with all computed statistics
'data.frame': summary table in a wide format (different columns for each colVar), with specified labels
'flextable' (by default): flextable object with summary table
'DT': datatable object with summary table

If multiple outputType values are specified, a list of those objects, named by outputType.

If byVar is specified, each object consists of a list of tables, one for each element in byVar.

Arguments

data

Data.frame with dataset to consider for the summary table.

var

Character vector with variable(s) of data, to compute statistics on.
If NULL (by default), counts by row/column variable(s) are computed.
To also return counts of the rowVar when other variables are specified in var, include 'all' in var.
Missing values, if present, are filtered out (also for the reported number of subjects/records).

varFlag

Character vector, subset of var with variable(s) of type 'flag' (with 'Y', 'N' or '' for empty/non-specified values). Only the counts for records flagged with 'Y' are retained.
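The following base-R sketch illustrates the 'flag' semantics on a toy dataset; it is not the package's internal code. AESER is used as a hypothetical flag variable: only the records flagged with 'Y' contribute to the number of records (statm) and of subjects (statN).

```r
# Toy data; AESER plays the role of a hypothetical 'flag' variable
# with 'Y', 'N' or '' (empty/non-specified) values
data <- data.frame(
  USUBJID = c("S1", "S1", "S2", "S3", "S4"),
  AESER = c("Y", "Y", "", "Y", "N"),
  stringsAsFactors = FALSE
)

# Only the records flagged with 'Y' are retained
flagged <- data[data$AESER == "Y", ]

# Number of records (statm) and of distinct subjects (statN)
nRecords <- nrow(flagged)
nSubjects <- length(unique(flagged$USUBJID))
print(c(statm = nRecords, statN = nSubjects))
```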

varLab

Named character vector with label for each variable specified in var. By default, extracted from labelVars. If not available, var is used.

varLabInclude

Logical, if TRUE the name(s) of the summary statistic variable(s) (var) are included in the table. By default, this is set to TRUE if more than one variable is specified, and FALSE if only one variable is specified.

varInclude0

Should rows with no counts for the count var or varFlag variable(s) be included in the table? Either:

logical of length 1, if TRUE (FALSE by default) rows with no count are included for all var
a character vector containing the categorical var for which zero-count rows should be included
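As a rough illustration of zero-count rows (an assumption about the mechanism, not the package's code), base R keeps unobserved categories when counting over the levels of a factor. The variable AESEV and its levels are hypothetical.

```r
# Toy data: AESEV has three possible levels, but 'SEVERE' is never observed
data <- data.frame(
  USUBJID = c("S1", "S2", "S3"),
  AESEV = factor(c("MILD", "MODERATE", "MILD"),
    levels = c("MILD", "MODERATE", "SEVERE"))
)

# Counting only the observed categories: no row for 'SEVERE'
observed <- table(as.character(data$AESEV))

# Counting over all factor levels: a zero-count row for 'SEVERE' is kept
withZero <- table(data$AESEV)
print(withZero)
```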

varIgnore

Vector with elements to ignore in the var variable(s). The data records with such elements in var are filtered from the data at the start of the workflow.

varGeneralLab

String with general label for the variable(s) specified in var. In case of multiple variables in var, this will be included in the table header (see 'rowVarLab' attribute of the output).

varSubgroupLab

String with general label for sub-group of categorical variable(s) for count table, 'Variable group' by default. This will be included in the final table header (see 'rowVarLab' attribute of the output).

varIncludeTotal

This argument is deprecated, please use: 'varTotalInclude' instead.

varTotalInclude

Should the total across all categories of var be included for the count table? Only used for categorical variables (and var not 'all'). Either:

logical of length 1, if TRUE (FALSE by default) include the total for all categorical var
a character vector containing categorical var for which the total should be included

varTotalInSepRow

Logical, if TRUE the total per variable is included in a separate row; otherwise (FALSE, by default) it is included in the row containing the header of the variable.

rowVar

Character vector with variable(s) to be included in the rows. If multiple variables are specified, the variables should be sorted in hierarchical order (e.g. body system class before adverse event term) and are nested in the table.

rowVarLab

Named character vector with label for the rowVar variable(s).

rowVarDataLevels

Data.frame with unique combinations of rowVar to be included in rows. Each column should correspond to a rowVar variable, formatted as a factor if the elements should be ordered in the final table.

rowOrder

Specify how the rows should be ordered in the final table, either:

String among:
- 'auto' (by default): if the variable is a factor, keep its order, otherwise order alphabetically
- 'alphabetical': order alphabetically
- 'total': order rows in decreasing order of the total number of subjects across all columns for this specific category.
Function taking the summary table as input and returning the ordered elements of the rowVar

To specify different ordering methods for different rowVar, specify a list of such elements, named with the rowVar variable. For the table output of computeSummaryStatisticsTable (long format), this order is also reflected in the levels of the row factor variable.
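A minimal sketch of a custom ordering function, assuming a long-format summary table with hypothetical columns AEDECOD (row variable), TRTP (column variable) and statN (number of subjects); the actual columns of the package's summary table may differ. The function reproduces the 'total' logic: decreasing total number of subjects across columns.

```r
# Hypothetical long-format summary table: one record per term and column
summaryTable <- data.frame(
  AEDECOD = c("Headache", "Headache", "Nausea", "Nausea", "Rash", "Rash"),
  TRTP = c("A", "B", "A", "B", "A", "B"),
  statN = c(3, 1, 5, 4, 0, 2),
  stringsAsFactors = FALSE
)

# Input: the summary table; output: the ordered elements of the row variable
orderByTotalN <- function(summaryTable) {
  # Total number of subjects across all columns, per term
  totals <- tapply(summaryTable$statN, summaryTable$AEDECOD, sum)
  # Terms sorted by decreasing total
  names(totals)[order(totals, decreasing = TRUE)]
}

rowOrderCustom <- orderByTotalN(summaryTable)
print(rowOrderCustom)
```

Following the description above, such a function could be supplied directly as rowOrder, or per row variable as rowOrder = list(AEDECOD = orderByTotalN).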

rowOrderTotalFilterFct

Function used to filter the data used to order the rows based on total counts (in case rowOrder is 'total'). For example, to order rows based on the counts in one specific column category, e.g. a treatment column: function(x) subset(x, TRTP == "treatmentX")
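The effect of the filter can be sketched on a toy summary table (hypothetical columns AEDECOD, TRTP and statN; not the package's internal code): ordering on the total across all columns and ordering on the 'treatmentX' column only can give different row orders.

```r
summaryTable <- data.frame(
  AEDECOD = c("Headache", "Headache", "Nausea", "Nausea"),
  TRTP = c("treatmentX", "placebo", "treatmentX", "placebo"),
  statN = c(2, 6, 4, 1),
  stringsAsFactors = FALSE
)

# Ordering on the total across all columns
totals <- tapply(summaryTable$statN, summaryTable$AEDECOD, sum)
rowOrderAll <- names(totals)[order(totals, decreasing = TRUE)]

# Ordering on the 'treatmentX' column only, with the filter from the example
rowOrderTotalFilterFct <- function(x) subset(x, TRTP == "treatmentX")
dataForOrder <- rowOrderTotalFilterFct(summaryTable)
rowOrderX <- dataForOrder$AEDECOD[order(dataForOrder$statN, decreasing = TRUE)]

print(rowOrderAll)  # Headache (8) before Nausea (5)
print(rowOrderX)    # Nausea (4) before Headache (2)
```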

rowOrderCatLast

String with category to be printed in the last row of each rowVar (if any, set to NULL if none).

rowVarInSepCol

Character vector with rowVar that should be included in separate columns. By default (NULL), all row variables are nested in the first column of the table.
To include the groups within a var variable in a separate column, set: rowVarInSepCol = 'variableGroup'.

rowVarFormat

(flextable output) Named list with special formatting for the rowVar. Currently, the only possibility is to set the variable elements in bold, with: list(var1 = "bold"). (Use 'variable' for var or 'variableGroup' for groups within categorical variables.)

rowVarTotalInclude

Character vector with rowVar for which the total should be reported.
If the higher row variable is specified, the total across all rows is reported.
For the export, these variable(s) are formatted as factor with 'Total' as the first level.

rowVarTotalByVar

Character vector with a row variable used to categorize the row total.
Note that this is only used if row totals are requested via rowVarTotalInclude, and this variable should also be included in rowVar. If the vector is named, this can be specified for a specific row variable.
For example: c(AEDECOD = "AESEV") to compute the total by severity for the adverse event term rows in a typical adverse event count table (by System Organ Class and Adverse Event Term).

rowVarTotalInSepRow

Character vector with rowVarTotalInclude variable(s) (not in rowVarInSepCol) for which the total should be included in a separate row labelled 'Total'. Otherwise (by default), the total is included in the header row of each category.

rowTotalLab

(flextable output) String with label for the row with total.

rowInclude0

Logical, if TRUE (FALSE by default), include rows with no records, based on all combinations of the rowVar (assuming nested variable(s)).

rowAutoMerge

(flextable output) Logical, if TRUE (by default) automatically merge rows, e.g. in case there is only one sub-category (e.g. categorical variable with only one group) or only one statistic per category.

emptyValue

String with placeholder used to fill the table for missing values, '-' by default. This value is typically used e.g. if not all statistics are computed for all specified row/col/var variables.

rowVarTotalPerc

Character vector with row variables by which the total should be computed for the denominator of the percentage computation. By default (NULL), the total is only computed by column. If the total should be based on the total number of records per variable, rowVarTotalPerc should be set to 'variable'.

colVar

Character vector with variable(s) to be included in columns. If multiple variables are specified, the variables should be sorted in hierarchical order, and are included in a multi-column layout.
Use 'variable' to include the variables to summarize (var, if multiple) in different columns.

colVarTotal

String with column variable(s) by which the total reported in the header of the table is computed, by default same as colVar. Use 'variable' to compute the total by var (if multiple).

colVarTotalPerc

String with column variable(s) by which the total used as denominator for the percentage computation is computed, by default same as colVarTotal. Use 'variable' to compute the total by var (if multiple).

colInclude0

Logical, if TRUE (FALSE by default), include columns with no records, based on all combinations of the colVar (assuming nested variable(s)). If variable(s) are not nested, possible combinations can be specified via colVarDataLevels.

colVarDataLevels

Data.frame with unique combinations of colVar to be included in columns. Each column should correspond to a colVar variable, formatted as a factor if the elements should be ordered in the final table.

colTotalInclude

Logical, if TRUE (FALSE by default) include the summary statistics across columns in a separate column.

colTotalLab

String with label for the total column, 'Total' by default.

stats

(optional) Statistic(s) of interest to compute, either:

string with the name of a default set of statistics available in the package, see section 'Formatted statistics' in in-text table statistics.
See the corresponding type parameter of the getStatsData for more information on how the statistic is internally extracted.
(expert mode) named list of language objects (see is.language) of base summary statistics of interest, see section: 'Base statistics' in in-text table statistics.
The names are reported in the header.
If stats is of length 1, the name of the summary statistic is not included in the table.
The statistics can be specified separately:
- for each var (if multiple), by naming each element of the list: list(varName1 = list(...), varName2 = list())
- and/or for each element in: statsVarBy, by naming each sublist.
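The base-R sketch below shows what a named list of language objects looks like and how such expressions can be evaluated against computed base statistics. The base statistic names statN, statMean and statSD are assumed here (check the 'Base statistics' section for the exact names), and the explicit evaluation step only mimics what the package does internally.

```r
# Named list of language objects; the names are reported in the header
stats <- list(
  `Mean (SD)` = quote(paste0(round(statMean, 1), " (", round(statSD, 1), ")")),
  n = quote(statN)
)
stopifnot(all(vapply(stats, is.language, logical(1))))

# Hypothetical base statistics computed for one group
x <- c(2, 4, 6)
baseStats <- list(statN = length(x), statMean = mean(x), statSD = sd(x))

# Evaluate each expression with the base statistics as variables
formatted <- lapply(stats, function(expr) eval(expr, envir = baseStats))
str(formatted)
```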

statsExtra

(optional) Named list with functions for additional custom statistics to be computed.
Each function:

takes as parameter either 'x': the variable (var) to compute the summary statistic on, or 'data': the entire dataset
returns the corresponding summary statistic as a numeric vector

For example, to additionally compute the coefficient of variation, this can be set to: list(statCVPerc = function(x) sd(x)/mean(x)*100) (or cv).
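A quick check of the coefficient of variation function from the example above on a toy vector, together with a hypothetical statistic (statNRecords) defined on the entire dataset via the 'data' parameter:

```r
# Custom statistic on the variable 'x', from the example above
statsExtra <- list(statCVPerc = function(x) sd(x) / mean(x) * 100)

x <- c(2, 4, 6)
cvPerc <- statsExtra$statCVPerc(x)  # sd = 2, mean = 4, so 50 (%)

# Hypothetical custom statistic using the entire dataset ('data' parameter)
statsExtraData <- list(statNRecords = function(data) nrow(data))
nRecordsStat <- statsExtraData$statNRecords(data.frame(x = x))
```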

statsVarBy

String with the variable in rowVar/colVar by which the statistics should be computed.
In this case, stats (nested list or not) should be additionally nested to specify the statistics for each element in statsVarBy.

statsPerc

String with 'base statistical variable' used to compute the percentage, either:

'statN' (by default): the number of subjects
'statm': the number of records
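To illustrate the difference between the two choices, the sketch below computes both percentages for one category of a toy dataset in base R. For simplicity, the denominators are taken from the same data; in the function itself they come from dataTotalPerc and the column/row variables used for the total.

```r
# Toy data: subject S1 reports Headache twice
data <- data.frame(
  USUBJID = c("S1", "S1", "S2", "S3"),
  AEDECOD = c("Headache", "Headache", "Headache", "Nausea"),
  stringsAsFactors = FALSE
)
headache <- data[data$AEDECOD == "Headache", ]

# Numerators: subjects (statN) and records (statm)
statN <- length(unique(headache$USUBJID))  # 2
statm <- nrow(headache)                    # 3

# Denominators, here taken from the same toy data
totalN <- length(unique(data$USUBJID))     # 3
totalm <- nrow(data)                       # 4

percBySubjects <- statN / totalN * 100     # statsPerc = "statN": about 66.7%
percByRecords <- statm / totalm * 100      # statsPerc = "statm": 75%
```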

statsGeneralLab

String with general label for statistics, 'Statistic' by default. Only included if more than one statistic is included in the table.

statsValueLab

String with label for the statistic value, 'StatisticValue' by default.
This is only included in the table if the statistics provided in stats are not named and if no colVar is specified.

statsLabInclude

Logical, if TRUE include the statistic label in the table.
By default, only included if more than one statistic variable is available in the table.

subjectVar

String, variable of data with subject ID, 'USUBJID' by default.

filterFct

(optional) Function taking as input the summary table with computed statistics and returning a subset of the summary table.
Note: The filtering function should also handle records with:

total for the column header: isTotal set to TRUE, and colVar/rowVar is NA.
For example: filterFct = function(data) subset(data, isTotal & myColVar == "group 1")
rowVar/colVar set to 'Total'/colTotalLab if rowVarTotalInclude/colTotalInclude is specified
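The sketch below shows, on a toy summary table with hypothetical columns TRT, AEDECOD, isTotal and statN, why the filter should account for total records: the header totals have a missing row variable, so a naive condition on the row variable silently drops them (subset() treats NA conditions as FALSE).

```r
# Toy summary table: the first two records are column header totals
summaryTable <- data.frame(
  TRT = c("A", "B", "A", "B", "A", "B"),
  AEDECOD = c(NA, NA, "Headache", "Headache", "Rash", "Rash"),
  isTotal = c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE),
  statN = c(10, 12, 3, 2, 1, 0),
  stringsAsFactors = FALSE
)

# Naive filter: the NA row variable of the totals makes the condition NA,
# so subset() drops the header totals
naiveFilter <- function(data) subset(data, AEDECOD == "Headache")

# Filter explicitly keeping the total records
safeFilter <- function(data) subset(data, isTotal | AEDECOD == "Headache")

naive <- naiveFilter(summaryTable)  # 2 records, totals lost
safe <- safeFilter(summaryTable)    # 4 records, totals kept
```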

dataTotal

Data.frame used to extract the total number of subjects per column in the column header ('N = [X]'). It should contain the variables specified by colVarTotal. If not specified, the total number of subjects is extracted from the data.
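A base-R sketch of how such a header total can be derived from dataTotal, assuming a hypothetical column variable TRTP; the exact header format produced by the package may differ.

```r
dataTotal <- data.frame(
  USUBJID = c("S1", "S2", "S3", "S4", "S5"),
  TRTP = c("A", "A", "B", "B", "B"),
  stringsAsFactors = FALSE
)

# Number of distinct subjects per column category
nByCol <- tapply(dataTotal$USUBJID, dataTotal$TRTP, function(x) length(unique(x)))

# Column header labels with the total number of subjects
header <- paste0(names(nByCol), " (N = ", nByCol, ")")
print(header)
```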

dataTotalPerc

Data.frame used to extract the total counts per column for the computation of the percentage.
By default, dataTotal is used.
It should contain the variables specified by colVarTotalPerc.

dataTotalRow

Data.frame used to extract the total count across all elements of the row variable, or list of such data.frames for each rowVar variable.
If datasets are specified by row variable, the list should be named with variable X if the total across the elements of variable X should be included. By default, data is used.

dataTotalCol

Data.frame from which the total across columns is extracted (in case colTotalInclude is TRUE) or list of such data.frame for each rowVar variable.
If the dataset is specified by row variable, the list should be named with:

last row variable: for the dataset used in the total column for the most nested row variable
higher row variable (X+1): for the dataset used for the total column and row total of X
'total': for the dataset used for the total column and general row total

If only a subset of the variables is specified in this list, data is used for the remaining variable(s) (or 'total') if needed.
This dataset (the one for 'total' if a list) is also used for:

the header of the total column in case dataTotal is not specified
the denominator of the percentages in the total column in case dataTotalPerc is not specified

By default, data is used.

type

String with type of table:

'summaryTable': summary table with statistics for numeric variable
'countTable': count table
'auto' (by default): 'summaryTable' if the variable is numeric, 'countTable' otherwise
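The 'auto' rule can be written as a one-line helper (getTableType is a hypothetical name, not a function of the package):

```r
# Hypothetical helper reproducing the documented 'auto' rule
getTableType <- function(x) {
  if (is.numeric(x)) "summaryTable" else "countTable"
}

getTableType(c(5.1, 6.3))          # "summaryTable"
getTableType(c("MILD", "SEVERE"))  # "countTable"
getTableType(factor(c("Y", "N")))  # "countTable": factors are not numeric
```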

byVar

Variable(s) of data for which separated table(s) should be created.

byVarLab

String with label for byVar, used to set the names of the output list of table(s).

checkVarDiffBySubj

String, 'error' (default), 'warning', or 'none'. Should an error, a warning, or nothing be produced if a continuous variable (var) contains different values for the same subject (by row/column)?
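A minimal base-R sketch of such a check, assuming a hypothetical continuous variable AGE and grouping only by subject (the documented check is also done by row/column, which would add those variables to the grouping). checkVarDiff is a hypothetical helper mimicking the three modes.

```r
# Toy data: subject S2 has two different values for the continuous AGE
data <- data.frame(
  USUBJID = c("S1", "S1", "S2", "S2"),
  AGE = c(34, 34, 51, 52),
  stringsAsFactors = FALSE
)

# Number of distinct values per subject
nValuesBySubj <- tapply(data$AGE, data$USUBJID, function(x) length(unique(x)))
subjWithDiff <- names(nValuesBySubj)[nValuesBySubj > 1]

# Hypothetical helper mimicking the 'error', 'warning' and 'none' modes
checkVarDiff <- function(subj, mode = c("error", "warning", "none")) {
  mode <- match.arg(mode)
  if (length(subj) > 0) {
    msg <- paste("Different values for subject(s):", paste(subj, collapse = ", "))
    if (mode == "error") stop(msg)
    if (mode == "warning") warning(msg)
  }
  invisible(subj)
}

checkVarDiff(subjWithDiff, "none")  # silently returns "S2"
```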

labelVars

(optional) Named character vector with label for the row or column variable(s) or the variable(s) to summarize.
Labels specified via dedicated parameters, e.g. rowVarLab or varLab, take priority over this parameter.

outputType

String with output type:

'flextable' (by default): flextable object, with format for CSR, compatible with Word/PowerPoint export
'DT': datatable interactive table, compatible with html export
'data.frame': data.frame in wide format (with elements in colVar in different columns)
'data.frame-base': data.frame in long format (with elements in colVar in different rows), useful for QC

statsLayout

String with layout for the statistics names (in case more than one statistic is included), among:

'row' (by default for 'flextable' output):
All statistics are included in different rows in the first column of the table (after the row variable(s))
'col' (by default for 'DT' output):
Statistics are included in separated columns (last row of the header).
This option is not compatible with categorical variable(s).
'rowInSepCol':
Statistics are included in different rows, but in a column separate from the rowVar variable(s)

landscape

(flextable output) Logical, if TRUE the file is in landscape format.
By default: FALSE if style is 'report' and TRUE if style is 'presentation'.

margin

(flextable output) Margin in the document in inches (1 by default). This is used to compute the width of the table: pageDim[1] - 2 * margin.

rowPadBase

(flextable output) Base padding for row (in points), 14.4 by default (corresponds to 0.2 inches)
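The arithmetic behind margin and rowPadBase, sketched for a hypothetical US Letter page in portrait orientation (8.5 x 11 inches) and using 72 points per inch:

```r
pageDim <- c(8.5, 11)  # hypothetical page width and height, in inches
margin <- 1            # default margin, in inches

# Width available for the table
tableWidth <- pageDim[1] - 2 * margin  # 6.5 inches

# Default row padding, converted from points to inches (72 pt = 1 inch)
rowPadBase <- 14.4
rowPadInches <- rowPadBase / 72        # 0.2 inches
```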

title

Character vector with title(s) for the table. Set to NULL (by default) if no title should be included. If multiple titles are specified, one is used for each element of byVar (in order of the levels).

footer

(flextable output) Character vector with footer(s) for the table. Set to NULL (by default) if no footer should be included.

file

(Optional) Name of the file the table should be exported to, either:

string (of length 1). In this case, depending on the file extension, the following is exported:
- 'txt': summary table in long format ('data.frame-base' outputType)
- 'docx': summary table in final format is exported ('flextable' outputType)
- 'html': interactive summary table is exported ('DT' outputType)
named character vector in case of multiple exports. The names should correspond to the options in outputType:
- for 'data.frame-base' and 'data.frame': filename with 'txt' extension
- for 'flextable': filename with 'docx' extension
- for 'DT': filename with 'html' extension

If NULL (by default), the summary table is not exported but only returned as output. If byVar is specified, each table is exported to a separate file with the suffix: 'file_[i].[ext]', with i the index of the file (and [ext] the file extension).
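A sketch of the resulting file names for three byVar levels, based on our reading of the documented 'file_[i].[ext]' pattern (the package's exact naming may differ):

```r
file <- "summaryTable.docx"
nTables <- 3  # e.g. number of elements in byVar

ext <- tools::file_ext(file)                 # "docx"
fileBase <- tools::file_path_sans_ext(file)  # "summaryTable"

files <- sprintf("%s_%d.%s", fileBase, seq_len(nTables), ext)
print(files)
```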

style

(flextable output) String with table style, either 'report' or 'presentation'. This parameter affects the font size, font family, text and background colors, and dimensions of the table.

colorTable

(flextable output) Named character vector with color for the table background/body/text/line, e.g. created with the getColorPaletteTable function.

colHeaderTotalInclude

Logical, if TRUE (by default) include the total number of subjects ('statN') in the column header.

colHeaderMerge

(flextable output) Logical, if TRUE (by default) the column header is merged.

fontsize

(flextable output) Integer with font size, by default: 8 if style is 'report' and 10 if style is 'presentation'.

fontname

(flextable output) String with font name, by default: 'Times' if style is 'report' and 'Tahoma' if style is 'presentation'.

vline

(flextable output) String specifying how vertical lines should be included in the body of the table, either:

'none' (default): no vertical lines included
'auto': vertical lines included between sub-groups

hline

(flextable output) String specifying how horizontal lines should be included in the body of the table, either:

'none': no horizontal lines included
'auto' (default): horizontal lines included between sub-groups

pageDim

Numeric vector of length 2 with page width and height.
Depending on outputType:

'flextable': in inches
'DT': in number of rows in the table.
Currently only the height is used (e.g. c(NA, 4))

columnsWidth

(expert mode) Column widths of the table. This is only used for flextable and DT tables.
For flextable, note that the widths should be set to fit into the document page (see getDimPage).

expandVar

(DT output) Character vector with variables of the summary table which should be expanded in the data.

noEscapeVar

(DT output) Character vector with variables of summaryTable which shouldn't be escaped in the table (e.g. containing URLs).

barVar

(DT output) Character vector with variables of summaryTable that should be represented as a bar.

...

(DT output) Extra parameters passed to getClinDT.

Author

Laure Cougnaud