tables: Functions for custom tables construction

Description

Table construction consists of at least of three functions chained with magrittr pipe operator. At first we need to specify variables for which statistics will be computed with tab_cells. Secondary, we calculate statistics with one of tab_stat_* functions. And last, we finalize table creation with tab_pivot: dataset %>% tab_cells(variable) %>% tab_stat_cases() %>% tab_pivot(). After that we can optionally sort table with tab_sort_asc, drop empty rows/columns with drop_rc and transpose with tab_transpose. Generally, table is just a data.frame so we can use arbitrary operations on it. Statistic is always calculated with the last cell, column/row variables, weight, missing values and subgroup. To define new cell/column/row variables we can call appropriate function one more time. tab_pivot defines how we combine different statistics and where statistic labels will appear - inside/outside rows/columns. See examples. For significance testing see significance.

Usage

tab_cols(data, ...)
tab_cells(data, ...)
tab_rows(data, ...)
tab_weight(data, weight = NULL)
tab_mis_val(data, ...)
tab_total_label(data, ...)
tab_total_statistic(data, ...)
tab_total_row_position(data, total_row_position = c("below", "above", "none"))
tab_subgroup(data, subgroup = NULL)
tab_row_label(data, ..., label = NULL)
tab_stat_fun(data, ..., label = NULL, unsafe = FALSE)
tab_stat_mean_sd_n(
  data,
  weighted_valid_n = FALSE,
  labels = c("Mean", "Std. dev.", ifelse(weighted_valid_n, "Valid N", "Unw. valid N")),
  label = NULL
)
tab_stat_mean(data, label = "Mean")
tab_stat_median(data, label = "Median")
tab_stat_se(data, label = "S. E.")
tab_stat_sum(data, label = "Sum")
tab_stat_min(data, label = "Min.")
tab_stat_max(data, label = "Max.")
tab_stat_sd(data, label = "Std. dev.")
tab_stat_valid_n(data, label = "Valid N")
tab_stat_unweighted_valid_n(data, label = "Unw. valid N")
tab_stat_fun_df(data, ..., label = NULL, unsafe = FALSE)
tab_stat_cases(
  data,
  total_label = NULL,
  total_statistic = "u_cases",
  total_row_position = c("below", "above", "none"),
  label = NULL
)
tab_stat_cpct(
  data,
  total_label = NULL,
  total_statistic = "u_cases",
  total_row_position = c("below", "above", "none"),
  label = NULL
)
tab_stat_cpct_responses(
  data,
  total_label = NULL,
  total_statistic = "u_responses",
  total_row_position = c("below", "above", "none"),
  label = NULL
)
tab_stat_tpct(
  data,
  total_label = NULL,
  total_statistic = "u_cases",
  total_row_position = c("below", "above", "none"),
  label = NULL
)
tab_stat_rpct(
  data,
  total_label = NULL,
  total_statistic = "u_cases",
  total_row_position = c("below", "above", "none"),
  label = NULL
)
tab_last_vstack(
  data,
  stat_position = c("outside_rows", "inside_rows"),
  stat_label = c("inside", "outside"),
  label = NULL
)
tab_last_hstack(
  data,
  stat_position = c("outside_columns", "inside_columns"),
  stat_label = c("inside", "outside"),
  label = NULL
)
tab_pivot(
  data,
  stat_position = c("outside_rows", "inside_rows", "outside_columns", "inside_columns"),
  stat_label = c("inside", "outside")
)
tab_transpose(data)
tab_caption(data, ...)

Arguments

data

data.frame/intermediate_table

...

vector/data.frame/list. Variables for tables. Use mrset/mdset for multiple-response variables.

weight

numeric vector in tab_weight. Cases with NA's, negative and zero weights are removed before calculations.

total_row_position

Position of total row in the resulting table. Can be one of "below", "above", "none".

subgroup

logical vector in tab_subgroup. You can specify subgroup on which table will be computed.

label

character. Label for the statistic in the tab_stat_*.

unsafe

logical If TRUE than fun will be evaluated as is. It can lead to significant increase in the performance. But there are some limitations. For tab_stat_fun it means that your function fun should return vector of length one. Also there will be no attempts to make labels for statistic. For tab_stat_fun_df your function should return vector of length one or list/data.frame (optionally with 'row_labels' element - statistic labels). If unsafe is TRUE then further arguments (...) for fun will be ignored.

weighted_valid_n

logical. Sould we show weighted valid N in tab_stat_mean_sd_n? By default it is FALSE.

labels

character vector of length 3. Labels for mean, standard deviation and valid N in tab_stat_mean_sd_n.

total_label

By default "#Total". You can provide several names - each name for each total statistics.

total_statistic

By default it is "u_cases" (unweighted cases). Possible values are "u_cases", "u_responses", "u_cpct", "u_rpct", "u_tpct", "w_cases", "w_responses", "w_cpct", "w_rpct", "w_tpct". "u_" means unweighted statistics and "w_" means weighted statistics.

stat_position

character one of the values "outside_rows", "inside_rows", "outside_columns" or "inside_columns". It defines how we will combine statistics in the table.

stat_label

character one of the values "inside" or "outside". Where will be placed labels for the statistics relative to column names/row labels? See examples.

Value

All of these functions return object of class intermediate_table except tab_pivot which returns final result - object of class etable. Basically it's a data.frame but class is needed for custom methods.

Details

tab_cells variables on which percentage/cases/summary functions will be computed. Use mrset/mdset for multiple-response variables.
tab_cols optional variables which breaks table by columns. Use mrset/mdset for multiple-response variables.
tab_rows optional variables which breaks table by rows. Use mrset/mdset for multiple-response variables.
tab_weight optional weight for the statistic.
tab_mis_val optional missing values for the statistic. It will be applied on variables specified by tab_cells. It works in the same manner as na_if.
tab_subgroup optional logical vector/expression which specify subset of data for table.
tab_row_label Add to table empty row with specified row labels. It is usefull for making section headings and etc.
tab_total_row_position Default value for total_row_position argument in tab_stat_cases and etc. Can be one of "below", "above", "none".
tab_total_label Default value for total_label argument in tab_stat_cases and etc. You can provide several names - each name for each total statistics.
tab_total_statistic Default value for total_statistic argument in tab_stat_cases and etc. You can provide several values. Possible values are "u_cases", "u_responses", "u_cpct", "u_rpct", "u_tpct", "w_cases", "w_responses", "w_cpct", "w_rpct", "w_tpct". "u_" means unweighted statistics and "w_" means weighted statistics.
tab_stat_fun, tab_stat_fun_df tab_stat_fun applies function on each variable in cells separately, tab_stat_fun_df gives to function each data.frame in cells as a whole data.table with all names converted to variable labels (if labels exists). So it is not recommended to rely on original variables names in your fun. For details see cro_fun. You can provide several functions as arguments. They will be combined as with combine_functions. So you can use method argument. For details see documentation for combine_functions.
tab_stat_cases calculate counts.
tab_stat_cpct, tab_stat_cpct_responses calculate column percent. These functions give different results only for multiple response variables. For tab_stat_cpct base of percent is number of valid cases. Case is considered as valid if it has at least one non-NA value. So for multiple response variables sum of percent may be greater than 100. For tab_stat_cpct_responses base of percent is number of valid responses. Multiple response variables can have several responses for single case. Sum of percent of tab_stat_cpct_responses always equals to 100%.
tab_stat_rpct calculate row percent. Base for percent is number of valid cases.
tab_stat_tpct calculate table percent. Base for percent is number of valid cases.
tab_stat_mean, tab_stat_median, tab_stat_se, tab_stat_sum, tab_stat_min, tab_stat_max, tab_stat_sd, tab_stat_valid_n, tab_stat_unweighted_valid_n different summary statistics. NA's are always omitted.
tab_pivot finalize table creation and define how different tab_stat_* will be combined
tab_caption set caption on the table. Should be used after the tab_pivot.
tab_transpose transpose final table after tab_pivot or last statistic.

Examples

Run this code

# NOT RUN {
data(mtcars)
mtcars = apply_labels(mtcars,
                      mpg = "Miles/(US) gallon",
                      cyl = "Number of cylinders",
                      disp = "Displacement (cu.in.)",
                      hp = "Gross horsepower",
                      drat = "Rear axle ratio",
                      wt = "Weight (1000 lbs)",
                      qsec = "1/4 mile time",
                      vs = "Engine",
                      vs = c("V-engine" = 0,
                             "Straight engine" = 1),
                      am = "Transmission",
                      am = c("Automatic" = 0,
                             "Manual"=1),
                      gear = "Number of forward gears",
                      carb = "Number of carburetors"
)
# some examples from 'cro'
# simple example - generally with 'cro' it can be made with less typing
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(vs) %>% 
    tab_stat_cpct() %>% 
    tab_pivot()

# split rows
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(vs) %>% 
    tab_rows(am) %>% 
    tab_stat_cpct() %>% 
    tab_pivot()

# multiple banners
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(total(), vs, am) %>% 
    tab_stat_cpct() %>% 
    tab_pivot()

# nested banners
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(total(), vs %nest% am) %>% 
    tab_stat_cpct() %>% 
    tab_pivot()

# summary statistics
mtcars %>% 
    tab_cells(mpg, disp, hp, wt, qsec) %>%
    tab_cols(am) %>% 
    tab_stat_fun(Mean = w_mean, "Std. dev." = w_sd, "Valid N" = w_n) %>%
    tab_pivot()

# summary statistics - labels in columns
mtcars %>% 
    tab_cells(mpg, disp, hp, wt, qsec) %>%
    tab_cols(am) %>% 
    tab_stat_fun(Mean = w_mean, "Std. dev." = w_sd, "Valid N" = w_n, method = list) %>%
    tab_pivot()

# subgroup with droping empty columns
mtcars %>% 
    tab_subgroup(am == 0) %>% 
    tab_cells(cyl) %>% 
    tab_cols(total(), vs %nest% am) %>% 
    tab_stat_cpct() %>% 
    tab_pivot() %>% 
    drop_empty_columns()

# total position at the top of the table
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(total(), vs) %>% 
    tab_rows(am) %>% 
    tab_stat_cpct(total_row_position = "above",
                  total_label = c("number of cases", "row %"),
                  total_statistic = c("u_cases", "u_rpct")) %>% 
    tab_pivot()

# this example cannot be made easily with 'cro'             
mtcars %>%
    tab_cells(am) %>%
    tab_cols(total(), vs) %>%
    tab_total_row_position("none") %>% 
    tab_stat_cpct(label = "col %") %>%
    tab_stat_rpct(label = "row %") %>%
    tab_stat_tpct(label = "table %") %>%
    tab_pivot(stat_position = "inside_rows")

# statistic labels inside columns             
mtcars %>%
    tab_cells(am) %>%
    tab_cols(total(), vs) %>%
    tab_total_row_position("none") %>% 
    tab_stat_cpct(label = "col %") %>%
    tab_stat_rpct(label = "row %") %>%
    tab_stat_tpct(label = "table %") %>%
    tab_pivot(stat_position = "inside_columns")

# stacked statistics
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(total(), am) %>% 
    tab_stat_mean() %>%
    tab_stat_se() %>% 
    tab_stat_valid_n() %>% 
    tab_stat_cpct() %>% 
    tab_pivot()
    
# stacked statistics with section headings
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(total(), am) %>% 
    tab_row_label("#Summary statistics") %>% 
    tab_stat_mean() %>%
    tab_stat_se() %>% 
    tab_stat_valid_n() %>% 
    tab_row_label("#Column percent") %>% 
    tab_stat_cpct() %>% 
    tab_pivot()

# stacked statistics with different variables
mtcars %>% 
    tab_cols(total(), am) %>% 
    tab_cells(mpg, hp, qsec) %>% 
    tab_stat_mean() %>%
    tab_cells(cyl, carb) %>% 
    tab_stat_cpct() %>% 
    tab_pivot()

# stacked statistics - label position outside row labels
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(total(), am) %>% 
    tab_stat_mean() %>%
    tab_stat_se %>% 
    tab_stat_valid_n() %>% 
    tab_stat_cpct(label = "Col %") %>% 
    tab_pivot(stat_label = "outside")
    
# example from 'cro_fun_df' - linear regression by groups with sorting 
mtcars %>% 
    tab_cells(sheet(mpg, disp, hp, wt, qsec)) %>% 
    tab_cols(total(), am) %>% 
    tab_stat_fun_df(
        function(x){
            frm = reformulate(".", response = as.name(names(x)[1]))
            model = lm(frm, data = x)
            sheet('Coef.' = coef(model), 
                  confint(model)
            )
        }    
    ) %>% 
    tab_pivot() %>% 
    tab_sort_desc()

# multiple-response variables and weight
data(product_test)
codeframe_likes = num_lab("
                          1 Liked everything
                          2 Disliked everything
                          3 Chocolate
                          4 Appearance
                          5 Taste
                          6 Stuffing
                          7 Nuts
                          8 Consistency
                          98 Other
                          99 Hard to answer
                          ")

set.seed(1)
product_test = compute(product_test, {
    # recode age by groups
    age_cat = recode(s2a, lo %thru% 25 ~ 1, lo %thru% hi ~ 2)
    
    var_lab(age_cat) = "Age"
    val_lab(age_cat) = c("18 - 25" = 1, "26 - 35" = 2)
    
    var_lab(a1_1) = "Likes. VSX123"
    var_lab(b1_1) = "Likes. SDF456"
    val_lab(a1_1) = codeframe_likes
    val_lab(b1_1) = codeframe_likes
    
    wgt = runif(.N, 0.25, 4)
    wgt = wgt/sum(wgt)*.N
})

product_test %>% 
    tab_cells(mrset(a1_1 %to% a1_6), mrset(b1_1 %to% b1_6)) %>% 
    tab_cols(total(), age_cat) %>% 
    tab_weight(wgt) %>% 
    tab_stat_cpct() %>% 
    tab_sort_desc() %>% 
    tab_pivot()
    
# trick to place cell variables labels inside columns
# useful to compare two variables
# '|' is needed to prevent automatic labels creation from argument
# alternatively we can use list(...) to avoid this
product_test %>% 
    tab_cols(total(), age_cat) %>% 
    tab_weight(wgt) %>% 
    tab_cells("|" = unvr(mrset(a1_1 %to% a1_6))) %>% 
    tab_stat_cpct(label = var_lab(a1_1)) %>% 
    tab_cells("|" = unvr(mrset(b1_1 %to% b1_6))) %>% 
    tab_stat_cpct(label = var_lab(b1_1)) %>% 
    tab_pivot(stat_position = "inside_columns")

# if you need standard evaluation, use 'vars'
tables = mtcars %>%
      tab_cols(total(), am %nest% vs)

for(each in c("mpg", "disp", "hp", "qsec")){
    tables = tables %>% tab_cells(vars(each)) %>%
        tab_stat_fun(Mean = w_mean, "Std. dev." = w_sd, "Valid N" = w_n) 
}
tables %>% tab_pivot()
# }

Run the code above in your browser using DataLab