`significance_cpct` conducts z-tests between column percentages in the result of cro_cpct. Results are calculated with the same formula as in prop.test without continuity correction.

`significance_means` conducts t-tests between column means in the result of cro_mean_sd_n. Results are calculated with the same formula as in t.test.

`significance_cases` conducts chi-squared tests on each subtable of a table with counts in the result of cro_cases. Results are calculated with the same formula as in chisq.test.

`significance_cell_chisq` computes a cell chi-square test on a table with column percentages. The cell chi-square test looks at each table cell and tests whether it is significantly different from its expected value in the overall table. For example, if it is thought that variations in political opinions might depend on the respondent's age, this test can be used to detect which cells contribute significantly to that dependence. Unlike the chi-square test (`significance_cases`), which is carried out on a whole set of rows and columns, the cell chi-square test is carried out independently on each table cell. Although the significance level of the cell chi-square test is accurate for any given cell, the cell tests cannot be used instead of the chi-square test carried out on the overall table. Their purpose is simply to point to the parts of the table where dependencies between row and column categories may exist.
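The z-test between two column percentages can be sketched in base R. This is a minimal illustration with made-up counts and bases, not the package's internal code; it only checks that the pooled-proportion z-test gives the same p-value as prop.test with `correct = FALSE`.

```r
# Hypothetical counts and bases for two columns (illustrative numbers only).
x1 <- 100; n1 <- 250
x2 <- 93;  n2 <- 300
p1 <- x1 / n1
p2 <- x2 / n2

# Pooled proportion under the null hypothesis of equal proportions.
p_pool <- (x1 + x2) / (n1 + n2)

# z statistic and two-sided p-value.
z <- (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
p_value <- 2 * pnorm(-abs(z))

# Cross-check against prop.test without continuity correction.
check <- prop.test(x = c(x1, x2), n = c(n1, n2), correct = FALSE)
all.equal(p_value, check$p.value)
```

The squared z statistic equals the chi-squared statistic reported by prop.test, which is why the two p-values agree.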

For `significance_cpct` and `significance_means` there are four types of comparisons, which can be conducted simultaneously (argument `compare_type`):

`subtable` provides comparisons between all columns inside each subtable.

`previous_column` compares each column of the subtable with the previous column. It is useful if columns are periods or survey waves.

`first_column` compares the first column of the table with all other columns in the table.

`adjusted_first_column` is also a comparison with the first column, but with an adjustment for common base. It is useful if the first column is a total column and the other columns are subgroups of this total. Adjustments are made according to the algorithm in IBM SPSS Statistics Algorithms v20, p. 263. Note that with these adjustments t-tests between means are made with equal variances assumed (as with `var_equal = TRUE`).
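Because `significance_means` works from means, standard deviations and valid N rather than raw data, it may help to see how a t-test can be computed from those summaries alone. The sketch below is base R only; the helper name `t_from_summary` is hypothetical and not part of the package. It covers the Welch test and the equal-variance test, and does not attempt the common-base adjustment used by `adjusted_first_column`.

```r
# Hypothetical helper: two-sample t-test from summary statistics.
t_from_summary <- function(m1, s1, n1, m2, s2, n2, var_equal = FALSE) {
  if (var_equal) {
    # Pooled variance, classic Student t-test.
    df <- n1 + n2 - 2
    pooled_var <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df
    se <- sqrt(pooled_var * (1 / n1 + 1 / n2))
  } else {
    # Welch test with Satterthwaite degrees of freedom.
    v1 <- s1^2 / n1
    v2 <- s2^2 / n2
    se <- sqrt(v1 + v2)
    df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  }
  t_stat <- (m1 - m2) / se
  c(t = t_stat, df = df, p_value = 2 * pt(-abs(t_stat), df))
}

# Simulated data, used only to cross-check against t.test.
set.seed(1)
a <- rnorm(30, mean = 10, sd = 2)
b <- rnorm(45, mean = 11, sd = 3)

welch  <- t_from_summary(mean(a), sd(a), length(a), mean(b), sd(b), length(b))
pooled <- t_from_summary(mean(a), sd(a), length(a), mean(b), sd(b), length(b),
                         var_equal = TRUE)

all.equal(unname(welch["p_value"]),  t.test(a, b)$p.value)
all.equal(unname(pooled["p_value"]), t.test(a, b, var.equal = TRUE)$p.value)
```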

Currently there are no adjustments for multiple-response variables (results of mrset) in the table columns, so significance tests are only approximate in such cases. There are also functions for significance testing in a sequence of custom table calculations (see tables):

`tab_last_sig_cpct`, `tab_last_sig_means` and `tab_last_sig_cases` make the same tests as their analogs mentioned above. It is recommended to use them after the appropriate statistic function: tab_stat_cpct, tab_stat_mean_sd_n and tab_stat_cases.

`tab_significance_options` sets significance options for the entire custom table creation sequence.

`tab_last_add_sig_labels` applies `add_sig_labels` to the last calculated table: it adds significance labels (letters by default) to the column headers. It may be useful if you want to combine a table with significance marks with a table without them.

`tab_last_round` rounds numeric columns in the last calculated table to the specified number of digits. It is sometimes needed if you want to combine a table with significance marks with a table without them.

```
tab_significance_options(
  data,
  sig_level = 0.05,
  min_base = 2,
  delta_cpct = 0,
  delta_means = 0,
  correct = TRUE,
  compare_type = "subtable",
  bonferroni = FALSE,
  subtable_marks = "greater",
  inequality_sign = "both" %in% subtable_marks,
  sig_labels = LETTERS,
  sig_labels_previous_column = c("v", "^"),
  sig_labels_first_column = c("-", "+"),
  sig_labels_chisq = c("<", ">"),
  keep = c("percent", "cases", "means", "sd", "bases"),
  row_margin = c("auto", "sum_row", "first_column"),
  total_marker = "#",
  total_row = 1,
  digits = get_expss_digits(),
  na_as_zero = FALSE,
  var_equal = FALSE,
  mode = c("replace", "append")
)

tab_last_sig_cpct(
  data,
  sig_level = 0.05,
  delta_cpct = 0,
  min_base = 2,
  compare_type = "subtable",
  bonferroni = FALSE,
  subtable_marks = c("greater", "both", "less"),
  inequality_sign = "both" %in% subtable_marks,
  sig_labels = LETTERS,
  sig_labels_previous_column = c("v", "^"),
  sig_labels_first_column = c("-", "+"),
  keep = c("percent", "bases"),
  na_as_zero = FALSE,
  total_marker = "#",
  total_row = 1,
  digits = get_expss_digits(),
  mode = c("replace", "append"),
  label = NULL
)

tab_last_sig_means(
  data,
  sig_level = 0.05,
  delta_means = 0,
  min_base = 2,
  compare_type = "subtable",
  bonferroni = FALSE,
  subtable_marks = c("greater", "both", "less"),
  inequality_sign = "both" %in% subtable_marks,
  sig_labels = LETTERS,
  sig_labels_previous_column = c("v", "^"),
  sig_labels_first_column = c("-", "+"),
  keep = c("means", "sd", "bases"),
  var_equal = FALSE,
  digits = get_expss_digits(),
  mode = c("replace", "append"),
  label = NULL
)

tab_last_sig_cases(
  data,
  sig_level = 0.05,
  min_base = 2,
  correct = TRUE,
  keep = c("cases", "bases"),
  total_marker = "#",
  total_row = 1,
  digits = get_expss_digits(),
  mode = c("replace", "append"),
  label = NULL
)

tab_last_sig_cell_chisq(
  data,
  sig_level = 0.05,
  min_base = 2,
  subtable_marks = c("both", "greater", "less"),
  sig_labels_chisq = c("<", ">"),
  correct = TRUE,
  keep = c("percent", "bases", "none"),
  row_margin = c("auto", "sum_row", "first_column"),
  total_marker = "#",
  total_row = 1,
  total_column_marker = "#",
  digits = get_expss_digits(),
  mode = c("replace", "append"),
  label = NULL
)

tab_last_round(data, digits = get_expss_digits())

tab_last_add_sig_labels(data, sig_labels = LETTERS)

significance_cases(
  x,
  sig_level = 0.05,
  min_base = 2,
  correct = TRUE,
  keep = c("cases", "bases"),
  total_marker = "#",
  total_row = 1,
  digits = get_expss_digits()
)

significance_cell_chisq(
  x,
  sig_level = 0.05,
  min_base = 2,
  subtable_marks = c("both", "greater", "less"),
  sig_labels_chisq = c("<", ">"),
  correct = TRUE,
  keep = c("percent", "bases", "none"),
  row_margin = c("auto", "sum_row", "first_column"),
  total_marker = "#",
  total_row = 1,
  total_column_marker = "#",
  digits = get_expss_digits()
)

cell_chisq(cases_matrix, row_base, col_base, total_base, correct)

significance_cpct(
  x,
  sig_level = 0.05,
  delta_cpct = 0,
  min_base = 2,
  compare_type = "subtable",
  bonferroni = FALSE,
  subtable_marks = c("greater", "both", "less"),
  inequality_sign = "both" %in% subtable_marks,
  sig_labels = LETTERS,
  sig_labels_previous_column = c("v", "^"),
  sig_labels_first_column = c("-", "+"),
  keep = c("percent", "bases"),
  na_as_zero = FALSE,
  total_marker = "#",
  total_row = 1,
  digits = get_expss_digits()
)

add_sig_labels(x, sig_labels = LETTERS)

significance_means(
  x,
  sig_level = 0.05,
  delta_means = 0,
  min_base = 2,
  compare_type = "subtable",
  bonferroni = FALSE,
  subtable_marks = c("greater", "both", "less"),
  inequality_sign = "both" %in% subtable_marks,
  sig_labels = LETTERS,
  sig_labels_previous_column = c("v", "^"),
  sig_labels_first_column = c("-", "+"),
  keep = c("means", "sd", "bases"),
  var_equal = FALSE,
  digits = get_expss_digits()
)
```

data

data.frame/intermediate_table for `tab_*` functions.

sig_level

numeric. Significance level. By default it equals `0.05`.

min_base

numeric. The significance test will be conducted only if both columns have bases greater than or equal to `min_base`. By default it equals `2`.

delta_cpct

numeric. Minimal delta between percentages for which we mark significant differences (in percentage points). By default it equals zero. Note that, for example, for a minimal difference of 5 percentage points `delta_cpct` should equal 5, not 0.05.

delta_means

numeric. Minimal delta between means for which we mark significant differences - by default it equals to zero.

correct

logical indicating whether to apply continuity correction when computing the test statistic for 2 by 2 tables. Only for `significance_cases` and `significance_cell_chisq`. For details see chisq.test. `TRUE` by default.

compare_type

Type of comparison between columns. By default it is `subtable`: comparisons will be conducted between columns of each subtable. Other possible values are `first_column`, `adjusted_first_column` and `previous_column`. Several tests can be conducted simultaneously.

bonferroni

logical. `FALSE` by default. Should we use Bonferroni adjustment by the number of comparisons in each row?

subtable_marks

character. One of "greater", "both" or "less". By default we mark only values which are significantly greater than some other columns. We can change this behavior by setting the argument to `less` or `both`. For `significance_cell_chisq` the default is "both".

inequality_sign

logical. Should we show `>` or `<` before significance marks of subtable comparisons? By default it is `FALSE` if `subtable_marks` is "less" or "greater".

sig_labels

character vector. Labels for marking differences between columns of subtable.

sig_labels_previous_column

a character vector with two elements. Labels for marking a difference with the previous column. The first mark means 'lower' (by default it is `v`) and the second means 'greater' (`^`).

sig_labels_first_column

a character vector with two elements. Labels for marking a difference with the first column of the table. The first mark means 'lower' (by default it is `-`) and the second means 'greater' (`+`).

sig_labels_chisq

a character vector with two labels for marking a difference with the row margin of the table. The first mark means 'lower' (by default it is `<`) and the second means 'greater' (`>`). Only for `significance_cell_chisq`.

keep

character. One or more from "percent", "cases", "means", "bases", "sd" or "none". This argument determines which statistics will remain in the table after significance marking.

row_margin

character. One of "auto" (default), "sum_row" or "first_column". With "auto" we try to find the total column in the subtable by `total_column_marker`. If the search fails, we use the sum of each row as the row total. With the "sum_row" option we always sum each row to get the margin. Note that in this case the result for multiple-response variables in banners may be incorrect. With the "first_column" option we use the first column of the table as the row margin for all subtables. In this case the result for subtables with incomplete bases may be incorrect. Only for `significance_cell_chisq`.

total_marker

character. Total rows mark in the table. "#" by default.

total_row

integer/character. If there are several totals per subtable, it is the number or name of the total row to use for the significance calculation.

digits

an integer indicating how many digits after the decimal separator will be shown in the final table.

na_as_zero

logical. `FALSE` by default. Should we treat `NA`'s as zero cases?

var_equal

a logical variable indicating whether to treat the two variances as being equal. For details see t.test.

mode

character. One of `replace` (default) or `append`. In the first case the previous result in the sequence of table calculations will be replaced with the result of significance testing. In the second case the result of significance testing will be appended to the sequence of table calculations.

label

character. Label for the statistic in the `tab_*` functions. Ignored if `mode` equals `replace`.

total_column_marker

character. Mark for total columns in the subtables. "#" by default.

x

table (class `etable`): result of cro_cpct with proportions and bases for `significance_cpct`, result of cro_mean_sd_n with means, standard deviations and valid N for `significance_means`, and result of cro_cases with counts and bases for `significance_cases`.

cases_matrix

numeric matrix with counts, of size R*C

row_base

numeric vector with row bases, length R

col_base

numeric vector with col bases, length C

total_base

numeric single value, total base
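To illustrate the `bonferroni` argument described above, here is a small base R sketch of a Bonferroni adjustment over the comparisons in one table row. The p-values and comparison names are invented for the example, and exactly how the package counts comparisons in a row is an assumption here; the snippet only shows the general technique.

```r
# Hypothetical p-values for the pairwise column comparisons in one row.
p_values <- c(A_vs_B = 0.012, A_vs_C = 0.030, B_vs_C = 0.400)
sig_level <- 0.05

# Bonferroni: multiply each p-value by the number of comparisons, cap at 1.
adjusted <- p.adjust(p_values, method = "bonferroni")

# Which differences would be marked without and with the adjustment.
significant_raw <- p_values < sig_level
significant_adj <- adjusted < sig_level

rbind(raw = significant_raw, bonferroni = significant_adj)
```

With three comparisons, only the first difference stays below the 0.05 level after adjustment, which shows why the option makes marks more conservative in wide tables.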

`tab_last_*` functions return objects of class `intermediate_table`. Use tab_pivot to get the final result, an `etable` object. Other functions return an `etable` object with significant differences.

cro_cpct, cro_cases, cro_mean_sd_n, tables, compare_proportions, compare_means, prop.test, t.test, chisq.test

```
library(expss)

data(mtcars)
mtcars = apply_labels(mtcars,
                      mpg = "Miles/(US) gallon",
                      cyl = "Number of cylinders",
                      disp = "Displacement (cu.in.)",
                      hp = "Gross horsepower",
                      drat = "Rear axle ratio",
                      wt = "Weight (lb/1000)",
                      qsec = "1/4 mile time",
                      vs = "Engine",
                      vs = c("V-engine" = 0,
                             "Straight engine" = 1),
                      am = "Transmission",
                      am = c("Automatic" = 0,
                             "Manual" = 1),
                      gear = "Number of forward gears",
                      carb = "Number of carburetors"
)

mtcars_table = calculate(mtcars,
                         cro_cpct(list(cyl, gear),
                                  list(total(), vs, am))
)

significance_cpct(mtcars_table)

# comparison with first column
significance_cpct(mtcars_table, compare_type = "first_column")

# comparison with first column and inside subtable
significance_cpct(mtcars_table,
                  compare_type = c("first_column", "subtable"))

# only significance marks
significance_cpct(mtcars_table, keep = "none")

# means
mtcars_means = calculate(mtcars,
                         cro_mean_sd_n(list(mpg, wt, hp),
                                       list(total(), vs, cyl))
)

significance_means(mtcars_means)

# mark values which are less and greater
significance_means(mtcars_means, subtable_marks = "both")

# chi-squared test
mtcars_cases = calculate(mtcars,
                         cro_cases(list(cyl, gear),
                                   list(total(), vs, am))
)

significance_cases(mtcars_cases)

# cell chi-squared test
# increase number of cases to avoid warning about chi-square approximation
mtcars2 = add_rows(mtcars, mtcars, mtcars)

tbl = calc_cro_cpct(mtcars2, gear, am)
significance_cell_chisq(tbl)

# table with multiple variables
tbl = calc_cro_cpct(mtcars2, list(gear, cyl), list(total(), am, vs))
significance_cell_chisq(tbl, sig_level = .0001)

# custom tables with significance
mtcars %>%
    tab_significance_options(subtable_marks = "both") %>%
    tab_cells(mpg, hp) %>%
    tab_cols(total(), vs, am) %>%
    tab_stat_mean_sd_n() %>%
    tab_last_sig_means(keep = "means") %>%
    tab_cells(cyl, gear) %>%
    tab_stat_cpct() %>%
    tab_last_sig_cpct() %>%
    tab_pivot()

# Overcomplicated examples - we move significance marks to
# separate columns. Columns with statistics remain numeric
mtcars %>%
    tab_significance_options(keep = "none",
                             sig_labels = NULL,
                             subtable_marks = "both",
                             mode = "append") %>%
    tab_cols(total(), vs, am) %>%
    tab_cells(mpg, hp) %>%
    tab_stat_mean_sd_n() %>%
    tab_last_sig_means() %>%
    tab_last_hstack("inside_columns") %>%
    tab_cells(cyl, gear) %>%
    tab_stat_cpct() %>%
    tab_last_sig_cpct() %>%
    tab_last_hstack("inside_columns") %>%
    tab_pivot(stat_position = "inside_rows") %>%
    drop_empty_columns()
```
