This function shows the distribution of values within a given categorical
variable for one group within another categorical variable, and compares it
with the distribution among all observations not in that group. Its purpose
is to let you see quickly whether the distribution within that group differs
from the distribution for the rest of the observations. The results are
sorted in descending order of frequency for the named group, i.e. the group
named in col_group.
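
The calculation described above can be sketched in base R. This is only an illustration of the idea, not the package's implementation: the toy data, the column names (n_group, n_other, p_group, p_other), and the use of proportions rather than percentages are all assumptions made for the example.

```r
# Toy data: hypothetical values, for illustration only.
df <- data.frame(
  class = c("suv", "compact", "compact", "suv",
            "pickup", "compact", "suv", "midsize"),
  make  = c("toyota", "toyota", "audi", "ford",
            "toyota", "toyota", "jeep", "audi"),
  stringsAsFactors = FALSE
)

# Split the observations into two exclusive groups:
# those in the named group and all the others.
in_group <- df$make == "toyota"

# Use a common set of levels so both groups report every category,
# including categories with a count of zero.
lv <- sort(unique(df$class))
n_group <- table(factor(df$class[in_group], levels = lv))
n_other <- table(factor(df$class[!in_group], levels = lv))

# Counts and proportions of each category within each group.
contrast <- data.frame(
  class   = lv,
  n_group = as.integer(n_group),
  n_other = as.integer(n_other),
  p_group = as.numeric(n_group) / sum(n_group),
  p_other = as.numeric(n_other) / sum(n_other)
)

# Sort in descending order of frequency for the named group.
contrast <- contrast[order(-contrast$n_group), ]
rownames(contrast) <- NULL
print(contrast)
```

Comparing p_group with p_other row by row shows where the named group departs from the rest of the observations.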

cat_contrast(
  data,
  row_cat,
  col_cat,
  col_group,
  na.rm.row = FALSE,
  na.rm.col = FALSE,
  na.rm = NULL,
  only = "",
  clean_names = getOption("tabbycat.clean_names"),
  na_label = getOption("tabbycat.na_label"),
  other_label = getOption("tabbycat.other_label")
)

Returns a tibble showing the distribution of row_cat within each of
the two exclusive groups in col_cat.

data: A data frame containing the two variables of interest.
row_cat: The column name of a categorical variable whose distribution
  should be calculated for each exclusive group in col_cat.
col_cat: The column name of a categorical variable that will be split
  into two exclusive groups, one containing observations with a particular
  value of that variable, and another containing all other observations.
col_group: The name of the group within col_cat that is
  used to split the observations into two exclusive groups: those that are
  in the group and those that are not in the group.
na.rm.row: A boolean indicating whether to exclude NAs from the row
  results. The default is FALSE.
na.rm.col: A boolean indicating whether to exclude NAs from the column
  results. The default is FALSE.
na.rm: A boolean indicating whether to exclude NAs from both row and
  column results. This argument is provided as a convenience. It allows you
  to set na.rm.row and na.rm.col to the same value without
  having to specify them separately. If the value of na.rm is NULL,
  the argument is ignored. If it is not NULL, it takes precedence. The
  default is NULL.
only: A string indicating that only one set of frequency columns
  should be returned in the results. If only is either "n" or
  "number", only the number columns are returned. If only is either
  "p" or "percent", only the percent columns are returned. If only is
  any other value, both sets of columns are shown. The default value is an
  empty string, which means both sets of columns are shown.
clean_names: A boolean indicating whether the column names of the
  results tibble should be cleaned, so that any column names produced from
  data are converted to snake_case. The default is TRUE, but this can be
  changed with options(tabbycat.clean_names = FALSE).
na_label: A string indicating the label to use for the columns that
  contain data for missing values. The default value is "na". Set a
  different value if the default collides with data in your dataset.
other_label: A string indicating the label to use for the columns that
  contain data for observations not in the named group. The default value
  is "other". Set a different value if the default collides with data in
  your dataset.
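
A few calls that exercise the arguments described above may help. This is a sketch under assumptions: it presumes the tabbycat package is installed, that column names are passed as strings, and it uses a small made-up data frame (df, class, make) that is not part of the package. No output is shown because the exact column names depend on the labels and options in effect.

```r
library(tabbycat)

# Toy data with some missing values: hypothetical, for illustration only.
df <- data.frame(
  class = c("suv", "compact", NA, "suv", "pickup", "compact"),
  make  = c("toyota", "toyota", "audi", NA, "toyota", "audi"),
  stringsAsFactors = FALSE
)

# Distribution of class for toyota versus all other observations.
basic <- cat_contrast(df, "class", "make", "toyota")

# Exclude NAs from both the row and column results in one step.
no_na <- cat_contrast(df, "class", "make", "toyota", na.rm = TRUE)

# Return only the number columns.
counts <- cat_contrast(df, "class", "make", "toyota", only = "n")

# Use different labels in case "na" or "other" collide with real values.
relabelled <- cat_contrast(
  df, "class", "make", "toyota",
  na_label = "missing",
  other_label = "rest"
)

print(basic)
```

Setting na.rm is equivalent to setting na.rm.row and na.rm.col to the same value, so the second call could also be written with both of those arguments set to TRUE.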