cat_contrast: Calculate the frequency of discrete values in one categorical variable for each of two mutually exclusive groups within another categorical variable

Description

This function shows the distribution of values within a given categorical variable for one group within another categorical variable, and compares it with the distribution among all observations not in that group. Its purpose is to let you see quickly whether the distribution within that group differs from the distribution for the rest of the observations. The results are sorted in descending order of frequency for the named group, i.e. the group named in col_group.
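
The comparison described above can be sketched in base R. This is only an illustration of the idea, not the package's implementation: the vehicles data frame is made up, and the output column names (n_toyota, n_other, p_toyota, p_other) and the use of proportions rather than percentages are assumptions chosen for the example.

```r
# Hypothetical data: vehicle class by manufacturer.
vehicles <- data.frame(
  class = c("compact", "suv", "suv", "compact",
            "pickup", "suv", "compact", "midsize"),
  manufacturer = c("toyota", "toyota", "toyota", "audi",
                   "ford", "ford", "audi", "toyota"),
  stringsAsFactors = FALSE
)

# Split the observations into the named group and everything else.
in_group <- vehicles$manufacturer == "toyota"
side <- factor(ifelse(in_group, "toyota", "other"),
               levels = c("toyota", "other"))

# Count each class within the two groups, then take column-wise shares.
counts <- table(vehicles$class, side)
shares <- prop.table(counts, margin = 2)

contrast <- data.frame(
  class = rownames(counts),
  n_toyota = as.vector(counts[, "toyota"]),
  n_other = as.vector(counts[, "other"]),
  p_toyota = as.vector(shares[, "toyota"]),
  p_other = as.vector(shares[, "other"]),
  stringsAsFactors = FALSE
)

# Sort by descending frequency in the named group.
contrast <- contrast[order(-contrast$n_toyota), ]
rownames(contrast) <- NULL

print(contrast)
```

Reading across a row shows at a glance whether a class is over- or under-represented in the named group relative to everyone else.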

Usage

cat_contrast(
  data,
  row_cat,
  col_cat,
  col_group,
  na.rm.row = FALSE,
  na.rm.col = FALSE,
  na.rm = NULL,
  only = "",
  clean_names = getOption("tabbycat.clean_names"),
  na_label = getOption("tabbycat.na_label"),
  other_label = getOption("tabbycat.other_label")
)
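
The last three defaults in the signature are read with getOption(), so they can be changed once per session instead of in every call. A minimal sketch using base R's options(): the replacement values "missing" and "rest" are arbitrary illustrations, and the snippet only sets and reads the options, it does not call cat_contrast.

```r
# Set the three tabbycat options, keeping the previous values for later.
old <- options(
  tabbycat.clean_names = FALSE,
  tabbycat.na_label = "missing",
  tabbycat.other_label = "rest"
)

# These are the values the default arguments would now pick up.
current_labels <- c(
  na = getOption("tabbycat.na_label"),
  other = getOption("tabbycat.other_label")
)
current_clean <- getOption("tabbycat.clean_names")
print(current_labels)

# Restore whatever was set before.
options(old)
```

Passing clean_names, na_label or other_label directly in a call overrides the option for that call only.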

Value

A tibble showing the distribution of row_cat within each of the two exclusive groups in col_cat.

Arguments

data: A dataframe containing the two variables of interest.
row_cat: The column name of a categorical variable whose distribution should be calculated for each exclusive group in col_cat.
col_cat: The column name of a categorical variable that will be split into two exclusive groups, one containing observations with a particular value of that variable, and another containing all other observations.
col_group: The name of the group within col_cat that is used to split the observations into two exclusive groups: those that are in the group and those that are not in the group.
na.rm.row: A boolean indicating whether to exclude NAs from the row results. The default is FALSE.
na.rm.col: A boolean indicating whether to exclude NAs from the column results. The default is FALSE.
na.rm: A boolean indicating whether to exclude NAs from both row and column results. This argument is provided as a convenience. It allows you to set na.rm.row and na.rm.col to the same value without having to specify them separately. If the value of na.rm is NULL, the argument is ignored. If it is not NULL, it takes precedence over na.rm.row and na.rm.col. The default is NULL.
only: A string indicating that only one set of frequency columns should be returned in the results. If only is either "n" or "number", only the number columns are returned. If only is either "p" or "percent", only the percent columns are returned. If only is any other value, both sets of columns are shown. The default value is an empty string, which means both sets of columns are shown.
clean_names: A boolean indicating whether the column names of the results tibble should be cleaned, so that any column names produced from data are converted to snake_case. The default is TRUE, but this can be changed with options(tabbycat.clean_names = FALSE).
na_label: A string indicating the label to use for the columns that contain data for missing values. The default value is "na", but use this argument to set a different value if the default value collides with data in your dataset.
other_label: A string indicating the label to use for the columns that contain data for observations not in the named group. The default value is "other", but use this argument to set a different value if the default value collides with data in your dataset.
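
The rules for na.rm and only described above can be sketched as two small helpers. Both resolve_na_rm and resolve_only are hypothetical names written for this illustration, not functions exported by the package, and exact, case-sensitive matching of the only string is an assumption.

```r
# Hypothetical helper: resolve the three NA flags into two effective flags.
# A non-NULL na.rm overrides both na.rm.row and na.rm.col.
resolve_na_rm <- function(na.rm.row = FALSE, na.rm.col = FALSE, na.rm = NULL) {
  if (!is.null(na.rm)) {
    na.rm.row <- na.rm
    na.rm.col <- na.rm
  }
  list(row = na.rm.row, col = na.rm.col)
}

# Hypothetical helper: map the `only` string to the column sets returned.
# Assumes exact, case-sensitive matching; any other value means both sets.
resolve_only <- function(only = "") {
  if (only %in% c("n", "number")) {
    "number"
  } else if (only %in% c("p", "percent")) {
    "percent"
  } else {
    c("number", "percent")
  }
}

# na.rm is NULL, so the individual flags are used as given.
flags_separate <- resolve_na_rm(na.rm.row = TRUE)

# na.rm is not NULL, so it overrides na.rm.row.
flags_override <- resolve_na_rm(na.rm.row = TRUE, na.rm = FALSE)

columns_default <- resolve_only()
columns_percent <- resolve_only("p")
```

In this sketch flags_separate keeps row = TRUE and col = FALSE, while flags_override has both flags set to FALSE because na.rm wins.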