cat_compare: Calculate the frequency of discrete values in one categorical variable for each group within another categorical variable

Description

This function cross-tabulates the frequencies of one categorical variable within the groups of another. The results are sorted on the values of row_cat, the variable whose distribution is shown in each column. If this variable is a character vector, it is sorted alphabetically. If it is a factor, it is sorted in the order of its levels.
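
The sorting rule can be seen with base R alone. The following is a minimal sketch, not the function's internals; the vector names (sizes_chr, sizes_fct) are made up for illustration.

```r
# Hypothetical categorical values, used only to illustrate the sort order.
sizes_chr <- c("small", "large", "medium", "small")

# A character vector sorts alphabetically.
chr_order <- sort(unique(sizes_chr))

# A factor sorts in the order of its levels, which we set explicitly here.
sizes_fct <- factor(sizes_chr, levels = c("small", "medium", "large"))
fct_order <- as.character(sort(unique(sizes_fct)))

print(chr_order)
print(fct_order)
```

Converting row_cat to a factor with explicit levels before calling cat_compare is therefore one way to control the row order of the results.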

Usage

cat_compare(
  data,
  row_cat,
  col_cat,
  na.rm.row = FALSE,
  na.rm.col = FALSE,
  na.rm = NULL,
  only = "",
  clean_names = getOption("tabbycat.clean_names"),
  na_label = getOption("tabbycat.na_label")
)

Value

A tibble showing the distribution of row_cat within each group in col_cat.

Arguments

data: A dataframe containing the two variables of interest.
row_cat: The column name of a categorical variable whose distribution will be calculated for each group in col_cat.
col_cat: The column name of a categorical variable which will be split into groups, with the distribution of row_cat calculated for each group.
na.rm.row: A boolean indicating whether to exclude NAs from the row results. The default is FALSE.
na.rm.col: A boolean indicating whether to exclude NAs from the column results. The default is FALSE.
na.rm: A boolean indicating whether to exclude NAs from both row and column results. This argument is provided as a convenience. It allows you to set na.rm.row and na.rm.col to the same value without having to specify them separately. If the value of na.rm is NULL, the argument is ignored. If it is not NULL, it takes precedence over na.rm.row and na.rm.col. The default is NULL.
only: A string indicating that only one set of frequency columns should be returned in the results. If only is either "n" or "number", only the number columns are returned. If only is either "p" or "percent", only the percent columns are returned. If only is any other value, both sets of columns are shown. The default value is an empty string, which means both sets of columns are shown.
clean_names: A boolean indicating whether the column names of the results tibble should be cleaned, so that any column names produced from data are converted to snake_case. The default is TRUE, but this can be changed with options(tabbycat.clean_names = FALSE).
na_label: A string indicating the label to use for the columns that contain data for missing values. The default value is "na". Set a different value if the default collides with data in your dataset.
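
Examples

The calls below are a hedged sketch of typical usage rather than output-verified examples. They assume that row_cat and col_cat are passed as strings, as the argument descriptions suggest, and they use a small made-up data frame (survey) that is not part of the package. The calls run only if tabbycat is installed.

```r
# A made-up data frame with one missing value in each variable.
survey <- data.frame(
  answer = c("yes", "no", "yes", NA, "no", "yes"),
  region = c("north", "north", "south", "south", NA, "south"),
  stringsAsFactors = FALSE
)

if (requireNamespace("tabbycat", quietly = TRUE)) {
  # Distribution of answer within each region, keeping NAs (the defaults).
  print(tabbycat::cat_compare(survey, "answer", "region"))

  # Drop NAs from both rows and columns, and return only the number columns.
  print(tabbycat::cat_compare(survey, "answer", "region",
                              na.rm = TRUE, only = "n"))

  # Use a different label for missing values in case "na" collides with data.
  print(tabbycat::cat_compare(survey, "answer", "region",
                              na_label = "missing"))
}
```

Setting na.rm = TRUE in the second call is shorthand for setting both na.rm.row and na.rm.col to TRUE.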