diff_groups: Difference and Sum Groups

Description

This function is a wrapper around RowGroups() for the specific case where the input contains two columns. It calls RowGroups() with returnGroups = TRUE, and extends the resulting data frame of unique code combinations with additional information about common groups, difference groups, and sum groups.

Usage

diff_groups(
  x,
  ...,
  hiddenNA = TRUE,
  sep_common = "_=_",
  sep_diff = "_-_",
  sep_sum = c("_=_", "_+_"),
  outputNA = "NA"
)

Value

A list (as returned by RowGroups()), where the groups data frame is extended with additional descriptive columns indicating common, difference, and sum relationships between the two code columns.

Arguments

x: A data frame with exactly two columns.
...: Additional arguments passed to RowGroups().
hiddenNA: Logical. When TRUE (default), missing codes (NA) are treated as hidden categories, meaning they are not available for computing difference and sum groups. See Note for details on how this differs from the NAomit parameter in RowGroups().
sep_common: A character string used in the common column to separate codes that are identical across the two input columns.
sep_diff: A character string used in the diff_1_2 and diff_2_1 columns to indicate difference groups. The first element of each string is the parent code, and the one or more child codes from the other column that follow it are subtracted from it.
sep_sum: A character vector of one or two elements used in the sum_1_2 and sum_2_1 columns to describe relationships where a code in one column represents the sum of several codes in the other. The first element (sep_sum[1]) acts as an equality sign, and the second element (sep_sum[2]) acts as a plus sign. If sep_sum has length 1, the same value is used for both positions.
outputNA: Character string used to represent NA values within the newly constructed text strings in the additional output columns. Only relevant when hiddenNA = FALSE.
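
The separator arguments only affect how the text in the additional columns is written. A minimal sketch, assuming the SSBtools package is installed and using the code_pairs data from the Examples section; the separator strings here are arbitrary illustrative choices, not recommended values:

```r
library(SSBtools)

df <- SSBtoolsData("code_pairs")

# Illustrative separators (arbitrary choices, not package defaults)
custom <- diff_groups(df,
                      sep_common = " == ",
                      sep_diff   = " - ",
                      sep_sum    = c(" = ", " + "))

# Inspect only the text columns that the separators affect
custom$groups[c("common", "diff_1_2", "diff_2_1", "sum_1_2", "sum_2_1")]

# A length-1 sep_sum is used for both the equality and the plus position
single <- diff_groups(df, sep_sum = " | ")
single$groups[c("sum_1_2", "sum_2_1")]
```

Comparing the printed columns with those from a plain diff_groups(df) call shows where each separator ends up.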

Details

The returned list contains the same elements as the list returned by RowGroups(), but with an extended groups data frame. The additional columns describe relationships between the two input columns as follows:

is_common: TRUE when the two codes on the row are identical.
is_child_1, is_child_2: TRUE when the code in the first or second column, respectively, is a subset or subgroup of a code in the other column.
common: identical code pairs, formatted using sep_common.
diff_1_2, diff_2_1: difference groups. The first element is the parent from the source column, followed by one or more child codes from the opposite column, joined using sep_diff.
sum_1_2, sum_2_1: sum groups, where a parent code in one column equals the sum of several codes in the other, formatted using sep_sum.
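
To see which columns the wrapper adds, its output can be compared with a direct RowGroups() call. This is a sketch under the assumption that SSBtools is installed and that the code_pairs data from the Examples section is available; the object names base_out and ext_out are illustrative only:

```r
library(SSBtools)

df <- SSBtoolsData("code_pairs")

# Plain row grouping, as the wrapper calls it internally
base_out <- RowGroups(df, returnGroups = TRUE)

# Same grouping with the extended groups data frame
ext_out <- diff_groups(df)

# Columns added by diff_groups() on top of the unique code combinations
setdiff(names(ext_out$groups), names(base_out$groups))

# Rows where both columns hold the same code
# (which() drops any NA values in the logical column)
ext_out$groups[which(ext_out$groups$is_common), ]
```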

Examples

df <- SSBtoolsData("code_pairs")

df

diff_groups(df)


d2 <- SSBtoolsData("d2")
diff_groups(d2[1:2])$groups
diff_groups(d2[2:3])$groups
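
The effect of hiddenNA and outputNA can be explored by running the same input with both settings. A hedged sketch, assuming SSBtools is installed; whether the two results differ depends on whether the data contain missing codes, and the label "missing" is an arbitrary illustrative value:

```r
library(SSBtools)

df <- SSBtoolsData("code_pairs")

# Default: NA codes are hidden and not used in difference and sum groups
hidden <- diff_groups(df)$groups

# NA codes take part and are written as "missing" in the constructed strings
visible <- diff_groups(df, hiddenNA = FALSE, outputNA = "missing")$groups

hidden
visible
```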