SSBtools (version 1.8.4)

diff_groups: Difference and Sum Groups

Description

This function is a wrapper around RowGroups() for the specific case where the input contains two columns. It calls RowGroups() with returnGroups = TRUE, and extends the resulting data frame of unique code combinations with additional information about common groups, difference groups, and sum groups.

Usage

diff_groups(
  x,
  ...,
  hiddenNA = TRUE,
  sep_common = "_=_",
  sep_diff = "_-_",
  sep_sum = c("_=_", "_+_"),
  outputNA = "NA"
)

Value

A list (as returned by RowGroups()), where the groups data frame is extended with additional descriptive columns indicating common, difference, and sum relationships between the two code columns.

Arguments

x

A data frame with exactly two columns.

...

Additional arguments passed to RowGroups().

hiddenNA

Logical. When TRUE (default), missing codes (NA) are treated as hidden categories, meaning they are not available for computing difference and sum groups. See Note for details on how this differs from the NAomit parameter in RowGroups().

sep_common

A character string used in the common column to separate codes that are identical across the two input columns.

sep_diff

A character string used in the diff_1_2 and diff_2_1 columns to indicate difference groups. Each string starts with the parent code, followed by the one or more child codes from the other column that are subtracted from it.

sep_sum

A character vector of one or two elements used in the sum_1_2 and sum_2_1 columns to describe relationships where a code in one column represents the sum of several codes in the other. The first element (sep_sum[1]) acts as an equality sign, and the second element (sep_sum[2]) acts as a plus sign. If sep_sum has length 1, the same value is used for both positions.

outputNA

Character string used to represent NA values within the newly constructed text strings in the additional output columns. Only relevant when hiddenNA = FALSE.
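
As a rough illustration of how the separator arguments combine codes into text strings, the base R sketch below builds one difference label and one sum label by hand. The codes "A", "a1", "a2" and "a3" and all variable names are hypothetical, and the strings actually produced by diff_groups() may differ in ordering or detail.

```r
# Hypothetical codes: parent "A" in one column, children "a1", "a2", "a3" in the other.
parent   <- "A"
children <- c("a1", "a2", "a3")

# Default separator values from the Usage section.
sep_diff <- "_-_"
sep_sum  <- c("_=_", "_+_")

# A length-1 sep_sum is reused for both positions, as described above.
sep_sum <- rep_len(sep_sum, 2)

# Difference label: parent first, then the subtracted child codes.
diff_label <- paste(c(parent, children[1:2]), collapse = sep_diff)

# Sum label: parent, equality separator, then children joined by the plus separator.
sum_label <- paste0(parent, sep_sum[1], paste(children, collapse = sep_sum[2]))

print(diff_label)
print(sum_label)
```

Read this way, the difference label describes what remains of "A" after removing "a1" and "a2", while the sum label states that "A" equals the three child codes together.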

Details

The returned list contains the same elements as the list returned by RowGroups(), but with an extended groups data frame. The additional columns describe relationships between the two input columns as follows:

  • is_common: TRUE when the two codes on the row are identical.

  • is_child_1, is_child_2: TRUE when the code in the column is a subset or subgroup of a code in the other column.

  • common: identical code pairs, formatted using sep_common.

  • diff_1_2, diff_2_1: difference groups. The first element is the parent from the source column, followed by one or more child codes from the opposite column, joined using sep_diff.

  • sum_1_2, sum_2_1: sum groups where a parent code in one column equals the sum of several codes in the other.
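
The logical columns can be pictured with a small base R sketch. It does not call SSBtools and is not the package's implementation. The input data frame, the column names col1 and col2, and in particular the rule used to flag a child code are assumptions made for illustration only.

```r
# Hypothetical two-column input; each row pairs a code from each column.
x <- data.frame(
  col1 = c("A", "A", "B", "C"),
  col2 = c("a1", "a2", "B", "C"),
  stringsAsFactors = FALSE
)

# Unique code combinations, analogous in spirit to the groups data frame.
groups <- unique(x)

# Same code in both columns.
groups$is_common <- groups$col1 == groups$col2

# Number of distinct partner codes for each code.
n_partners_1 <- tapply(groups$col2, groups$col1, function(v) length(unique(v)))
n_partners_2 <- tapply(groups$col1, groups$col2, function(v) length(unique(v)))

# Look the counts up row by row and drop names and array attributes.
p1 <- as.vector(n_partners_1[groups$col1])
p2 <- as.vector(n_partners_2[groups$col2])

# Assumed rule: a code is a child when it has a single partner code
# and that partner is paired with several codes in this column.
groups$is_child_1 <- p1 == 1 & p2 > 1
groups$is_child_2 <- p2 == 1 & p1 > 1

print(groups)
```

In this toy input, "a1" and "a2" are flagged as children of "A", while "B" and "C" are common codes. Whether diff_groups() applies exactly this rule is not stated in this documentation.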

See Also

data_diff_groups() for adding the results back as new columns in the data frame.

Examples

df <- SSBtoolsData("code_pairs")

df

diff_groups(df)


d2 <- SSBtoolsData("d2")
diff_groups(d2[1:2])$groups
diff_groups(d2[2:3])$groups
                           
