This function is a wrapper around RowGroups() for the specific case where the input
contains two columns. It calls RowGroups() with returnGroups = TRUE, and extends
the resulting data frame of unique code combinations with additional information
about common groups, difference groups, and sum groups.
diff_groups(
  x,
  ...,
  hiddenNA = TRUE,
  sep_common = "_=_",
  sep_diff = "_-_",
  sep_sum = c("_=_", "_+_"),
  outputNA = "NA"
)

A list (as returned by RowGroups()), where the groups data frame is
extended with additional descriptive columns indicating common, difference, and sum
relationships between the two code columns.

x: A data frame with exactly two columns.

...: Additional arguments passed to RowGroups().

hiddenNA: Logical. When TRUE (default), missing codes (NA) are treated as
hidden categories, meaning they are not available for computing difference and sum groups.
See Note for details on how this differs from the NAomit parameter in RowGroups().

sep_common: A character string used in the common column to separate codes
that are identical across the two input columns.

sep_diff: A character string used in the diff_1_2 and diff_2_1 columns to
indicate difference groups. Each string starts with the parent code, followed by one or
more child codes from the other column that are subtracted from it.

sep_sum: A character vector of one or two elements used in the sum_1_2 and
sum_2_1 columns to describe relationships where a code in one column represents the
sum of several codes in the other. The first element (sep_sum[1]) acts as an equality
sign, and the second element (sep_sum[2]) acts as a plus sign. If sep_sum has
length 1, the same value is used for both positions.

outputNA: Character string used to represent NA values within the newly
constructed text strings in the additional output columns.
Only relevant when hiddenNA = FALSE.

The returned list contains the same elements as from RowGroups(), but with an
extended groups data frame. The columns describe relationships between the two
input columns as follows:
is_common — TRUE when the two codes on the row are identical.
is_child_1, is_child_2 — TRUE when the code in the column is a subset or
subgroup of a code in the other column.
common — identical code pairs, formatted using sep_common.
diff_1_2, diff_2_1 — difference groups. The first element is the parent
from the source column, followed by one or more child codes from the opposite
column, joined using sep_diff.
sum_1_2, sum_2_1 — sum groups where a parent code in one column equals the sum of several codes in the other.
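
To make the separator arguments more concrete, the base R sketch below shows how a parent code and its child codes could be combined into difference and sum strings. The helpers `expand_sep_sum()`, `format_diff()` and `format_sum()` and the example codes are hypothetical and are not part of SSBtools; the strings actually produced by diff_groups() may differ in ordering or detail.

```r
# Hypothetical helpers, not part of SSBtools. They only illustrate how the
# separator arguments might combine codes into descriptive strings.

# Recycle sep_sum to length 2, mirroring the documented rule that a
# length-1 sep_sum is used for both the equality and the plus position.
expand_sep_sum <- function(sep_sum) rep_len(sep_sum, 2)

# Difference group: parent code first, then the subtracted child codes.
format_diff <- function(parent, children, sep_diff = "_-_") {
  paste(c(parent, children), collapse = sep_diff)
}

# Sum group: parent code, equality separator, then children joined by
# the plus separator.
format_sum <- function(parent, children, sep_sum = c("_=_", "_+_")) {
  sep_sum <- expand_sep_sum(sep_sum)
  paste0(parent, sep_sum[1], paste(children, collapse = sep_sum[2]))
}

format_diff("Europe", c("Norway", "Sweden"))
format_sum("Nordic", c("Norway", "Sweden", "Denmark"))
format_sum("Nordic", c("Norway", "Sweden"), sep_sum = "|")
```

With the default separators, the first call joins the parent and children with "_-_", while the second places "_=_" after the parent and "_+_" between the children. The last call shows a single-element sep_sum being reused in both positions.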
See also data_diff_groups(), which adds the results back as new columns in the data frame.
df <- SSBtoolsData("code_pairs")
df
diff_groups(df)
d2 <- SSBtoolsData("d2")
diff_groups(d2[1:2])$groups
diff_groups(d2[2:3])$groups