Aggregates are generated, followed by primary suppression and then
secondary suppression by Gaussian elimination using GaussSuppression.
GaussSuppressionFromData(
  data,
  dimVar = NULL,
  freqVar = NULL,
  ...,
  numVar = NULL,
  weightVar = NULL,
  charVar = NULL,
  hierarchies = NULL,
  formula = NULL,
  maxN = suppressWarnings(formals(c(primary)[[1]])$maxN),
  protectZeros = suppressWarnings(formals(c(primary)[[1]])$protectZeros),
  secondaryZeros = suppressWarnings(formals(candidates)$secondaryZeros),
  candidates = CandidatesDefault,
  primary = PrimaryDefault,
  forced = NULL,
  hidden = NULL,
  singleton = SingletonDefault,
  singletonMethod = ifelse(secondaryZeros, "anySumNOTprimary", "anySum"),
  printInc = TRUE,
  output = "publish",
  x = NULL,
  crossTable = NULL,
  preAggregate = is.null(freqVar),
  extraAggregate = preAggregate & !is.null(charVar),
  structuralEmpty = FALSE,
  extend0 = FALSE,
  spec = NULL,
  specLock = FALSE,
  freqVarNew = rev(make.unique(c(names(data), "freq")))[1],
  nUniqueVar = rev(make.unique(c(names(data), "nUnique")))[1],
  forcedInOutput = "ifNonNULL",
  unsafeInOutput = "ifForcedInOutput",
  lpPackage = NULL,
  intervalSuppression = TRUE,
  aggregatePackage = "base",
  aggregateNA = TRUE,
  aggregateBaseOrder = FALSE,
  rowGroupsPackage = aggregatePackage,
  linkedGauss = NULL,
  linkedIntervals = ifelse(linkedGauss == "local-bdiag", "local-bdiag",
    "super-consistent"),
  recordAware = TRUE,
  collapseAware = FALSE,
  linkedTables = NULL,
  action_unused_dots = getOption("GaussSuppression.action_unused_dots", "inform"),
  allowed_unused_dots = getOption("GaussSuppression.allowed_unused_dots", character(0))
)
Value: Aggregated data with suppression information.
Arguments:
data Input data, typically a data frame, tibble, or data.table.
If data is not a classic data frame, it will be coerced to one internally
unless preAggregate is TRUE and aggregatePackage is "data.table".
dimVar The main dimensional variables and additional aggregating variables. This parameter can be useful when hierarchies and formula are unspecified.
freqVar A single variable holding counts (name or number).
... Further arguments to be passed to the supplied functions and to ModelMatrix (such as inputInOutput and removeEmpty).
numVar Other numerical variables to be aggregated.
weightVar Weights (costs) to be used to order candidates for secondary suppression.
charVar Other variables possibly to be used within the supplied functions.
hierarchies List of hierarchies, which can be converted by AutoHierarchies.
Thus, the variables can also be coded by "rowFactor" or "", which correspond to using the categories in the data.
formula A model formula.
maxN Suppression parameter forwarded to the supplied functions.
With the default primary function, PrimaryDefault(), cells with frequency <= maxN are marked as primary suppressed,
and the default value of maxN is 3. See details below.
The parameter is also used by NContributorsRule().
For advanced use cases, including setups with multiple primary functions, maxN can be specified
as a named list or vector. See each primary function’s documentation for details.
protectZeros Suppression parameter.
When TRUE, cells with zero frequency or value are set as primary suppressed.
Using the default primary function, protectZeros is by default set to TRUE. See details.
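As a rough illustration of how maxN and protectZeros interact under the default frequency rule, here is a minimal sketch. The function simple_primary() is hypothetical: it only mirrors the behaviour documented above and is not the source code of PrimaryDefault().

```r
# Hypothetical stand-in for a frequency-rule primary function.
# It mirrors the documented behaviour only; it is not PrimaryDefault().
simple_primary <- function(freq, maxN = 3, protectZeros = TRUE, ...) {
  primary <- freq <= maxN          # small counts are primary suppressed
  if (!protectZeros) {
    primary[freq == 0] <- FALSE    # zeros are left unsuppressed in this case
  }
  primary
}

freq <- c(0, 2, 3, 4, 10)
simple_primary(freq)                         # TRUE TRUE TRUE FALSE FALSE
simple_primary(freq, protectZeros = FALSE)   # FALSE TRUE TRUE FALSE FALSE
simple_primary(freq, maxN = 4)               # TRUE TRUE TRUE TRUE FALSE
```

Because simple_primary() declares maxN and protectZeros as formals with defaults, these are the kind of defaults that are inherited from the first primary function, as described in the details.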
secondaryZeros Suppression parameter.
When TRUE, cells with zero frequency or value are prioritized to be published so that they are not secondary suppressed.
Using the default candidates function, secondaryZeros is by default set to FALSE.
See details.
candidates GaussSuppression input or a function generating it (see details). Default: CandidatesDefault.
primary GaussSuppression input or a function generating it (see details). Default: PrimaryDefault.
forced GaussSuppression input or a function generating it (see details).
hidden GaussSuppression input or a function generating it (see details).
singleton GaussSuppression input or a function generating it (see details). Default: SingletonDefault.
singletonMethod GaussSuppression input. The default value depends on parameter secondaryZeros, which depends on candidates (see details).
printInc GaussSuppression input.
output One of "publish" (default), "inner", "publish_inner", "publish_inner_x", "publish_x",
"inner_x", "input2functions" (input to supplied functions),
"inputGaussSuppression", "inputGaussSuppression_x",
"outputGaussSuppression", "outputGaussSuppression_x",
"primary", "secondary" and "all".
Here "inner" means input data (possibly pre-aggregated) and
"x" means dummy matrix (as input parameter x).
With "outputGaussSuppression_x", all input to and output from GaussSuppression, except ..., is returned.
Variants excluding x ("outputGaussSuppression") or returning only the input ("inputGaussSuppression_x", "inputGaussSuppression") are also possible.
The code "all" means all relevant output after all the calculations.
Currently, this means the same as "publish_inner_x", extended with the matrices (or NULL) xExtraPrimary and unsafe.
The former matrix is usually made by KDisclosurePrimary.
The latter matrix contains the columns representing unsafe primary suppressions.
In addition to x columns corresponding to unsafe in ordinary output (see parameter unsafeInOutput below),
possible columns from xExtraPrimary may also be included in the unsafe matrix (see details).
x The model matrix (modelMatrix). x and crossTable can be supplied as input instead of being generated by ModelMatrix.
crossTable See x above.
preAggregate When TRUE, the data will be aggregated within the function to an appropriate level.
This is defined by the dimensional variables according to dimVar, hierarchies or formula and, in addition, charVar.
When FALSE, no aggregation is performed.
When NA, the function will automatically decide whether to aggregate:
aggregation is applied unless freqVar is present and the data contain no duplicated rows with respect to
the dimensional variables and charVar.
Exception: if a non-NULL x (the model matrix) is supplied, NA is treated as FALSE.
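The decision rule for preAggregate = NA can be sketched in base R. The helper decide_preAggregate() is hypothetical and only restates the three documented conditions; it is not part of the package, and the real implementation may check further details.

```r
# Hypothetical helper restating the documented rule for preAggregate = NA.
decide_preAggregate <- function(data, dimVar, freqVar = NULL, charVar = NULL,
                                x = NULL) {
  if (!is.null(x)) return(FALSE)      # supplied model matrix: NA acts as FALSE
  if (is.null(freqVar)) return(TRUE)  # no count variable: aggregate
  # With counts present, aggregate only if some rows share all grouping values
  anyDuplicated(data[c(dimVar, charVar)]) > 0
}

d <- data.frame(a = c("x", "x", "y"), b = c("p", "q", "p"), n = c(1, 2, 3))
decide_preAggregate(d, c("a", "b"), freqVar = "n")  # FALSE: rows are unique
decide_preAggregate(d, "a", freqVar = "n")          # TRUE: "x" occurs twice
decide_preAggregate(d, c("a", "b"))                 # TRUE: no freqVar
```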
extraAggregate When TRUE, the data will be aggregated by the dimensional variables according to dimVar, hierarchies or formula.
The aggregated data and the corresponding x-matrix will only be used as input to the singleton
function and GaussSuppression.
This extra aggregation is useful when parameter charVar is used.
Supply "publish_inner", "publish_inner_x", "publish_x" or "inner_x" as output to obtain extra aggregated results.
Supply "inner" or "input2functions" to obtain other results.
structuralEmpty When TRUE, output cells with no contributing inner cells (only zeros in the corresponding column of x)
are forced to be not primary suppressed.
Thus, these cells are considered structural zeros.
When structuralEmpty is TRUE, the following error message is avoided:
"Suppressed cells with empty input will not be protected. Extend input data with zeros?"
When removeEmpty is TRUE (see "..." above), structuralEmpty is superfluous.
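The cells that structuralEmpty concerns are the all-zero columns of x. A minimal sketch of how such columns can be identified, using a small hand-built matrix (the column names are illustrative only):

```r
# Illustrative model matrix x: rows are inner cells, columns are output cells.
x <- cbind(Total = c(1, 1, 1), A = c(1, 0, 0), B = c(0, 1, 1), C = c(0, 0, 0))
# Output cells with no contributing inner cells have an all-zero column
empty <- colSums(x != 0) == 0
names(empty)[empty]  # "C"
```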
extend0 When TRUE, data is automatically extended by Extend0.
Can also be set to "all", which means that input codes in hierarchies are considered in addition to those in data.
Parameter extend0 can also be specified as a list, which is then used as parameter varGroups to Extend0.
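To show the idea of extending data with zeros, here is a base R sketch. extend_zeros() is a hypothetical stand-in, assuming Extend0 works along these lines (adding zero-frequency rows for combinations of the dimensional variables that are missing from the data); it does not reproduce Extend0 options such as varGroups or hierarchy codes.

```r
# Hypothetical stand-in for Extend0(): add zero-frequency rows for all
# combinations of the dimensional variables that are missing from data.
extend_zeros <- function(data, dimVar, freqVar) {
  grid <- expand.grid(lapply(data[dimVar], unique), stringsAsFactors = FALSE)
  out <- merge(grid, data, by = dimVar, all.x = TRUE)
  out[[freqVar]][is.na(out[[freqVar]])] <- 0  # missing combinations get 0
  out
}

d <- data.frame(row = c(1, 1, 2), col = c("a", "b", "a"), freq = c(5, 2, 4))
out <- extend_zeros(d, c("row", "col"), "freq")
out  # four rows; the combination row = 2, col = "b" has freq 0
```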
spec NULL or a named list of arguments that will act as default values.
specLock When TRUE, arguments in spec cannot be changed.
freqVarNew Name of the new frequency variable generated when input freqVar is NULL and preAggregate is TRUE.
Default is "freq" provided this is not found in names(data).
nUniqueVar Name of variable holding the number of unique contributors.
This variable will be generated in the extraAggregate step.
Default is "nUnique" provided this is not found in names(data).
If an existing variable is passed as input,
this variable will apply only when preAggregate/extraAggregate is not done.
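The defaults for freqVarNew and nUniqueVar come from the make.unique() expressions in the usage section. Evaluating those same expressions on two illustrative sets of column names shows what happens when the preferred name is already taken:

```r
# The default expressions for freqVarNew and nUniqueVar, applied to
# two example sets of column names.
names_plain <- c("var1", "var2", "value")
names_clash <- c("var1", "freq", "nUnique")
rev(make.unique(c(names_plain, "freq")))[1]     # "freq"
rev(make.unique(c(names_clash, "freq")))[1]     # "freq.1"
rev(make.unique(c(names_clash, "nUnique")))[1]  # "nUnique.1"
```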
forcedInOutput Whether to include forced as an output column.
One of "ifNonNULL" (default), "always", "ifany" and "no".
In addition, TRUE and FALSE are allowed as alternatives to "always" and "no".
unsafeInOutput Whether to include unsafe as an output column.
One of "ifForcedInOutput" (default), "always", "ifany" and "no".
In addition, TRUE and FALSE are allowed as alternatives to "always" and "no".
See details.
lpPackage When non-NULL, intervals computed by ComputeIntervals() will
be included in the output. Valid values are the names of supported R
packages for linear programming backends: "highs", "Rsymphony",
"Rglpk", or "lpSolve".
If interval requirements are specified, additional suppression will be
performed to satisfy those requirements. Interval requirements can be
set either through arguments of IntervalLimits() or by enabling
protectionIntervals = TRUE in the primary suppression functions.
See IntervalLimits() for a full description of the parameters
(protectionPercent, protectionLimit, loProtectionPercent,
loProtectionLimit, rangePercent, rangeMin) and how interval
requirements are calculated.
In the output variable suppressed_integer, suppression status is
coded as:
0 = no suppression,
1 = primary suppression,
2 = secondary suppression,
3 = additional suppression applied by an interval algorithm limited
    to linearly independent cells,
4 = further suppression according to the final Gauss algorithm.
Intervals [lo_1, up_1] are calculated prior to additional suppression.
To disable additional suppression, set intervalSuppression = FALSE.
Please note that additional suppression based on parameters other than
rangePercent and rangeMin is currently considered experimental.
In particular, the names of the newer parameters may still change.
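When inspecting results, it can help to turn the suppressed_integer codes into readable labels. A minimal sketch using a hypothetical vector of codes; the label strings paraphrase the coding listed above and are not produced by the package:

```r
# Hypothetical suppressed_integer values for six output cells
codes <- c(0, 1, 2, 0, 3, 4)
# Labels paraphrase the documented coding; they are not package output
status <- factor(codes, levels = 0:4,
                 labels = c("none", "primary", "secondary",
                            "interval", "gauss_final"))
table(status)
```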
intervalSuppression Logical. If FALSE, additional suppression to satisfy interval
requirements is disabled (default is TRUE). See description of
lpPackage above.
aggregatePackage Package used to preAggregate/extraAggregate.
Parameter pkg to aggregate_by_pkg.
aggregateNA Whether to include NAs in the grouping variables during preAggregate/extraAggregate.
Parameter include_na to aggregate_by_pkg.
Note that NAs will not be present in the output table's dimensions regardless of the value of aggregateNA.
When using the formula interface, this is controlled by the NAomit parameter (default TRUE),
which is passed to the function SSBtools::Formula2ModelMatrix().
It is through this use of the formula interface that NAs in the input data make sense.
Note that under normal circumstances, grouping variables should not use NA to represent a category.
As such, if NAs are present in the grouping variables, using the dimVar or hierarchies interfaces
will result in errors.
aggregateBaseOrder Parameter base_order to aggregate_by_pkg,
used when preAggregate/extraAggregate.
The parameter does not affect the ordering of ordinary output.
Therefore, the default is set to FALSE to avoid unnecessary sorting operations.
The parameter will have an impact when, e.g., output = "inner".
rowGroupsPackage Parameter pkg to RowGroups.
The parameter is input to Formula2ModelMatrix
via ModelMatrix.
linkedGauss Controls linked table suppression. Accepted values are described in the
documentation for SuppressLinkedTables().
See also the note and the corresponding examples, which
demonstrate usage with alternative function interfaces.
In addition, linkedGauss = "global" is allowed and corresponds to standard execution
(i.e., when linkedGauss is not specified).
When linkedGauss is used, the formula parameter should be provided as a list of formulas.
Alternatively, formula may have an attribute "table_formulas" containing such a list.
See also the linkedTables parameter below.
linkedIntervals Determines how interval calculations,
triggered by the lpPackage parameter, are performed when linkedGauss is not "global".
When linkedGauss = "global", interval settings in linkedIntervals are ignored.
For allowed values and detailed behaviour, see the documentation of SuppressLinkedTables().
Note: With linkedIntervals = "local-bdiag", common cells may have different table-specific intervals.
Since the output shows one interval per cell, it is constructed using the maximum lower bound and
minimum upper bound across the tables.
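The combination rule for "local-bdiag" (maximum lower bound, minimum upper bound) amounts to intersecting the table-specific intervals. A small sketch with illustrative numbers for two common cells computed in two linked tables:

```r
# Illustrative table-specific intervals: rows are two common cells,
# columns are two linked tables
lo <- cbind(table_1 = c(2, 0), table_2 = c(4, 1))
up <- cbind(table_1 = c(15, 9), table_2 = c(12, 10))
# One interval per cell: largest lower bound and smallest upper bound
combined <- data.frame(lo = apply(lo, 1, max), up = apply(up, 1, min))
combined  # cell 1: [4, 12], cell 2: [1, 9]
```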
recordAware Parameter associated with linkedGauss. See SuppressLinkedTables().
collapseAware Parameter associated with linkedGauss.
In the linked-tables algorithm, the model matrix is first collapsed by
removing duplicate rows.
When collapseAware = TRUE, every cell that remains numerically derivable
after a pre-aggregation corresponding to this row reduction will be treated
as a common cell. This maximizes coordination across tables, given the
duplicate-row removal, while adding limited additional computational overhead. In particular,
the suppression algorithm automatically accounts for cells in one table
that are sums of cells in another table.
Note that any cell that recordAware = TRUE would introduce is already
included automatically when collapseAware = TRUE.
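Why removing duplicate rows corresponds to a pre-aggregation can be seen in a small example. This sketch is only an illustration of that equivalence and does not reproduce the package's internal collapsing: duplicate rows of x are dropped, the frequencies of the corresponding inner cells are summed, and the output cell totals stay the same.

```r
# Illustrative model matrix: rows 1 and 2 are duplicates (same output cells)
x <- rbind(c(1, 1, 0), c(1, 1, 0), c(1, 0, 1))
colnames(x) <- c("Total", "A", "B")
freq <- c(2, 3, 4)

# Collapse: keep one row per distinct pattern and sum the inner frequencies
key <- apply(x, 1, paste, collapse = "")
x_collapsed <- x[!duplicated(x), , drop = FALSE]
freq_collapsed <- rowsum(freq, key, reorder = FALSE)

# The output cells are unchanged by this pre-aggregation
all.equal(crossprod(x, freq), crossprod(x_collapsed, freq_collapsed),
          check.attributes = FALSE)
```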
linkedTables A list specifying how the tables referenced in the formula
parameter should be combined for use in the linked-tables algorithm.
Each element in the list contains one or more names of the tables in formula.
The corresponding tables will be combined and treated as a single table by the algorithm.
For example: linkedTables = list(c("table_1", "table_3"), "table_2").
If NULL (default), each table in formula is used individually.
action_unused_dots Character string controlling how unused arguments
in ... are handled. Internally uses ellipsis::check_dots_used() with a
custom action. One of "warn", "abort", "inform", or "none". The value "none"
disables the check entirely. The default is taken from
getOption("GaussSuppression.action_unused_dots"), falling back to "inform"
if the option is not set. Users can change the default globally with e.g.
options(GaussSuppression.action_unused_dots = "abort").
allowed_unused_dots Character vector of argument names ignored by the
unused-argument check. May be useful when this function is wrapped by
another function, or in other cases where a correctly spelled argument is
nevertheless not registered as used. The default is taken from
getOption("GaussSuppression.allowed_unused_dots"), falling back to
character(0) if the option is not set. Users can change the default
globally with e.g.
options(GaussSuppression.allowed_unused_dots = c("plotColor", "lineType")).
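Both defaults are read with getOption() and a fallback value, as in the usage section. A short sketch of setting the two options temporarily and restoring the previous state afterwards (standard base R; the package does not need to be loaded for this):

```r
# Set both options temporarily; options() returns the previous values
old <- options(GaussSuppression.action_unused_dots = "abort",
               GaussSuppression.allowed_unused_dots = c("plotColor", "lineType"))
action_now <- getOption("GaussSuppression.action_unused_dots", "inform")
allowed_now <- getOption("GaussSuppression.allowed_unused_dots", character(0))

options(old)  # restore previous settings (options unset before are removed)
action_after <- getOption("GaussSuppression.action_unused_dots", "inform")
```

In a fresh session, action_after is "inform" again, because the option was unset before and the fallback applies.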
Authors: Øyvind Langsrud and Daniel Lupp
Details:
The supplied functions for generating GaussSuppression input take the following arguments:
crossTable, x, freq, num, weight, maxN, protectZeros, secondaryZeros, data, freqVar, numVar, weightVar, charVar, dimVar,
aggregatePackage, aggregateNA, aggregateBaseOrder, rowGroupsPackage, structuralEmpty, and ...,
where the first two are ModelMatrix outputs (modelMatrix renamed to x).
The vector freq contains aggregated counts (t(x) %*% data[[freqVar]]).
Similarly, num is a data frame of aggregated numerical variables.
In addition, the supplied singleton function also takes nUniqueVar and (output from) primary as input.
It is possible to supply several primary functions joined by c, e.g. c(FunPrim1, FunPrim2).
All NAs returned from any of the functions force the corresponding cells not to be primary suppressed.
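The NA rule for several primary functions can be sketched as follows. combine_primary() is hypothetical and is not the package's internal code; it assumes each function returns a logical vector of the same length (primary functions may also return indices, as in the examples, which this sketch does not handle).

```r
# Hypothetical sketch of combining several primary results under the rule
# stated above; it is not the package's internal code.
combine_primary <- function(results) {
  any_true <- Reduce(`|`, lapply(results, function(r) r %in% TRUE))
  any_na <- Reduce(`|`, lapply(results, is.na))
  any_true & !any_na  # an NA from any function blocks primary suppression
}

r1 <- c(TRUE, FALSE, TRUE, FALSE)       # e.g. freq <= maxN
r2 <- NA & c(FALSE, FALSE, TRUE, TRUE)  # c(FALSE, FALSE, NA, NA)
combine_primary(list(r1, r2))           # TRUE FALSE FALSE FALSE
```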
The effect of maxN, protectZeros and secondaryZeros depends on the supplied functions where these parameters are used.
Their default values are inherited from the default values of the first primary function (several possible) or,
in the case of secondaryZeros, the candidates function.
When defaults cannot be inherited, they are set to NULL.
In practice, the function formals are still used to generate the defaults when primary and/or candidates are not functions.
NULL is then correctly returned, but suppressWarnings() is needed.
Singleton handling can be turned off by singleton = NULL or singletonMethod = "none".
Both of these choices are identical in the sense that singletonMethod is set to "none" whenever singleton is NULL and vice versa.
Information about uncertain primary suppressions due to forced cells can be found
as described by parameters unsafeInOutput and output (= "all").
Reporting of cases where forced cells affect singleton problems is not implemented.
Some information can be seen from warnings.
Such cases can also be seen by choosing output = "secondary" together
with unsafeInOutput = "ifany" or unsafeInOutput = "always".
Then, negative indices from GaussSuppression using
unsafeAsNegative = TRUE will be included in the output.
Singleton problems may, however, be present even if they are not visible in warnings or output.
In some cases, the problems can be detected by GaussSuppressDec.
In some cases, cells that are forced, hidden, or primary suppressed can overlap.
For these situations, forced has precedence over hidden and primary.
That is, if a cell is both forced and hidden, it will be treated as a forced cell and thus published.
Similarly, any primary suppression of a forced cell will be ignored
(see parameter whenPrimaryForced to GaussSuppression).
It is, however, meaningful to combine primary and hidden.
Such cells will be protected while also being assigned the NA value in the suppressed output variable.
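The precedence rules above can be summarized in a few lines of base R. resolve_status() is hypothetical and models only the documented precedence before secondary suppression; it says nothing about how GaussSuppression itself treats these cells.

```r
# Hypothetical illustration of the documented precedence; not package code.
resolve_status <- function(primary, forced, hidden) {
  protect <- primary & !forced  # primary suppression of forced cells is ignored
  hidden <- hidden & !forced    # forced cells are published even if hidden
  suppressed <- protect         # status before any secondary suppression
  suppressed[hidden] <- NA      # hidden cells get NA in the output variable
  data.frame(protect, suppressed)
}

res <- resolve_status(primary = c(TRUE, TRUE, FALSE, TRUE),
                      forced  = c(TRUE, FALSE, FALSE, FALSE),
                      hidden  = c(TRUE, FALSE, TRUE, TRUE))
res  # cell 4 is both primary and hidden: protected, but shown as NA
```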
Examples:
z1 <- SSBtoolsData("z1")
GaussSuppressionFromData(z1, 1:2, 3)
z2 <- SSBtoolsData("z2")
GaussSuppressionFromData(z2, 1:4, 5, protectZeros = FALSE)
# Data as in GaussSuppression examples
df <- data.frame(values = c(1, 1, 1, 5, 5, 9, 9, 9, 9, 9, 0, 0, 0, 7, 7),
                 var1 = rep(1:3, each = 5), var2 = c("A", "B", "C", "D", "E"))
GaussSuppressionFromData(df, c("var1", "var2"), "values")
GaussSuppressionFromData(df, c("var1", "var2"), "values", formula = ~var1 + var2, maxN = 10)
GaussSuppressionFromData(df, c("var1", "var2"), "values", formula = ~var1 + var2, maxN = 10,
  protectZeros = TRUE, # Parameter needed by SingletonDefault and default not in primary
  primary = function(freq, crossTable, maxN, ...)
    which(freq <= maxN & crossTable[[2]] != "A" & crossTable[, 2] != "C"))
# Combining several primary functions
# Note that NA & c(TRUE, FALSE) equals c(NA, FALSE)
GaussSuppressionFromData(df, c("var1", "var2"), "values", formula = ~var1 + var2, maxN = 10,
  primary = c(function(freq, maxN, protectZeros = TRUE, ...) freq >= 45,
              function(freq, maxN, ...) freq <= maxN,
              function(crossTable, ...) NA & crossTable[[2]] == "C",
              function(crossTable, ...) NA & crossTable[[1]] == "Total"
                                           & crossTable[[2]] == "Total"))
# Similar to GaussSuppression examples
GaussSuppressionFromData(df, c("var1", "var2"), "values", formula = ~var1 * var2,
  candidates = NULL, singleton = NULL, protectZeros = FALSE, secondaryZeros = TRUE)
GaussSuppressionFromData(df, c("var1", "var2"), "values", formula = ~var1 * var2,
  singleton = NULL, protectZeros = FALSE, secondaryZeros = FALSE)
GaussSuppressionFromData(df, c("var1", "var2"), "values", formula = ~var1 * var2,
  protectZeros = FALSE, secondaryZeros = FALSE)
# Examples with zeros as singletons
z <- data.frame(row = rep(1:3, each = 3), col = 1:3, freq = c(0, 2, 5, 0, 0, 6:9))
GaussSuppressionFromData(z, 1:2, 3, singleton = NULL)
GaussSuppressionFromData(z, 1:2, 3, singletonMethod = "none") # as above
GaussSuppressionFromData(z, 1:2, 3)
GaussSuppressionFromData(z, 1:2, 3, protectZeros = FALSE, secondaryZeros = TRUE, singleton = NULL)
GaussSuppressionFromData(z, 1:2, 3, protectZeros = FALSE, secondaryZeros = TRUE)