
GaussSuppression (version 1.1.5)

SuppressLinkedTables: Consistent Suppression of Linked Tables

Description

Provides alternatives to global protection for linked tables through methods that may reduce the computational burden.

Usage

SuppressLinkedTables(
  data = NULL,
  fun,
  ...,
  withinArg = NULL,
  linkedGauss = "super-consistent",
  linkedIntervals = ifelse(linkedGauss == "local-bdiag", "local-bdiag",
    "super-consistent"),
  lpPackage = NULL,
  recordAware = TRUE,
  iterBackTracking = Inf,
  whenEmptyUnsuppressed = NULL
)

Value

A list of data frames, or, if withinArg is NULL, the ordinary output from fun.

Arguments

data

The data argument to fun. When NULL, data must instead be supplied in withinArg.

fun

A function: GaussSuppressionFromData or one of its wrappers such as SuppressSmallCounts and SuppressDominantCells.

...

Arguments to fun that are kept constant.

withinArg

A list of named lists, one per linked table, holding the arguments to fun that vary between tables. If withinArg itself is named, those names are used as the names of the output list.
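As a rough illustration of how the constant arguments and the per-table arguments fit together, the base-R sketch below merges a shared argument list with each element of withinArg and calls a toy function once per table. Both toy_fun and combine_args are hypothetical stand-ins invented for this sketch; it performs no suppression and is not the package's implementation.

```r
# Hypothetical stand-in for `fun`: it only reports what it received.
toy_fun <- function(data, formula, maxN) {
  data.frame(formula = deparse(formula), maxN = maxN, n_rows = nrow(data))
}

# Hypothetical helper: merge the constant arguments with each table's own arguments.
combine_args <- function(data, fun, constant, withinArg) {
  lapply(withinArg, function(w) do.call(fun, c(list(data = data), constant, w)))
}

toy_data <- data.frame(geo = c("A", "B", "B"), sector = c("x", "x", "y"), n = 1:3)

res <- combine_args(
  data = toy_data,
  fun = toy_fun,
  constant = list(maxN = 3),                        # plays the role of `...`
  withinArg = list(table_1 = list(formula = ~geo),  # varies between tables
                   table_2 = list(formula = ~geo + sector))
)

names(res)  # the names of withinArg carry over to the output list
```

The point of the sketch is only the calling convention: anything placed in the constant list reaches every table, while each inner list of withinArg reaches exactly one.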

linkedGauss

Specifies the strategy for protecting linked tables. The "super-consistent", "consistent", and "local-bdiag" methods protect all linked tables together in a single call to GaussSuppression() using an internally constructed block-diagonal model matrix.

  • "super-consistent" (default): Shares the key property of "consistent" that common cells are suppressed equally across tables, but also exploits the fact that these cells have identical values in all tables. The coordination is therefore stronger. If intervals are calculated using such coordination, common cells will have identical interval bounds in each table.

  • "consistent": Common cells are suppressed equally across tables.

  • "local": Each table is protected independently by a separate call to GaussSuppression().

  • "back-tracking": Iterative approach where each table is protected via GaussSuppression(), and primary suppressions are adjusted based on secondary suppressions from other tables across iterations.

  • "local-bdiag": Produces the same result as "local", but uses a single call to GaussSuppression(). It does not apply the linked-table methodology.
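To make the phrase "block-diagonal model matrix" concrete, the base-R sketch below places two small dummy matrices along the diagonal, so that the columns of each table only involve that table's own rows. The helper block_diag and the toy matrices are hypothetical; how the package builds its matrix internally, and how it then links common cells across blocks, is not shown here.

```r
# Hypothetical helper: place matrices along the diagonal, zeros elsewhere.
block_diag <- function(...) {
  blocks <- list(...)
  out <- matrix(0,
                sum(vapply(blocks, nrow, 1L)),
                sum(vapply(blocks, ncol, 1L)))
  r <- 0
  k <- 0
  for (b in blocks) {
    out[r + seq_len(nrow(b)), k + seq_len(ncol(b))] <- b
    r <- r + nrow(b)
    k <- k + ncol(b)
  }
  out
}

# Toy dummy matrices: rows are contributing records, columns are table cells.
x1 <- rbind(c(1, 0, 1),   # record 1 contributes to cell A and the total
            c(0, 1, 1))   # record 2 contributes to cell B and the total
x2 <- matrix(1, nrow = 2, ncol = 2)

x <- block_diag(x1, x2)
dim(x)
x
```

With blocks that stay unlinked, as in this sketch, a single run over the combined matrix treats each table separately, which matches the description of "local-bdiag" above.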

linkedIntervals

This parameter controls how interval calculations, triggered by the lpPackage parameter, are performed.

  • Default: "local-bdiag" if linkedGauss is set to "local-bdiag", and "super-consistent" in all other cases.

  • Possible values of linkedIntervals are "local-bdiag" and "super-consistent".

  • Interval calculations can be performed when linkedGauss is "super-consistent", "consistent", or "local-bdiag".

  • When linkedGauss is "local-bdiag", "local-bdiag" is the only allowed value in linkedIntervals (except that, with the alternative approaches, "global" may appear as a later element; "super-consistent" is never allowed).

  • It is possible to request multiple types of intervals by supplying linkedIntervals as a vector. Only the first value affects the additional suppression defined by rangePercent and/or rangeMin.

  • With the alternative approaches (see the note below), "global" may also appear in linkedIntervals, provided it is not the first element.
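The default rule can be checked directly in base R. The sketch below wraps the default expression from the Usage section in a hypothetical helper, default_linked_intervals, and evaluates it for every linkedGauss value. It only reproduces the default; it does not validate user-supplied values.

```r
# Default expression copied from the Usage section, wrapped in a hypothetical helper.
default_linked_intervals <- function(linkedGauss) {
  ifelse(linkedGauss == "local-bdiag", "local-bdiag", "super-consistent")
}

methods <- c("super-consistent", "consistent", "local", "back-tracking", "local-bdiag")

# Named character vector: one default per linkedGauss value.
defaults <- vapply(methods, default_linked_intervals, character(1))
print(defaults)
```

The expression also returns a value for "local" and "back-tracking", although, as stated above, interval calculations are only performed for the other three methods.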

lpPackage

See GaussSuppressionFromData().

recordAware

If TRUE (default), the suppression procedure will ensure consistency across cells that aggregate the same underlying records, even when their variable combinations differ. When TRUE, data cannot be included in withinArg.
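The situation recordAware targets can be pictured with a small base-R sketch. The toy data frame below is hypothetical (the variable names geo and eu are borrowed from the examples, but the rows are invented); it shows two cells defined by different variables that nevertheless aggregate exactly the same records. The sketch only detects such a pair and says nothing about how the package handles it.

```r
# Hypothetical toy data: every non-EU record happens to come from Iceland.
toy <- data.frame(
  geo   = c("Iceland", "Portugal", "Spain", "Iceland"),
  eu    = c("nonEU",   "EU",       "EU",    "nonEU"),
  value = c(10, 20, 30, 40)
)

rows_geo <- which(toy$geo == "Iceland")  # records behind the cell geo = Iceland
rows_eu  <- which(toy$eu == "nonEU")     # records behind the cell eu = nonEU

# TRUE when both cells are built from the same underlying records,
# in which case they necessarily have the same value.
same_records <- identical(rows_geo, rows_eu)
same_records

cell_values <- c(geo_Iceland = sum(toy$value[rows_geo]),
                 eu_nonEU    = sum(toy$value[rows_eu]))
cell_values
```

If one table publishes the geo cell and another publishes the eu cell, suppressing only one of them would leave the value readable from the other table, which is why consistency across such cells matters.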

iterBackTracking

Maximum number of back-tracking iterations.

whenEmptyUnsuppressed

Parameter to GaussSuppression(). It controls the informational message "Cells with empty input will never be secondary suppressed. Extend input data with zeros?" Here, the default is NULL (no message), since preprocessing of the model matrix may invalidate the assumptions behind this message.

Details

The new method "consistent", which has not yet been extensively tested in practice, was introduced to provide an alternative that works better than "back-tracking" while still offering equally strong protection.

Note that for singleton methods of the elimination type (see SSBtools::NumSingleton()), "back-tracking" may lead to the creation of a large number of redundant secondary cells. This is because, during the method's iterations, all secondary cells are eventually treated as primary. As a result, protection is applied to prevent a singleton contributor from inferring a secondary cell that was only included to protect that same contributor.

Note that the frequency singleton methods "subSpace", "anySum0", and "anySumNOTprimary" are currently not implemented and will result in an error. As a result, the singletonZeros parameter in the SuppressDominantCells() function cannot be set to TRUE, and the SuppressKDisclosure() function is not available for use. Also note that automatic forcing of "anySumNOTprimary" is disabled. That is, SSBtools::GaussSuppression() is called with auto_anySumNOTprimary = FALSE. See the parameter documentation for an explanation of why FALSE is required.

Examples


### The first example can be performed in three ways
### Alternatives are possible since only the formula parameter varies between the linked tables
 
a <- SuppressLinkedTables(data = SSBtoolsData("magnitude1"), # With trick "sector4 - sector4" and 
                 fun = SuppressDominantCells,        # "geo - geo" to ensure same names in output
                 withinArg = list(list(formula = ~(geo + eu) * sector2 + sector4 - sector4), 
                                  list(formula = ~eu:sector4 - 1 + geo - geo), 
                                  list(formula = ~geo + eu + sector4 - 1)), 
                 dominanceVar  = "value", 
                 pPercent = 10, 
                 contributorVar = "company",
                 linkedGauss = "consistent")
print(a)  

# Alternatively, SuppressDominantCells() can be run directly using the linkedGauss parameter  
a1 <- SuppressDominantCells(SSBtoolsData("magnitude1"), 
               formula = list(table_1 = ~(geo + eu) * sector2, 
                              table_2 = ~eu:sector4 - 1,
                              table_3 = ~(geo + eu) + sector4 - 1), 
               dominanceVar = "value", 
               pPercent = 10, 
               contributorVar = "company", 
               linkedGauss = "consistent")
print(a1)

# In fact, tables_by_formulas() is also a possibility
a2 <- tables_by_formulas(SSBtoolsData("magnitude1"),
               table_fun = SuppressDominantCells, 
               table_formulas = list(table_1 = ~region * sector2, 
                                    table_2 = ~region1:sector4 - 1, 
                                    table_3 = ~region + sector4 - 1), 
               substitute_vars = list(region = c("geo", "eu"), region1 = "eu"), 
               collapse_vars = list(sector = c("sector2", "sector4")), 
               dominanceVar  = "value", 
               pPercent = 10, 
               contributorVar = "company",
               linkedGauss = "consistent") 
print(a2)

####  The second example cannot be handled using the alternative methods.
####  This is similar to the (old) LazyLinkedTables() example.

z1 <- SSBtoolsData("z1")
z2 <- SSBtoolsData("z2")
z2b <- z2[3:5]  # As in ChainedSuppression example 
names(z2b)[1] <- "region" 
# As 'f' and 'e' in ChainedSuppression example. 
# 'A' 'annet'/'arbeid' suppressed in b[[1]], since suppressed in b[[3]].
b <- SuppressLinkedTables(fun = SuppressSmallCounts,
              linkedGauss = "consistent",  
              recordAware = FALSE,
              withinArg = list(
                list(data = z1, dimVar = 1:2, freqVar = 3, maxN = 5), 
                list(data = z2b, dimVar = 1:2, freqVar = 3, maxN = 5), 
                list(data = z2, dimVar = 1:4, freqVar = 5, maxN = 1)))
print(b)        
       
    
    
##################################       
####  Examples with intervals     
################################## 
  
lpPackage <- "highs" 
if (requireNamespace(lpPackage, quietly = TRUE)) {

  # Common cells occur because the default for recordAware is TRUE
  out1 <- SuppressLinkedTables(data = SSBtoolsData("magnitude1"), 
               fun = SuppressDominantCells, 
               withinArg = list(table_1 = list(dimVar = c("geo", "sector2")), 
                                table_2 = list(dimVar = c("eu", "sector4"))), 
               dominanceVar = "value", k = 90, contributorVar = "company", 
               lpPackage = lpPackage, rangeMin = 50)
  print(out1)
                      
 
  # In the algorithm, common cells occur because recordAware is TRUE,
  # although this is not reflected in the output variables table_1 and table_2
  out2 <- tables_by_formulas(data = SSBtoolsData("magnitude1"),
               table_fun = SuppressDominantCells,
               table_formulas = list(table_1 = ~geo * sector2,
                                     table_2 = ~eu * sector4),
               substitute_vars = list(region = c("geo", "eu"),
                                      sector = c("sector2", "sector4")),
               dominanceVar = "value", k = 90, contributorVar = "company",
               linkedGauss = "super-consistent",
               lpPackage = lpPackage, rangeMin = 50,
               linkedIntervals = c("super-consistent", "local-bdiag", "global"))
  print(out2)
                       
} else {
  message(paste0("Examples skipped because the '", lpPackage, "' package is not installed."))
}  
