
sdcMicro (version 5.5.1)

suda2: Detecting Special Uniques

Description

SUDA risk measure for data from (stratified) simple random sampling.

Usage

suda2(obj, ...)

Arguments

obj

an object of class data.frame or an object of class sdcMicroObj

...

further arguments, described below:

  • variables: Categorical (key) variables. Either the column names or an index of the variables to be used for risk measurement.

  • missing: The code used for missing values in the data set.

  • DisFraction: The sampling fraction for simple random sampling, or the common sampling fraction for stratified sampling. Defaults to 0.01.

  • original_scores: If TRUE (the default), the SUDA scores are computed as described in the paper "SUDA: A Program for Detecting Special Uniques" by Elliot et al. If FALSE, the scores are computed slightly differently, as in the original implementation of the algorithm by the IHSN.
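The calls below are a minimal sketch of passing these arguments explicitly, using the testdata2 data set from the Examples. They assume sdcMicro is installed; building the index vector with base R's match() is just one illustrative way to select key variables by position, and `keys`, `res_names`, `res_index` and `res_ihsn` are hypothetical names chosen for this sketch.

```r
library(sdcMicro)
data(testdata2)

keys <- c("urbrur", "roof", "walls", "water", "sex")

# Key variables given by name, with the sampling fraction set explicitly
# (0.01 is the documented default)
res_names <- suda2(testdata2, variables = keys, DisFraction = 0.01)

# The same key variables given by column index instead of by name
res_index <- suda2(testdata2, variables = match(keys, names(testdata2)),
                   DisFraction = 0.01)

# Scores computed as in the original IHSN implementation
res_ihsn <- suda2(testdata2, variables = keys, original_scores = FALSE)
```

Selecting by name or by index should describe the same key variables, so the two scorings are expected to agree.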

Value

If obj is an sdcMicroObj, the modified object; otherwise, a list with the following elements:

  • ContributionPercent: The contribution of each key variable to the SUDA score, calculated for each row.

  • score: The SUDA score of each record.

  • disscore: The DIS-SUDA score of each record.

  • attribute_contributions: A data.frame showing how much of the total risk each key variable contributes, with two columns:

    • variable: the name of the variable

    • contribution: the variable's contribution to the total risk.

  • attribute_level_contributions: A data.frame showing the risk of each attribute level, with three columns:

    • variable: the name of the variable

    • attribute: the level codes

    • contribution: the risk of this level within the variable.
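A short sketch of reading these components from the list returned for a data.frame input, again assuming sdcMicro is installed and using testdata2 from the Examples. Showing the five records with the highest DIS-SUDA scores is only an illustrative choice, not a recommended risk threshold; `keys`, `res` and `top` are hypothetical names.

```r
library(sdcMicro)
data(testdata2)

keys <- c("urbrur", "roof", "walls", "water", "sex")
res <- suda2(testdata2, variables = keys)

# One SUDA score and one DIS-SUDA score per record
summary(res$score)
summary(res$disscore)

# The five records with the highest DIS-SUDA scores, on the key variables
top <- order(res$disscore, decreasing = TRUE)[1:5]
testdata2[top, keys]

# Contribution of each key variable to the total risk
res$attribute_contributions
```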

Details

SUDA2 is a recursive algorithm for finding Minimal Sample Uniques (MSUs). It generates all possible subsets of the specified categorical key variables and scans them for patterns that are unique in the sample. The fewer variables needed to make a record unique, the higher the risk of that record.
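To make the notion of an MSU concrete, the following base R sketch finds MSUs by exhaustive enumeration of all 2^M - 1 subsets of M key variables. It is only an illustration under that assumption: it is not sdcMicro's SUDA2 code (which uses a recursive search) and it does not compute SUDA scores. The helpers find_msus() and min_msu_size() and the toy data are hypothetical.

```r
# Illustrative sketch only: exhaustive search for minimal sample uniques.
# Not sdcMicro's SUDA2 implementation; helper names are hypothetical.
find_msus <- function(df, keys) {
  msus <- vector("list", nrow(df))  # MSUs found so far, one list per record
  for (k in seq_along(keys)) {
    # Visit subsets in increasing size so smaller MSUs are found first
    for (vars in combn(keys, k, simplify = FALSE)) {
      # Combine each record's values on this subset into one pattern string
      pattern <- do.call(paste, c(df[vars], sep = "\t"))
      counts <- table(pattern)
      for (i in which(counts[pattern] == 1)) {
        # The unique subset is minimal only if it contains no MSU found earlier
        contains_msu <- any(vapply(msus[[i]],
                                   function(m) all(m %in% vars),
                                   logical(1)))
        if (!contains_msu) msus[[i]] <- c(msus[[i]], list(vars))
      }
    }
  }
  msus
}

# Size of the smallest MSU per record (NA if the record has none);
# smaller values indicate higher risk
min_msu_size <- function(msus) {
  vapply(msus,
         function(m) if (length(m) == 0) NA_integer_ else min(lengths(m)),
         integer(1))
}

toy <- data.frame(
  sex    = c("m", "m", "f", "f", "f"),
  region = c("A", "A", "A", "B", "B"),
  age    = c("young", "old", "old", "old", "old")
)
msus <- find_msus(toy, c("sex", "region", "age"))
min_msu_size(msus)  # 1 2 2 NA NA
```

Record 1 is already unique on age alone, so it is riskier than records 2 and 3, which need two variables; records 4 and 5 share all their values and have no MSU.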

References

C. J. Skinner and M. J. Elliot (20xx) A Measure of Disclosure Risk for Microdata. Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 64 (4), pp. 855-867.

M. J. Elliot, A. Manning, K. Mayes, J. Gurd and M. Bane (20xx) SUDA: A Program for Detecting Special Uniques, Using DIS to Modify the Classification of Special Uniques.

A. M. Manning, D. J. Haglin and J. A. Keane (2008) A recursive search algorithm for statistical disclosure assessment. Data Mining and Knowledge Discovery, Vol. 16, pp. 165-196.

M. Templ (2017) Statistical Disclosure Control for Microdata: Methods and Applications in R. Springer International Publishing, 287 pages. ISBN 978-3-319-50272-4. doi:10.1007/978-3-319-50272-4

Examples

library(sdcMicro)
data(testdata2)
data_suda2 <- suda2(testdata2, variables = c("urbrur", "roof", "walls", "water", "sex"))
data_suda2
str(data_suda2)
summary(data_suda2)

## for objects of class sdcMicroObj:
data(testdata2)
sdc <- createSdcObj(testdata2,
  keyVars = c("urbrur", "roof", "walls", "water", "electcon", "relat", "sex"),
  numVars = c("expend", "income", "savings"), w = "sampling_weight")
sdc <- suda2(sdc, original_scores = FALSE)
