suda2: Suda2: Detecting Special Uniques

Description

SUDA risk measure for data from (stratified) simple random sampling.

Usage

suda2(obj,...)#,variables=NULL,missing=-999,DisFraction=0.01)

Arguments

obj

object of class data.frame or object of class sdcMicroObj

...

see arguments below

variables

Categorical (key) variables. Either the column names or and index of the variables to be used for risk measurement.

missing

Missing value coding in the given data set.

DisFraction

It is the sampling fraction for the simple random sampling, and the common sampling fraction for stratified sampling. By default, it's set to 0.01.

Value

A modified sdcMicroObj object or the following list
ContributionPercentThe contribution of each key variable to the SUDA score, calculated for each row.
scoreThe suda score.
disscoreThe dis suda score

Details

Suda 2 is a recursive algorithm for finding Minimal Sample Uniques. The algorithm generates all possible variable subsets of defined categorical key variables and scans them for unique patterns in the subsets of variables. The lower the amount of variables needed to receive uniqueness, the higher the risk of the corresponding observation.

References

C. J. Skinner; M. J. Elliot (20xx) A Measure of Disclosure Risk for Microdata. Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 64 (4), pp 855--867. M. J. Elliot, A. Manning, K. Mayes, J. Gurd and M. Bane (20xx) SUDA: A Program for Detecting Special Uniques, Using DIS to Modify the Classification of Special Uniques Anna M. Manning, David J. Haglin, John A. Keane (2008) A recursive search algorithm for statistical disclosure assessment. Data Min Knowl Disc 16:165 -- 196

Examples

Run this code

data(testdata2)
data_suda2 <- suda2(testdata2,variables=c("urbrur","roof","walls","water","sex"))
data_suda2
summary(data_suda2)

## for objects of class sdcMicro:
data(testdata2)
sdc <- createSdcObj(testdata2,
  keyVars=c('urbrur','roof','walls','water','electcon','relat','sex'), 
  numVars=c('expend','income','savings'), w='sampling_weight')
sdc <- suda2(sdc)

Run the code above in your browser using DataLab