rankSwap: Rank Swapping

Description

Swaps values within a restricted rank range so that, first, the correlation structure of the original variables is preserved and, second, the values in each record are perturbed. Intended for numeric or ordinal variables for which a rank can be determined and a correlation coefficient is meaningful.

Usage

rankSwap(obj, variables = NULL, TopPercent = 5, BottomPercent = 5, K0 = -1, R0 = 0.95, P = 0, missing = NA, seed = NULL)

Arguments

obj

An object of class sdcMicroObj, a matrix, or a data frame.

variables

Names or indices of the variables to which rank swapping is applied. For an object of class sdcMicroObj, all numeric key variables are selected if variables = NULL.

TopPercent

Percentage of largest values that are grouped together before rank swapping is applied.

BottomPercent

Percentage of lowest values that are grouped together before rank swapping is applied.

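K0

Subset-mean preservation factor. Preserves the subset means before and after rank swapping within a range determined by $K_0$, such that $|X_1 - X_2| \le 2 K_0 X_1 / \sqrt{N_S}$, where $X_1$ and $X_2$ are the subset means of the field before and after swapping, and $N_S$ is the sample size of the subset.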

R0

Multivariate preservation factor. Preserves the correlation between variables within a range determined by the constant $R_0$. The preservation factor is defined as $R_0 = R_1/R_2$, where $R_1$ is the correlation coefficient of the two fields after swapping and $R_2$ is the correlation coefficient of the two fields before swapping.
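To make the ratio $R_1/R_2$ concrete, the following base-R sketch perturbs one of two correlated variables and compares the correlation before and after. The perturbation used here (exchanging each pair of rank-adjacent values) is a toy stand-in chosen for illustration, not the swapping performed by rankSwap, and all variable names are hypothetical.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 0.8 * x + rnorm(n, sd = 0.5)    # y is correlated with x

# Toy perturbation (NOT rankSwap): exchange each pair of rank-adjacent x values
ord <- order(x)                      # ord[k] = position of the k-th smallest x
swapped_ord <- ord
odd <- seq(1, n - 1, by = 2)
swapped_ord[odd] <- ord[odd + 1]
swapped_ord[odd + 1] <- ord[odd]
x_swapped <- x
x_swapped[ord] <- x[swapped_ord]

R2 <- cor(x, y)                      # correlation before swapping
R1 <- cor(x_swapped, y)              # correlation after swapping
ratio <- R1 / R2                     # compare this against a chosen R0, e.g. 0.95
ratio
```

A ratio close to 1 indicates that the perturbation left the linear relationship between the two fields largely intact.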

P

Rank range as a percentage of the total sample size. The rank range can be specified directly as $P$, the percentage of records. Two records are then eligible for swapping if their ranks, $i$ and $j$ respectively, satisfy $|i - j| \le P N / 100$, where $N$ is the total sample size.
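The rank-range rule can be sketched in base R. The example assumes the window is P percent of the sample size N, as the Details section describes, so two records are eligible when their ranks differ by at most P * N / 100. The helper eligible_pairs is hypothetical and not part of sdcMicro.

```r
# Hypothetical helper (not an sdcMicro function): logical matrix that is TRUE
# where the ranks of records i and j differ by at most P percent of N.
eligible_pairs <- function(x, P) {
  N <- length(x)
  r <- rank(x, ties.method = "first")
  window <- P * N / 100
  abs(outer(r, r, "-")) <= window
}

x <- c(10, 50, 20, 40, 30, 60, 80, 70, 100, 90)
elig <- eligible_pairs(x, P = 20)    # N = 10, so ranks may differ by at most 2

# Swap candidates for the first record (value 10, rank 1), excluding itself
partners <- x[elig[1, ] & seq_along(x) != 1]
partners
```

With ten records and P = 20, the smallest value can only be exchanged with the values holding ranks 2 and 3.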

missing

The value used to represent missing values in the C++ routine instead of NA. If NA, a suitable value is calculated internally. Note that in the returned data set, all NA values (if any) are replaced with this value.

seed

Seed for the random number generator. Set it to obtain reproducible results.

Value

The rank-swapped data set, or a modified object of class sdcMicroObj.

Methods

signature(obj = "data.frame")
signature(obj = "matrix")
signature(obj = "sdcMicroObj")

Details

Rank swapping sorts the values of one numeric variable in numerical order (ranking) and then swaps values whose ranks are close together. The restricted range is defined on the ranks of the two swapped values, which by definition cannot differ by more than $P$ percent of the total number of observations. R0 and K0 are only used if positive, and only one of the two is used (R0 is preferred if both are positive).
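The idea can be illustrated with a minimal base-R sketch for a single variable. It is a simplified illustration under stated assumptions, not the routine sdcMicro uses: the helper simple_rank_swap is hypothetical, it ignores TopPercent, BottomPercent, K0 and R0, and it forces the window to span at least one rank.

```r
# Hypothetical helper illustrating restricted-range rank swapping.
# Each value is exchanged with a randomly chosen, not yet swapped value
# whose rank is at most P percent of n higher.
simple_rank_swap <- function(x, P = 5, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- length(x)
  ord <- order(x)                        # ord[k] = position of k-th smallest value
  window <- max(1, floor(P * n / 100))   # assumption: at least one rank wide
  sorted <- x[ord]
  swapped <- logical(n)
  for (k in seq_len(n)) {
    if (swapped[k]) next
    hi <- min(n, k + window)
    candidates <- setdiff(seq(k, hi), k)
    candidates <- candidates[!swapped[candidates]]
    if (length(candidates) == 0) next    # no partner left: value stays in place
    j <- candidates[sample.int(length(candidates), 1)]
    tmp <- sorted[k]; sorted[k] <- sorted[j]; sorted[j] <- tmp
    swapped[c(k, j)] <- TRUE
  }
  out <- x
  out[ord] <- sorted                     # write values back to original positions
  out
}

x <- c(12, 45, 7, 33, 28, 51, 19, 40, 23, 36)
x_swapped <- simple_rank_swap(x, P = 20, seed = 1)
max(abs(rank(x_swapped) - rank(x)))      # never exceeds the window of 2 ranks
```

Because values are only exchanged, the marginal distribution of the variable is unchanged, while each record may receive a value from a nearby rank.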

References

Moore, R., Jr. (1996) Controlled data-swapping techniques for masking public use microdata. U.S. Bureau of the Census, Statistical Research Division Report Series, RR 96-04.

Examples

library(sdcMicro)

data(testdata2)
# rank-swap four numeric variables of a plain data frame
data_swap <- rankSwap(testdata2, variables = c("age", "income", "expend", "savings"))

## for objects of class sdcMicroObj:
sdc <- createSdcObj(testdata2,
  keyVars = c("urbrur", "roof", "walls", "water", "electcon", "relat", "sex"),
  numVars = c("expend", "income", "savings"), w = "sampling_weight")
# with variables = NULL, all numeric key variables are swapped
sdc <- rankSwap(sdc)
