sdcMicro (version 4.1.0)

microaggrGower: Microaggregation for numerical and categorical key variables based on a distance similar to the Gower distance

Description

The microaggregation is based on distances computed in a way similar to the Gower distance. The distance function distinguishes between the variable types factor, ordered, numerical and mixed (semi-continuous variables with a fixed probability mass at a constant value, e.g. 0).
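
To make the role of the four variable types concrete, the following is a minimal sketch of a Gower-style distance between two records in base R. It is not sdcMicro's internal implementation: the helper gower_sketch, its ranges argument, and the rule used for mixed variables (contribution 1 if exactly one value sits at the constant, 0 if both do, scaled absolute difference otherwise) are assumptions made for illustration.

```r
# Hypothetical helper, not part of sdcMicro: Gower-style distance between
# two records x and y, each given as a one-row data frame.
# 'ranges' is an assumed named list with the range of each numeric variable.
gower_sketch <- function(x, y, ranges, mixed = character(0),
                         mixed.constant = numeric(0), weights = NULL) {
  vars <- names(x)
  if (is.null(weights)) weights <- rep(1, length(vars))
  d <- numeric(length(vars))
  for (j in seq_along(vars)) {
    v <- vars[j]
    xv <- x[[v]]
    yv <- y[[v]]
    if (v %in% mixed) {
      # mixed (semi-continuous): treat the constant as its own category (assumption)
      const <- mixed.constant[match(v, mixed)]
      at_const <- c(xv == const, yv == const)
      if (all(at_const)) {
        d[j] <- 0
      } else if (any(at_const)) {
        d[j] <- 1
      } else {
        d[j] <- abs(xv - yv) / ranges[[v]]
      }
    } else if (is.ordered(xv)) {
      # ordered: difference of level positions, scaled to [0, 1]
      d[j] <- abs(as.integer(xv) - as.integer(yv)) / (nlevels(xv) - 1)
    } else if (is.factor(xv)) {
      # factor: simple mismatch, 0 if equal and 1 otherwise
      d[j] <- as.numeric(as.character(xv) != as.character(yv))
    } else {
      # numerical: absolute difference scaled by the variable's range
      d[j] <- abs(xv - yv) / ranges[[v]]
    }
  }
  # weighted mean of the per-variable contributions
  sum(weights * d) / sum(weights)
}

# Toy data, invented for this sketch
dat <- data.frame(
  sex    = factor(c("m", "f")),
  edu    = factor(c("low", "high"), levels = c("low", "mid", "high"),
                  ordered = TRUE),
  age    = c(30, 50),
  income = c(0, 2000)
)
ranges <- list(age = 40, income = 4000)
d12 <- gower_sketch(dat[1, ], dat[2, ], ranges,
                    mixed = "income", mixed.constant = 0)
d11 <- gower_sketch(dat[1, ], dat[1, ], ranges,
                    mixed = "income", mixed.constant = 0)
print(c(d12 = d12, d11 = d11))
```

In this sketch the weights argument plays the same role as the weights argument of microaggrGower: one weight per variable used in the distance computation.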

Usage

microaggrGower(obj, variables = NULL, aggr = 3, dist_var = NULL,
  by = NULL, mixed = NULL, mixed.constant = NULL, trace = FALSE,
  weights = NULL, numFun = mean, catFun = sampleCat, addRandom = FALSE)
sampleCat(x)
maxCat(x)

Arguments

obj
an object of class sdcMicroObj or a data frame
variables
character vector with names of variables to be aggregated (Default for sdcMicroObj is all keyVariables and all numeric key variables)
aggr
aggregation level (default=3)
dist_var
character vector with variable names for distance computation
by
character vector with variable names to split the dataset before performing microaggregation (Default for sdcMicroObj is strataVar)
mixed
character vector with names of mixed variables
mixed.constant
numeric vector of the same length as mixed, giving the constant value at which each mixed variable has its probability mass
trace
TRUE/FALSE; if TRUE, some output is printed to the console
weights
numerical vector with length equal to the number of variables for distance computation
numFun
function: to be used to aggregate numerical variables
catFun
function: to be used to aggregate categorical variables
addRandom
TRUE/FALSE; if TRUE, a random value is added for the distance computation.
x
a factor vector

Value

  • The function returns the updated sdcMicroObj or simply an altered data frame.

Details

The function sampleCat samples a level with probabilities corresponding to the relative frequency of each level among the nearest neighbours (NNs). The function maxCat chooses the level with the most occurrences; if the maximum is not unique, one of the tied levels is chosen at random.
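
The two aggregation rules can be sketched in base R as follows. These are hypothetical re-implementations written for illustration (the names sample_cat_sketch and max_cat_sketch are not part of sdcMicro), and the package's own sampleCat and maxCat may differ in detail, for example in how missing values are handled.

```r
# Hypothetical re-implementations for illustration only.
# x: factor holding the levels observed among the nearest neighbours.
sample_cat_sketch <- function(x) {
  tab <- table(x[!is.na(x)])
  counts <- as.vector(tab)
  # draw one level with probability proportional to its frequency
  sample(names(tab), size = 1, prob = counts / sum(counts))
}

max_cat_sketch <- function(x) {
  tab <- table(x[!is.na(x)])
  counts <- as.vector(tab)
  top <- names(tab)[counts == max(counts)]
  # break ties at random; indexing keeps a single candidate unchanged
  top[sample.int(length(top), 1)]
}

nn <- factor(c("a", "a", "b", "c", "a"))
drawn <- sample_cat_sketch(nn)               # "a" is the most likely draw (3 of 5)
mode_level <- max_cat_sketch(nn)             # "a", the unique most frequent level
tied <- max_cat_sketch(factor(c("a", "b")))  # "a" or "b", chosen at random
```

A function with the same shape (a factor in, a single level out) could presumably be supplied as catFun, but that is an assumption drawn from the argument description above rather than a documented contract.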

Examples

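library(sdcMicro)

# Load the example data shipped with sdcMicro and keep the first 200 records
data(testdata, package = "sdcMicro")
testdata <- testdata[1:200, ]
# Convert columns 1 to 7 and 9 to factors
for (i in c(1:7, 9)) testdata[, i] <- as.factor(testdata[, i])

# Data frame interface: aggregate three variables within urbrur/roof groups
test <- microaggrGower(testdata, variables = c("relat", "age", "expend"),
  dist_var = c("age", "sex", "income", "savings"), by = c("urbrur", "roof"))

# sdcMicroObj interface: variables default to the key variables of the object
sdc <- createSdcObj(testdata,
  keyVars = c("urbrur", "roof", "walls", "water", "electcon", "relat", "sex"),
  numVars = c("expend", "income", "savings"), w = "sampling_weight")

sdc <- microaggrGower(sdc)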
