
sdcMicro (version 2.0.4)

microaggregation: Microaggregation

Description

Function to perform various methods of microaggregation.

Usage

microaggregation(x, method = "pca", aggr = 3, nc = 8, clustermethod = "clara", opt = FALSE, measure = "mean", trim = 0, varsort = 1, transf = "log", blow = TRUE, blowxm = 0)

Arguments

x
data frame or matrix
method
pca, onedims, single, simple, clustpca, pppca, clustpppca, mdav, clustmcdpca, influence, mcdpca
aggr
aggregation level (default=3)
nc
number of clusters, if the chosen method performs cluster analysis
clustermethod
clustering method, if necessary
opt
measure
aggregation statistic: mean, median, trim or onestep (default = mean)
trim
trimming percentage, if measure=trim
varsort
variable used for sorting, if method = single
transf
transformation for data x
blow
if TRUE, the microaggregated data will have the same dimension as the original data set
blowxm
the microaggregated data with the same dimension as the original one.
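
To illustrate how the choice of measure can change the aggregate released for a group, here is a minimal base R sketch. It does not call sdcMicro; the group values are made up, and treating trim as a proportion cut from each end (as in base R's mean()) is an assumption about how the package interprets its trim argument. The onestep measure is not shown.

```r
# One hypothetical group of five values containing a single outlier
g <- c(10, 12, 11, 13, 100)

agg_mean   <- mean(g)              # arithmetic mean, pulled up by the outlier
agg_median <- median(g)            # middle value of the sorted group
agg_trim   <- mean(g, trim = 0.2)  # drops the lowest and highest 20% first

# Each record in the group would be replaced by one of these aggregates
print(c(mean = agg_mean, median = agg_median, trim = agg_trim))
```

With groups of size 3 the trimmed mean and the median coincide for any trimming fraction that removes one value per end, which is one reason alternative measures become more interesting for larger aggr.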

Value

  • x: original data
  • method: method
  • clustering: TRUE, if a clustering is done before microaggregation
  • aggr: aggregation level
  • nc: number of clusters, if a clustering method is chosen
  • xm: aggregated data set
  • roundxm: rounded aggregated data set (to integers)
  • clustermethod: clustering method, if a cluster method is chosen
  • measure: proximity measure for aggregation
  • trim: trimming, if proximity measure trim is chosen
  • varsort: information about the variable which is chosen when using method single
  • transf: transformation used, when clustering is applied first
  • blow: TRUE, if blowxm is calculated
  • blowxm: microaggregated data with the same dimension as the original data set
  • fot: correction factor, necessary if totals are calculated and n divided by aggr is not an integer.

Details

The glossary at http://neon.vb.cbs.nl/casc/Glossary.htm gives the official definition of microaggregation: records are grouped based on a proximity measure of variables of interest, and the same small groups of records are used in calculating aggregates for those variables. The aggregates are released instead of the individual record values.

While very different concepts can be used for the proximity measure, microaggregation is naturally done with the mean. Nevertheless, other measures of location can be used for aggregation, especially when the group size for aggregation is chosen larger than 3. Since the median seems to be unsuitable for microaggregation due to its rather high breakdown point, one of the other included measures can be chosen.

The function also contains a method with which the data can be clustered using a variety of clustering algorithms. Clustering observations before applying microaggregation might be useful. Note that the data are automatically log-transformed and standardised before clustering, because most clustering algorithms perform better on log-transformed and standardised data. Using the clustering method Mclust requires the package mclust02, which must be loaded first. The package is not loaded automatically, since it is not under the GPL but under a different licence.

Some projection methods for microaggregation are included. The robust versions pppca and clustpppca (clustering first) are fast implementations and almost always provide the best results. Univariate statistics are preserved best with the individual ranking method (called onedims here), but multivariate statistics are strongly affected. The method simple applies microaggregation directly to the (unsorted) data and is useful for comparison with other methods, i.e. to answer the question of how much sorting the data before aggregation improves the result.

If blow is set to FALSE, the result will be a data set with dimension n divided by aggr.
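
The contrast between aggregating unsorted data and individual ranking can be sketched in base R. This is a rough illustration of the idea only, not the sdcMicro implementation: the helpers group_means, simple_ma, ranking_ma and sse are hypothetical names, the toy data are simulated, and merging a short remainder into the last group is an assumption made to keep every group at least aggr records large.

```r
# Hypothetical helper: replace each run of `aggr` consecutive values
# (in the order given) by the mean of that run. A remainder shorter than
# `aggr` is merged into the last full group.
group_means <- function(v, aggr = 3) {
  n <- length(v)
  g <- pmin(ceiling(seq_len(n) / aggr), max(1, floor(n / aggr)))
  ave(v, g)  # ave() returns the group mean at every position
}

# "simple"-style: aggregate every variable in the original row order
simple_ma <- function(x, aggr = 3) {
  as.data.frame(lapply(x, group_means, aggr = aggr))
}

# Individual ranking ("onedims"-style): sort each variable on its own,
# aggregate, then put the aggregates back in the original row order
ranking_ma <- function(x, aggr = 3) {
  as.data.frame(lapply(x, function(v) {
    o <- order(v)
    out <- numeric(length(v))
    out[o] <- group_means(v[o], aggr)
    out
  }))
}

# Sum of squared differences between original and aggregated values
sse <- function(orig, agg) sum((as.matrix(orig) - as.matrix(agg))^2)

set.seed(1)
toy <- data.frame(a = rnorm(12), b = rexp(12))  # simulated toy data

m_simple  <- simple_ma(toy, aggr = 3)
m_ranking <- ranking_ma(toy, aggr = 3)

print(colMeans(toy))        # column means are unchanged by both variants
print(colMeans(m_ranking))
print(c(simple = sse(toy, m_simple), ranking = sse(toy, m_ranking)))

# Reduced form, analogous in spirit to blow = FALSE: one row per group,
# so 12 / 3 = 4 rows
reduced <- sapply(toy, function(v) tapply(sort(v), ceiling(seq_along(v) / 3), mean))
print(dim(reduced))
```

Because each variable is sorted separately in ranking_ma, records that share a group for one variable generally do not share a group for another. This is why the univariate loss is small while relationships between variables can be distorted.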

References

http://www.springerlink.com/content/v257655u88w2/?sortorder=asc&p_o=20

See Also

summary.micro, plotMicro, valTable

Examples

data(Tarragona)
m1 <- microaggregation(Tarragona, method="onedims", aggr=3)
## summary(m1)
