After running EAdet, the detected outliers may be imputed with EAimp.
EAimp(data, weights, outind, reach = "max", transmission.function = "root",
      power = ncol(data), distance.type = "euclidean", duration = 5,
      maxl = 5, kdon = 1, monitor = FALSE, threshold = FALSE,
      deterministic = TRUE, fixedprop = 0)
EAimp returns a list with two components: parameters and imputed.data.
parameters contains the following elements:
sample.size: Number of observations
number.of.variables: Number of variables
n.complete.records: Number of records without missing values
n.usable.records: Number of records with less than half of values missing (unusable observations are discarded)
duration: Duration of epidemic
reach: Transmission distance (d0)
threshold: Input parameter
deterministic: Input parameter
computation.time: Elapsed computation time
imputed.data contains the imputed data.
data: a data frame or matrix with the data.
weights: a vector of positive sampling weights.
outind: a logical vector with component TRUE for outliers.
reach: reach of the threshold function (usually set to the maximum distance to a nearest neighbour, see internal function EA.dist).
transmission.function: form of the transmission function of distance d: "step" is a Heaviside function which jumps to 1 at d0, "linear" is linear between 0 and d0, "power" is (beta*d+1)^(-p) with p=ncol(data) as default, "root" is the function 1-(1-d/d0)^(1/maxl).
power: sets p=power, where p is the parameter in the "power" transmission function.
distance.type: distance type in function dist().
duration: the duration of the detection epidemic.
maxl: maximum number of steps without infection.
kdon: the number of donors that should be infected before imputation.
monitor: if TRUE, verbose output on the epidemic is printed.
threshold: if TRUE, infect all remaining points with infection probability above the threshold 1-0.5^(1/maxl).
deterministic: if TRUE, the number of infections is the expected number and the infected observations are the ones with the largest infection probabilities.
fixedprop: if TRUE, a fixed proportion of observations is infected at each step.
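The transmission-function expressions and the threshold described in the arguments can be evaluated directly. The sketch below only illustrates those expressions and is not the package's internal code. The helper names (tf_step, tf_linear, tf_power, tf_root, ea_threshold) are hypothetical. The "power" expression is read as (beta*d+1)^(-p), which is an assumption about the intended grouping, and beta is treated as a free parameter. The direction of "linear" and the capping of d at d0 are also assumptions.

```r
# Sketch: evaluate the documented expressions literally.
# d is a distance, d0 the reach. Capping d at d0 is an assumption made here
# so that the expressions stay defined for d > d0.
tf_step   <- function(d, d0) as.numeric(d >= d0)        # jumps to 1 at d0
tf_linear <- function(d, d0) pmin(d, d0) / d0           # assumed: 0 at d = 0, 1 at d0
tf_power  <- function(d, beta, p) (beta * d + 1)^(-p)   # beta is a free parameter here
tf_root   <- function(d, d0, maxl = 5) 1 - (1 - pmin(d, d0) / d0)^(1 / maxl)

# Threshold from the description of the threshold argument
ea_threshold <- function(maxl = 5) 1 - 0.5^(1 / maxl)

d0 <- 2
d <- c(0, 0.5, 1, 2, 3)
print(tf_step(d, d0))
print(round(tf_root(d, d0, maxl = 5), 4))
print(round(ea_threshold(maxl = 5), 4))
```

With these expressions, the "root" function evaluated at d = d0/2 coincides with the threshold 1-0.5^(1/maxl), since both reduce to the same formula.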
Beat Hulliger
EAimp uses the distances calculated in EAdet (more precisely, the
counterprobabilities, which are stored in a global data set) and starts an
epidemic at each observation to be imputed. Each epidemic runs until donors
for the missing values are infected. A donor is then selected at random.
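The donor search described above can be illustrated with a toy epidemic. This is a simplified sketch and not the EAimp implementation: it spreads deterministically with a step rule (every point within d0 of an infected point becomes infected), uses plain Euclidean distances from dist(), and ignores weights and counterprobabilities. The function find_donor and its arguments recipient, eligible and max_steps are hypothetical names introduced for illustration.

```r
# Toy sketch of a donor search by a spreading epidemic (not the package code).
# x: numeric matrix of complete records; recipient: row index to be imputed;
# eligible: logical vector marking rows that may serve as donors;
# d0: reach; kdon: number of donors to infect before stopping.
find_donor <- function(x, recipient, eligible, d0, kdon = 1, max_steps = 50) {
  dmat <- as.matrix(dist(x))                 # Euclidean distances between rows
  cand <- eligible
  cand[recipient] <- FALSE                   # the recipient cannot donate to itself
  infected <- rep(FALSE, nrow(x))
  infected[recipient] <- TRUE                # the epidemic starts at the recipient
  for (step in seq_len(max_steps)) {
    # A point is reached if it lies within d0 of any currently infected point
    reached <- apply(dmat[infected, , drop = FALSE] <= d0, 2, any)
    newly <- reached & !infected
    if (!any(newly)) break                   # epidemic has died out
    infected <- infected | newly
    if (sum(infected & cand) >= kdon) break  # enough donors infected
  }
  donors <- which(infected & cand)
  if (length(donors) == 0) return(NA_integer_)
  unname(donors[sample.int(length(donors), 1)])  # pick one donor at random
}

x <- rbind(c(0, 0), c(1, 0), c(2, 0), c(10, 10))
eligible <- c(FALSE, FALSE, TRUE, TRUE)      # rows 1 and 2 may not donate
donor <- find_donor(x, recipient = 1, eligible = eligible, d0 = 1.5, kdon = 1)
print(donor)
```

In this example the epidemic needs two steps: row 2 is infected first, and only then is the eligible row 3 reached. Row 4 lies beyond the reach and is never infected.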
Béguin, C. and Hulliger, B. (2004) Multivariate outlier detection in incomplete survey data: the epidemic algorithm and transformed rank correlations, JRSS-A, 167, Part 2, pp. 275-294.
EAdet for outlier detection with the Epidemic Algorithm.
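# Load the example data and its sampling weights
data(bushfirem, bushfire.weights)
# Detect outliers with the Epidemic Algorithm
det.res <- EAdet(bushfirem, bushfire.weights)
# Impute the detected outliers, reusing the reach found during detection
imp.res <- EAimp(bushfirem, bushfire.weights, outind = det.res$outind,
                 reach = det.res$output$max.min.di, kdon = 3)
print(imp.res$output)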