A statistical method is called nonrobust if an arbitrary contamination of
a small portion of the dataset can produce results radically different
from the results without the contamination. In this sense many
classical procedures relying on distributional models or on moments
like mean and variance are highly nonrobust.
We consider robustness an essential prerequisite of all
statistical analysis. However, in the context of compositional data
analysis, robustness is still in its early years.
As of May 2008 we provide a new approach to robustness in the
package. The central idea is that robustness should be more or less
automatic and that there should be no need to change the code to
compare results obtained from robust procedures with results from their
more efficient nonrobust counterparts.
To achieve this, all routines that rely on distributional models (such
as mean, variance, principal component analysis, scaling) and routines
relying on those routines get a new standard argument of the form:
fkt(...,robust=getOption("robust"))
which defaults to a new option "robust". This option can take several
values:
FALSE: The classical estimators, such as the arithmetic mean and
Pearson's product-moment variance, are used and the results are to be
considered nonrobust.
TRUE: The package default for robust estimation is used. At present
this is covMcd in the robustbase package. This default might change in
the future.
"pearson": This is a synonym for FALSE and explicitly states
that no robustness should be used.
"mcd": Minimum Covariance Determinant. This option explicitly
selects covMcd in the robustbase package as the main robustness engine.
More options might follow later.
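The argument pattern and the option values listed above can be illustrated with a minimal sketch. The helpers resolveRobust and centerOf are hypothetical names introduced only for this illustration and are not part of the package. The sketch assumes that TRUE maps to the MCD engine, as stated for the current default, and it falls back to FALSE when the option is unset.

```r
# Hypothetical helper: map the documented values of the "robust" option
# onto the name of an estimation engine.
resolveRobust <- function(robust = getOption("robust", FALSE)) {
  if (is.logical(robust)) {
    # TRUE selects the current default robust engine (assumed to be MCD)
    return(if (isTRUE(robust)) "mcd" else "pearson")
  }
  match.arg(robust, c("pearson", "mcd"))
}

# Hypothetical routine following the fkt(..., robust = getOption("robust")) pattern
centerOf <- function(x, robust = getOption("robust", FALSE)) {
  engine <- resolveRobust(robust)
  if (engine == "mcd") {
    if (!requireNamespace("robustbase", quietly = TRUE))
      stop("the 'mcd' engine needs the robustbase package")
    robustbase::covMcd(x)$center   # robust location estimate
  } else {
    colMeans(x)                    # classical arithmetic mean
  }
}

x <- cbind(a = c(1, 2, 3, 4, 50), b = c(2, 4, 6, 8, 10))
centerOf(x, robust = FALSE)      # classical column means
centerOf(x, robust = "pearson")  # same estimator, stated explicitly
```

Because the engine is chosen from a single argument, the same call can be repeated with a different value of robust to compare classical and robust results without touching the rest of the code.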
To control specific parameters of the
model, the string can carry an attribute named "control", which contains
additional options for the robustness engine used. At present the
control attribute of "mcd" is a control object for covMcd.
The control attribute of "pearson" is a list
containing additional options to the mean, like trim.
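As a rough sketch of how such a control attribute might be attached and consumed, the following uses only base R. The function classicalMean is a hypothetical consumer written for this illustration. How the package itself forwards the control list is not shown here and is an assumption.

```r
# Attach extra settings to the option value via a "control" attribute.
robustSetting <- structure("pearson", control = list(trim = 0.2))

# Hypothetical consumer: forwards the control list to mean()
classicalMean <- function(x, robust = "pearson") {
  control <- attr(robust, "control")
  if (is.null(control)) control <- list()
  do.call(mean, c(list(x), control))
}

x <- c(1, 2, 3, 4, 100)
classicalMean(x)                  # plain arithmetic mean: 22
classicalMean(x, robustSetting)   # 20% trimmed mean: 3

# For "mcd" the attribute would hold a covMcd control object instead,
# for example (requires robustbase, not run here):
# mcdSetting <- structure("mcd", control = robustbase::rrcov.control(alpha = 0.75))
```

Keeping the settings in an attribute means the value still compares as the plain string "pearson" or "mcd" wherever only the engine name matters.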
The standard value for getOption("robust") is FALSE, to avoid situations
in which users think they are using a classical technique when a robust
one is actually applied. Robustness must be switched on explicitly,
either by setting the option with
options(robust=TRUE)
or by giving the argument. This default
might change later if the authors come to the impression that robust
estimation has become the generally accepted default.
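The two ways of switching robustness on can be sketched in base R as follows. The function usesRobust is a hypothetical stand-in for any routine with a robust argument. It treats an unset option as FALSE, which is an assumption made so that the sketch runs without the package loaded.

```r
# Hypothetical routine that reads the option when no argument is given
usesRobust <- function(robust = getOption("robust", FALSE)) {
  !(isFALSE(robust) || (is.character(robust) && robust[1] == "pearson"))
}

old <- options(robust = TRUE)   # switch robustness on for the session
usesRobust()                    # TRUE: picked up from the option
usesRobust(robust = FALSE)      # FALSE: the argument overrides the option
options(old)                    # restore the previous setting
```

Saving the return value of options() and restoring it afterwards keeps the change local, so later classical analyses are not silently affected.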
For those interested not only in avoiding the influence of
outliers but also in analysing them, we added a subsystem for
outlier classification. This subsystem is described in
outliersInCompositions and also relies on the
robust option. However, for these routines the factory
default for the robust option is always TRUE, because they are only
applicable in an outlier-aware context.
We hope that in this way we can provide a seamless transition from
nonrobust analysis to a robust analysis.