robustnessInCompositions: Handling robustness issues and outliers in compositions.

Description

Describes the seamless transition to robust estimation in library(compositions).

Details

A statistical method is called nonrobust if an arbitrary contamination of a small portion of the dataset can produce results radically different from the results without the contamination. In this sense, many classical procedures that rely on distributional models or on moments such as the mean and variance are highly nonrobust.
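
As a minimal base-R illustration of this definition (not part of the package; the simulated data and the 5% contamination level are assumptions chosen for the example), replacing a few values by gross errors moves the arithmetic mean very far, while the median barely changes:

```r
# Illustration only: a contaminated univariate sample in base R.
set.seed(1)
clean <- rnorm(100, mean = 10, sd = 1)

# Replace 5% of the observations by gross errors.
contaminated <- clean
contaminated[1:5] <- 1e6

# The mean follows the contamination, the median stays near 10.
c(mean = mean(clean), median = median(clean))
c(mean = mean(contaminated), median = median(contaminated))
```

The median serves here only as a familiar robust estimator; it is not the estimator used by the package.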

We consider robustness an essential prerequisite of any statistical analysis. However, in the context of compositional data analysis, robustness is still in its infancy.

As of May 2008 we provide a new approach to robustness in the package. The central idea is that robustness should be more or less automatic, and that it should not be necessary to change the code in order to compare results obtained from robust procedures with results from their more efficient nonrobust counterparts.

To achieve this, all routines that rely on distributional models (e.g. mean, variance, principal component analysis, scaling), and all routines relying on those routines, get a new standard argument of the form:

fkt(...,robust=getOption("robust"))

which defaults to a new option "robust". This option can take several values:

FALSE: The classical estimators, such as the arithmetic mean and Pearson's product-moment variance, are used, and the results are to be considered nonrobust.
TRUE: The package default for robust estimation is used. At present this is covMcd from the robustbase package. This default might change in the future.
"pearson": A synonym for FALSE that explicitly states that no robust method should be used.
"mcd": Minimum Covariance Determinant. This option explicitly selects covMcd from the robustbase package as the main robustness engine.

More options might follow later. To control specific parameters of the model, the string can carry an attribute named "control" that contains additional options for the robustness engine in use. At the moment, the control attribute of "mcd" is a control object for covMcd, and the control attribute of "pearson" is a list containing additional options to mean, such as trim.
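
The following base-R sketch shows one way a routine could interpret such a value. The helper resolve_robust is hypothetical and not part of compositions; it only illustrates how the logical values, the strings and the "control" attribute fit together, and it does not call covMcd.

```r
# Hypothetical helper (not a compositions function): translate a `robust`
# value into a method name plus the optional "control" attribute.
resolve_robust <- function(robust = getOption("robust", default = FALSE)) {
  control <- attr(robust, "control")      # NULL when no attribute is attached
  if (is.logical(robust)) {
    # TRUE selects the current default engine, FALSE the classical estimators
    method <- if (isTRUE(as.logical(robust))) "mcd" else "pearson"
  } else {
    method <- match.arg(as.character(robust), c("pearson", "mcd"))
  }
  list(method = method, control = control)
}

# Attach extra options for the classical mean, here a trimming fraction.
rob <- structure("pearson", control = list(trim = 0.2))
res <- resolve_robust(rob)

# Pass the control list on to mean(); same as mean(x, trim = 0.2).
x <- c(1, 2, 3, 4, 100)
trimmed <- do.call(mean, c(list(x), res$control))
```

For "mcd", the control attribute would instead hold a control object for covMcd and would be handed to that function rather than to mean.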

The standard value for getOption("robust") is FALSE, to avoid situations in which users believe they are applying a classical technique while a robust one is silently used. Robustness must be switched on explicitly, either by setting the option with options(robust=TRUE) or by giving the argument. This default might change later if the authors conclude that robust estimation has become the accepted default.
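
The two ways of switching robustness on can be sketched in base R as follows. The function location is a hypothetical stand-in for a package routine with the standard argument, and the median is used only as a placeholder for a robust estimator; because compositions is not loaded in this sketch, the option is assumed to be unset and FALSE is supplied as fallback.

```r
# Hypothetical stand-in for a routine with the standard `robust` argument.
location <- function(x, robust = getOption("robust", default = FALSE)) {
  if (isTRUE(robust)) median(x) else mean(x)
}

x <- c(1, 2, 3, 4, 100)

# Variant 1: switch robustness on globally, then restore the old setting.
old <- options(robust = TRUE)
r_global <- location(x)
options(old)

# Variant 2: request robustness for a single call only.
r_arg <- location(x, robust = TRUE)
```

Saving the return value of options() and restoring it afterwards keeps the global switch from leaking into later, unrelated analyses.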

For those interested not only in avoiding the influence of outliers but also in analysing the outliers themselves, we added a subsystem for outlier classification. This subsystem is described in outliersInCompositions and also relies on the robust option. However, for these routines the factory default for the robust option is always TRUE, because they are only applicable in an outlier-aware context.

We hope that in this way we can provide a seamless transition from a nonrobust analysis to a robust one.

Examples

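library(compositions)  # robust=TRUE additionally relies on robustbase

# Simulate a three-part main population with known center and variance
A <- matrix(c(0.1,0.2,0.3,0.1),nrow=2)
Mvar <- 0.1*ilrvar2clr(A%*%t(A))
Mcenter <- acomp(c(1,2,1))
typicalData <- rnorm.acomp(100,Mcenter,Mvar) # main population
colnames(typicalData)<-c("A","B","C")
# Contaminate roughly 10% of the rows by adding a random multiple of (0.1,1,2)
data5 <- acomp(rbind(unclass(typicalData)+outer(rbinom(100,1,prob=0.1)*runif(100),c(0.1,1,2))))

# Compare classical and robust estimates of center and variance
mean(data5)
mean(data5,robust=TRUE)
var(data5)
var(data5,robust=TRUE)
Mvar # variance used to simulate the main population
# Compare classical and robust principal component biplots
biplot(princomp(data5))
biplot(princomp(data5,robust=TRUE))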
