RSimca(x, ...)
## Default S3 method:
RSimca(x, grouping, prior=proportions, k, kmax = ncol(x), control="hubert", alpha, tol = 1.0e-4, trace=FALSE, ...)
## S3 method for class 'formula'
RSimca(formula, data = NULL, ..., subset, na.action)
formula: a formula of the form y~x; it describes the response and the predictors. The formula can be more complicated, such as y~log(x)+z (see formula for more details). The response should be a factor giving the class of each observation, or any vector that can be coerced to one (such as a logical variable).
data: an optional data frame (or similar: see model.frame) containing the variables in the formula formula.
x: the explanatory variables of the training set (used by the default method).
k: the number of principal components to compute. If k is missing, or k = 0, the algorithm itself will determine the number of components by finding such k that $l_k/l_1 \geq 10.E-3$ and $\sum_{j=1}^k l_j / \sum_{j=1}^r l_j \geq 0.8$. It is preferable to investigate the scree plot in order to choose the number of components and then run again. Default is k=0.
kmax: the maximal number of principal components to compute. Default is kmax=10. If k is provided, kmax does not need to be specified, unless k is larger than 10.
trace: whether to print intermediate results. Default is trace = FALSE.
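The automatic choice of k described above can be illustrated with a small base R sketch. It is not the package's implementation: select_k is a hypothetical helper, and it assumes that l_j denotes the eigenvalues in decreasing order, that r is their number, and that the two conditions are combined by taking the smaller of the two resulting values.

```r
# Hypothetical helper (not part of any package): choose k from eigenvalues l.
# ratio_min = 10e-3 mirrors the "10.E-3" written in the text (i.e. 0.01).
select_k <- function(l, ratio_min = 10e-3, cum_min = 0.8) {
  l <- sort(l, decreasing = TRUE)
  # number of leading components whose eigenvalue is not negligible
  # relative to the largest eigenvalue
  k_ratio <- sum(l / l[1] >= ratio_min)
  # smallest number of components explaining at least cum_min of the variance
  k_cum <- which(cumsum(l) / sum(l) >= cum_min)[1]
  min(k_ratio, k_cum)
}

# Made-up eigenvalues, for illustration only
l <- c(6, 3, 0.6, 0.3, 0.001)
select_k(l)
```

Since such a rule is only a heuristic, inspecting the scree plot and passing k explicitly remains the safer route.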
An object of class RSimca-class, which is a subclass of the virtual class Simca-class.
RSimca, serving as a constructor for objects of class RSimca-class, is a generic function with "formula" and "default" methods.
SIMCA is a two-phase procedure: PCA is performed on each group separately for dimension reduction, and classification rules are then built in the lower-dimensional space (note that the dimension in each group can be different). Instead of classical PCA, robust alternatives are used. Any of the robust PCA methods available in the package (see Pca-class) can be used through the argument control.
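The first phase, one PCA model per group with a possibly different dimension in each group, can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: classical prcomp stands in for a robust PCA method, fit_group_pca is a hypothetical helper, and the 80% explained-variance rule used to pick each group's dimension is an arbitrary choice. The built-in iris data are used only because they ship with R.

```r
# Sketch only: classical PCA (prcomp) replaces the robust PCA methods.
# fit_group_pca is a hypothetical helper name.
fit_group_pca <- function(x, grouping, cum_min = 0.8) {
  lapply(split(as.data.frame(x), grouping), function(xg) {
    pc <- prcomp(xg, center = TRUE, scale. = FALSE)
    l <- pc$sdev^2                                # eigenvalues of this group
    k <- which(cumsum(l) / sum(l) >= cum_min)[1]  # group-specific dimension
    list(center = pc$center,
         loadings = pc$rotation[, 1:k, drop = FALSE],
         eigenvalues = l[1:k],
         k = k)
  })
}

models <- fit_group_pca(iris[, 1:4], iris$Species)
sapply(models, function(m) m$k)  # retained dimension per group
```

Each list element holds the centre, loadings and eigenvalues of one group, which is what a classification rule in the reduced space would need.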
In the original SIMCA, new observations are
classified by means of their deviations from the different PCA models.
Here the classification rules are obtained using two popular distances arising from PCA:
orthogonal distances (OD) and score distances (SD). For the definition of these distances,
the definition of the cutoff values and the standardization of the distances, see
Vanden Branden and Hubert (2005) and Todorov and Filzmoser (2009).
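As a rough illustration of the two distances, the sketch below computes them for one PCA model in base R, using their usual definitions: the score distance is the distance within the PCA subspace, scaled by the eigenvalues, and the orthogonal distance is the distance from the observation to that subspace. The helper pca_distances is hypothetical, classical prcomp is used instead of a robust method, and neither cutoff values nor standardization are included; the cited papers remain the reference for those.

```r
# Score distance (SD) and orthogonal distance (OD) of the rows of newx
# relative to one PCA model. pca_distances is a hypothetical helper.
pca_distances <- function(newx, center, loadings, eigenvalues) {
  xc <- sweep(as.matrix(newx), 2, center)  # centre with the model's centre
  scores <- xc %*% loadings                # coordinates in the PCA subspace
  fitted <- scores %*% t(loadings)         # projection back to the original space
  sdist <- sqrt(rowSums(sweep(scores^2, 2, eigenvalues, "/")))
  odist <- sqrt(rowSums((xc - fitted)^2))
  cbind(SD = sdist, OD = odist)
}

# Classical PCA on one iris species, keeping k = 2 components (arbitrary choice)
x <- as.matrix(iris[iris$Species == "setosa", 1:4])
pc <- prcomp(x)
k <- 2
d <- pca_distances(x, pc$center, pc$rotation[, 1:k, drop = FALSE], pc$sdev[1:k]^2)
head(d)
```

When all components are kept, the orthogonal distances vanish up to rounding error, which is a quick sanity check for such a helper.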
library(rrcov) # provides RSimca() and the pottery data
data(pottery)
dim(pottery) # 27 observations in 2 classes, 6 variables
head(pottery)
## Build the SIMCA model. Use RSimca for a robust version
rs <- RSimca(origin~., data=pottery)
rs
summary(rs)
## generate a sample from the pottery data set -
## this will be the "new" data to be predicted
smpl <- sample(1:nrow(pottery), 5)
test <- pottery[smpl, -7] # extract the test sample. Remove the last (grouping) variable
print(test)
## predict new data
pr <- predict(rs, newdata=test)
pr@classification
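To see how the predictions compare with the known origins of the sampled rows, the predicted classes can be cross-tabulated with the true labels. The sketch below repeats the steps of the example so that it runs on its own; it assumes the rrcov package is installed, and the seed value is arbitrary. Because the sampled rows were also used to build the model, the agreement shown this way is optimistic.

```r
library(rrcov)
data(pottery)

set.seed(1)  # arbitrary seed, only to make the sample reproducible
rs <- RSimca(origin ~ ., data = pottery)

# Draw five rows as "new" data and drop the grouping variable (column 7)
smpl <- sample(seq_len(nrow(pottery)), 5)
pr <- predict(rs, newdata = pottery[smpl, -7])

# Cross-tabulate true origin against predicted class
table(Actual = pottery$origin[smpl], Predicted = pr@classification)
```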