Usage
sprmdaCV(formula, data, as, etas, nfold = 10, fun = "Hampel",
probp1 = 0.95, hampelp2 = 0.975, hampelp3 = 0.999, probp4 = 0.01, yweights = TRUE,
class = c("regfit", "lda"), prior = c(0.5, 0.5), center = "median", scale = "qn",
print = FALSE, plot = TRUE, numit = 100, prec = 0.01)
Arguments
formula
a formula, e.g. group ~ X1 + X2, where group is a factor with two levels or a numeric vector coding
class membership as 1 and -1, and X1, X2 are numeric variables.
data
a data frame or list which contains the variables given in formula. The response specified in the
formula needs to be a numeric vector coding the class membership as 1 and -1 or
a factor with two levels.
as
a vector of positive integers giving the numbers of SPRM components to be estimated in the models.
etas
vector of values for the tuning parameter for the sparsity. Values have to be between 0 and 1.
nfold
the number of folds used for cross validation; default is nfold=10 for 10-fold CV.
fun
an internal weighting function for case weights. Choices are "Hampel" (preferred), "Huber" or "Fair".
probp1
the 1-alpha value at which to set the first outlier cutoff for the weighting function.
hampelp2
the 1-alpha value for the second cutoff. Only applies to fun="Hampel".
hampelp3
the 1-alpha value for the third cutoff. Only applies to fun="Hampel".
probp4
a quantile close to zero for the cutoff for potentially wrong class labels (see Reference). Ignored if yweights=FALSE.
yweights
logical; if TRUE, weights are calculated for observations with potentially wrong class labels.
class
type of classification; choices are "regfit" or "lda". If "regfit", an object of class prm is returned.
prior
vector of length 2 with prior probabilities of the groups; only used if class="lda".
center
type of centering of the data, given as a string that matches an R function, e.g. "mean" or "median".
scale
type of scaling for the data, given as a string that matches an R function, e.g. "sd" or "qn", or alternatively "no" for no scaling.
print
logical, default is FALSE. If TRUE, the variables included in each component are reported.
plot
logical, default is TRUE. If TRUE, two contour plots are generated for number of components and sparsity parameter.
The first contour plot shows the mean weighted misclassification rate (see Details), the second the number of variables in the model.
numit
the maximum number of iterations for the convergence of the coefficient estimates.
prec
a value for the precision of estimation of the coefficients.
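
The arguments above could be combined in a call along the following lines. This is a minimal sketch rather than the package's own example: it assumes the sprm package is installed, and the two-species iris subset, the candidate values and the object names (twoclass, n_comp, sparsity, cv) are illustrative choices only.

```r
# Sketch only: assumes the 'sprm' package is installed. The iris subset,
# the candidate values and the object names are illustrative assumptions.
data(iris)

# Keep two species so the response is a factor with exactly two levels
twoclass <- droplevels(subset(iris, Species != "setosa"))

# Candidate values passed as 'as' (numbers of components) and 'etas' (sparsity)
n_comp   <- 2:3
sparsity <- c(0.1, 0.5, 0.9)   # each value has to lie between 0 and 1

if (requireNamespace("sprm", quietly = TRUE)) {
  library(sprm)
  set.seed(1)  # assumption: the fold assignment is random, so fix the seed
  cv <- sprmdaCV(Species ~ ., data = twoclass,
                 as = n_comp, etas = sparsity,
                 nfold = 5,        # 5-fold CV instead of the default 10
                 fun = "Hampel",   # weighting function for case weights
                 center = "median", scale = "qn",
                 plot = FALSE)     # suppress the two contour plots
  str(cv, max.level = 1)           # inspect whatever object is returned
}
```

Setting plot = FALSE keeps the call usable in non-interactive scripts; leaving it at the default TRUE should produce the two contour plots described under plot.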