Performs k-fold cross-validation in order to evaluate the performance and/or select an optimal smoothing parameter of a penalized regression model with ordinal predictors.
ordCV(x, y, u = NULL, z = NULL, k=5, lambda, offset = rep(0,length(y)),
model = c("linear", "logit", "poisson", "cumulative"),
type=c("selection", "fusion"), ...)
Returns a list containing the following components:
matrix of size (k \(\times\) length(lambda)) containing Brier/deviance scores on the training data.
matrix of Brier/deviance scores on the test data.
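A minimal sketch of how a (k \(\times\) length(lambda)) score matrix of this kind might be used to pick a penalty parameter. The matrix test_scores and its numbers are made up for illustration, and averaging over folds and taking the minimum is a common convention assumed here, not something the function is stated to do itself.

```r
# Hypothetical test-score matrix: 3 folds (rows) x 5 lambda values (columns).
# All numbers are invented for illustration only.
lambda <- c(10, 5, 1, 0.5, 0.1)   # decreasing order
test_scores <- rbind(
  c(4.0, 3.2, 2.9, 3.1, 3.6),
  c(4.2, 3.0, 2.7, 3.0, 3.5),
  c(3.9, 3.1, 2.8, 3.3, 3.8)
)

# Average the score over folds, separately for each lambda.
mean_score <- colMeans(test_scores)

# Lower Brier/deviance scores are better, so take the minimiser.
best_lambda <- lambda[which.min(mean_score)]
best_lambda
```

Here the smallest average score belongs to the third column, so the sketch selects lambda = 1.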
matrix of integers 1,2,... giving the observed levels of the ordinal factor(s).
the vector of response values.
a matrix (or data.frame) of additional categorical (nominal) predictors, with each column corresponding to one (additional) predictor and containing numeric values from {1,2,...}; corresponding dummy coefficients will not be penalized, and for each covariate category 1 is taken as reference category. Currently not supported if model="cumulative".
a matrix (or data.frame) of additional metric predictors, with each column corresponding to one (additional) predictor; corresponding coefficients will not be penalized. Currently not supported if model="cumulative".
number of folds.
vector of penalty parameters (in decreasing order).
vector of offset values.
the model which is to be fitted. Possible choices are "linear" (default), "logit", "poisson" or "cumulative". See details below.
penalty to be applied. If "selection", group lasso penalty for smoothing and selection is used. If "fusion", a fused lasso penalty for fusion and selection is used.
additional arguments to ordFusion and ordSelect, respectively.
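A minimal sketch, in base R only, of preparing two of the arguments described above: a lambda vector in decreasing order and, purely for intuition, a random assignment of n observations to k folds. The grid endpoints, n, and the names fold_id and lambda are illustrative assumptions; how ordCV assigns folds internally is not stated here.

```r
set.seed(1)

n <- 23   # hypothetical number of observations
k <- 5    # number of folds

# Log-spaced penalty grid from large to small, i.e. in decreasing order.
lambda <- exp(seq(log(100), log(0.01), length.out = 10))

# Balanced random fold labels: fold sizes differ by at most one.
fold_id <- sample(rep(seq_len(k), length.out = n))

table(fold_id)
```

In fold j, observations with fold_id == j would serve as test data and the rest as training data.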
Aisouda Hoshiyar
The method assumes that categorical covariates (contained in x and u) take values 1,2,...,max, where max denotes the (columnwise) highest level observed in the data. If any level between 1 and max is not observed for an ordinal predictor, a corresponding (dummy) coefficient is fitted anyway. If any level > max is not observed but possible in principle, and a corresponding coefficient is to be fitted, the easiest way is to add a corresponding row to x (and u, z) with the corresponding y value being NA.
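A minimal sketch of the padding trick described above, using made-up data. The first ordinal predictor is observed on levels 1 to 3 but is assumed to allow a level 4; the value placed in the other column of the extra row (here 1) is an arbitrary valid level chosen for illustration.

```r
# Hypothetical data: two ordinal predictors, both observed on levels 1..3.
x <- cbind(c(1, 2, 3, 2, 1),
           c(1, 1, 2, 2, 3))
y <- c(0.5, 1.2, 2.0, 1.1, 0.3)

# Append one row carrying the unobserved level 4 for the first predictor,
# and give it a missing response.
x_ext <- rbind(x, c(4, 1))
y_ext <- c(y, NA)

# Columnwise highest level now seen in the data: 4 for the first predictor.
apply(x_ext, 2, max)
```

If u or z are supplied, they would need a matching extra row so that all inputs keep the same number of observations.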
If a linear regression model is fitted, the response vector y may contain any numeric values; if a logit model is fitted, y has to be 0/1 coded; if a poisson model is fitted, y has to contain count data. If a cumulative logit model is fitted, y takes values 1,2,...,max.
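The coding rules above can be checked before fitting. The helper check_response below is hypothetical (it is not part of the package) and only mirrors the rules as stated; missing responses are ignored in the check.

```r
# Hypothetical helper: TRUE if y is coded as required for the given model.
check_response <- function(y, model = c("linear", "logit", "poisson", "cumulative")) {
  model <- match.arg(model)
  if (!is.numeric(y)) return(FALSE)
  y <- y[!is.na(y)]                      # NA responses are skipped
  is_whole <- all(y == round(y))         # integer-valued entries only
  switch(model,
    linear     = TRUE,                   # any numeric values
    logit      = all(y %in% c(0, 1)),    # 0/1 coding
    poisson    = is_whole && all(y >= 0),# non-negative counts
    cumulative = is_whole && all(y >= 1) # classes 1,2,...,max
  )
}

check_response(c(0, 1, 1, NA), "logit")
check_response(c(0, 1, 2), "cumulative")
```

The first call returns TRUE; the second returns FALSE because class labels for the cumulative model start at 1.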
For the cumulative model, the measure of performance used by the function is the Brier score, i.e., the sum of squared differences between (indicator) outcomes \(y_{ir}\) and predicted probabilities \(P(Y_i=r)=P(y_{ir}=1)=\pi_{ir}\), with observations \(i=1,...,n\) and classes \(r=1,...,c\). Otherwise, the deviance is used.
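A minimal sketch of the Brier score as described above, \(\sum_{i}\sum_{r}(y_{ir}-\pi_{ir})^2\), in base R. The function brier_score and the example probabilities are illustrative assumptions; whether the package applies any additional scaling (e.g., averaging over observations) is not stated here.

```r
# Hypothetical helper: sum of squared differences between class indicators
# and predicted class probabilities.
# y:    observed classes in 1..c (length n)
# prob: n x c matrix of predicted probabilities, rows summing to 1
brier_score <- function(y, prob) {
  n <- length(y)
  ind <- matrix(0, nrow = n, ncol = ncol(prob))
  ind[cbind(seq_len(n), y)] <- 1   # indicator y_ir = 1 if Y_i = r
  sum((ind - prob)^2)
}

y <- c(1, 3, 2)
prob <- rbind(c(0.7, 0.2, 0.1),
              c(0.1, 0.3, 0.6),
              c(0.2, 0.5, 0.3))

brier_score(y, prob)
```

For these made-up values the per-observation contributions are 0.14, 0.26 and 0.38, giving a total of 0.78.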
Hoshiyar, A., Gertheiss, L.H., and Gertheiss, J. (2023). Regularization and Model Selection for Item-on-Items Regression with Applications to Food Products' Survey Data. Preprint, available from https://arxiv.org/abs/2309.16373.
ordSelect, ordFusion