- `x`: `n` by `p` matrix of numeric predictors.
- `y`: vector of response values of length `n`. For binary classification, `y` should be a factor with 2 levels.
- `standardize`: whether to standardize the `x` variables prior to fitting the PENSE estimates. Can also be set to `"cv_only"`, in which case the input data is not standardized, but the training data in the CV folds is scaled to match the scaling of the input data. Coefficients are always returned on the original scale. Standardization can fail for variables with a large proportion of a single value (e.g., zero-inflated data); in this case, either compute with `standardize = FALSE` or standardize the data manually.
- `lambda`: optional user-supplied sequence of penalization levels. If given and not `NULL`, `nlambda` and `lambda_min_ratio` are ignored.
- `cv_k`: number of folds per cross-validation replication.
- `cv_repl`: number of cross-validation replications.
- `cv_metric`: either a string specifying the performance metric to use, or a function to evaluate prediction errors in a single CV replication. If a function, the number of arguments determines the data the function receives. If the function takes a single argument, it is called with a single numeric vector of prediction errors. If the function takes two or more arguments, it is called with the predicted values as first argument and the true values as second argument. The function must always return a single numeric value quantifying the prediction performance. The order of the given values corresponds to the order in the input data. A sketch of both forms is given after this list.
- `fit_all`: if `TRUE`, fit the model for all penalization levels. Can also be any combination of `"min"` and `"{x}-se"`, in which case only the model at the penalization level with the smallest average CV prediction error, or the models within `{x}` standard errors of it, respectively, are fitted. Setting `fit_all` to `FALSE` is equivalent to `"min"`. Applies to all `alpha` values. See the usage sketch after this list.
- `fold_starts`: how to determine starting values in the cross-validation folds. If `"full"` (default), use the best solution from the fit to the full data as starting value; this implies `fit_all = TRUE`. If `"enpy"`, compute separate ENPY initial estimates in each fold. The option `"both"` uses both. These starts are in addition to the starts provided in `other_starts`.
- `cl`: a parallel cluster. Can only be used in combination with `ncores = 1`.
- `...`: arguments passed on to `pense`:
  - `nlambda`: number of penalization levels.
  - `lambda_min_ratio`: smallest value of the penalization level as a fraction of the largest level (i.e., the smallest value for which all coefficients are zero). The default depends on the sample size relative to the number of variables and on `alpha`. If more observations than variables are available, the default is `1e-3 * alpha`; otherwise `1e-2 * alpha`.
  - `nlambda_enpy`: number of penalization levels where the EN-PY initial estimate is computed.
  - `penalty_loadings`: a vector of positive penalty loadings (a.k.a. weights) to penalize each coefficient differently. Only allowed for `alpha > 0`.
  - `enpy_lambda`: optional user-supplied sequence of penalization levels at which EN-PY initial estimates are computed. If given and not `NULL`, `nlambda_enpy` is ignored.
  - `other_starts`: a list of other starting points, created by `starting_point()`. If the output of `enpy_initial_estimates()` is given, the starting points will be shared among all penalization levels. Note that if a starting point is specific to a penalization level, this penalization level is added to the grid of penalization levels (either the manually specified grid in `lambda` or the automatically generated grid of size `nlambda`). If `standardize = TRUE`, the starting points are also scaled. A sketch is given after this list.
  - `intercept`: include an intercept in the model.
  - `bdp`: desired breakdown point of the estimator, between 0.05 and 0.5. The actual breakdown point may be slightly larger or smaller to avoid instabilities of the S-loss.
  - `cc`: tuning constant for the S-estimator. The default is chosen based on the breakdown point `bdp`. This affects the estimated coefficients only if `standardize = TRUE`; otherwise only the estimated scale of the residuals is affected.
  - `eps`: numerical tolerance.
  - `explore_solutions`: number of solutions to compute up to the desired precision `eps`.
  - `explore_tol`, `explore_it`: numerical tolerance and maximum number of iterations for exploring possible solutions. The tolerance should be (much) looser than `eps` to be useful, and the number of iterations should be much smaller than the maximum number of iterations given via `algorithm_opts`.
  - `max_solutions`: retain only up to `max_solutions` unique solutions per penalization level.
  - `comparison_tol`: numeric tolerance to determine if two solutions are equal. The comparison is first done on the absolute difference in the value of the objective function at the solutions. If this is less than `comparison_tol`, two solutions are deemed equal if the squared difference of the intercepts is less than `comparison_tol` and the squared \(L_2\) norm of the difference of the coefficient vectors is less than `comparison_tol`. An illustrative sketch of this test is given after this list.
  - `add_zero_based`: also consider the 0-based regularization path. See details for a description.
  - `enpy_specific`: use the EN-PY initial estimates only at the penalization level they are computed for. See details for a description.
  - `carry_forward`: carry the best solutions forward to the next penalty level.
  - `sparse`: use sparse coefficient vectors.
  - `ncores`: number of CPU cores to use in parallel. By default, only one CPU core is used. Not supported on all platforms, in which case a warning is given.
  - `algorithm_opts`: options for the MM algorithm to compute the estimates. See `mm_algorithm_options()` for details.
  - `mscale_opts`: options for the M-scale estimation. See `mscale_algorithm_options()` for details.
  - `enpy_opts`: options for the ENPY initial estimates, created with the `enpy_options()` function. See `enpy_initial_estimates()` for details.
  - `cv_k`, `cv_objective`: deprecated and ignored. See `pense_cv()` for estimating prediction performance via cross-validation.
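
A minimal sketch of the two forms a custom `cv_metric` function can take; the function names below are made up for illustration, and only the number of arguments determines how the function is called:

``` r
## One-argument form: receives a numeric vector of prediction errors.
median_abs_error <- function(prediction_errors) {
  median(abs(prediction_errors))
}

## Two-or-more-argument form: receives the predicted values first and
## the true values second. Must return a single numeric value.
trimmed_mspe <- function(predicted, true) {
  mean((true - predicted)^2, trim = 0.1)
}
```

Either function could then be supplied as `cv_metric = median_abs_error` or `cv_metric = trimmed_mspe`.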
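
A usage sketch tying several of the arguments above together, assuming this list documents `pense_cv()` from the pense package; the data are simulated purely for illustration:

``` r
library(pense)

set.seed(42)
n <- 50
p <- 10
x <- matrix(rnorm(n * p), ncol = p)
y <- x[, 1] - 2 * x[, 2] + rnorm(n)

## 5-fold CV with 3 replications; fit only the model with the smallest
## average CV prediction error plus the models within 1 SE of it.
fit <- pense_cv(x, y,
                alpha = 0.75,          # passed on to pense() via `...`
                cv_k = 5, cv_repl = 3,
                fit_all = c("min", "1-se"))
```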
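
A hypothetical sketch of supplying additional starting points via `other_starts`, reusing `x`, `y`, and `p` from the previous sketch. The argument names passed to `starting_point()` (`beta`, `intercept`) are assumptions and may differ from the actual signature:

``` r
## A starting point shared across all penalization levels, built here
## from a trivial zero coefficient vector.
## (`beta` and `intercept` are assumed argument names.)
start <- starting_point(beta = rep(0, p), intercept = median(y))

fit_with_starts <- pense_cv(x, y, alpha = 0.75, cv_k = 5,
                            other_starts = list(start))
```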
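
Finally, the equality test described for `comparison_tol` can be restated in R as follows; this is purely an illustration of the rule, not the package's actual implementation, and the field names (`objf_value`, `intercept`, `beta`) are made up:

``` r
## Two solutions are compared on the objective value first; only if
## that difference is small enough are the intercepts and coefficient
## vectors compared.
solutions_equal <- function(s1, s2, comparison_tol) {
  if (abs(s1$objf_value - s2$objf_value) >= comparison_tol) {
    return(FALSE)
  }
  (s1$intercept - s2$intercept)^2 < comparison_tol &&
    sum((s1$beta - s2$beta)^2) < comparison_tol
}
```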