Fits a regularization path for large margin classifiers at a sequence of regularization parameters lambda.
gcdnet(
x,
y,
nlambda = 100,
method = c("hhsvm", "logit", "sqsvm", "ls", "er"),
lambda.factor = ifelse(nobs < nvars, 0.01, 1e-04),
lambda = NULL,
lambda2 = 0,
pf = rep(1, nvars),
pf2 = rep(1, nvars),
exclude,
dfmax = nvars + 1,
pmax = min(dfmax * 1.2, nvars),
standardize = FALSE,
intercept = TRUE,
eps = 1e-08,
maxit = 1e+06,
delta = 2,
omega = 0.5
)
An object with S3 class "gcdnet" with the following components:
call: the call that produced this object.
b0: intercept sequence of length length(lambda).
beta: a p*length(lambda) matrix of coefficients, stored as a sparse matrix (dgCMatrix class, the standard class for sparse numeric matrices in the Matrix package). To convert it to an ordinary dense matrix, use as.matrix().
lambda: the actual sequence of lambda values used.
df: the number of nonzero coefficients for each value of lambda.
dim: the dimensions of the coefficient matrix.
npasses: the total number of iterations (innermost loop) summed over all lambda values.
jerr: error flag for warnings and errors; 0 if no error.
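For instance, a minimal sketch of inspecting a fitted object (using the FHT data bundled with the package, as in the Examples below; the component names are those listed above):
library(gcdnet)
data(FHT)
fit <- gcdnet(x = FHT$x, y = FHT$y, method = "hhsvm")
fit$df                            # nonzero coefficients per lambda
fit$lambda                        # the lambda sequence actually used
beta_dense <- as.matrix(fit$beta) # dense copy of the sparse coefficient matrix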
x: matrix of predictors, of dimension \(N \times p\); each row is an observation vector.
y: response variable. This argument should be a two-level factor for classification.
nlambda: the number of lambda values. Default is 100.
method: a character string specifying the loss function to use. Valid options are: "hhsvm" Huberized squared hinge loss, "sqsvm" squared hinge loss, "logit" logistic loss, "ls" least squares loss, "er" expectile regression loss. Default is "hhsvm".
lambda.factor: the factor for getting the minimal lambda in the lambda sequence, where min(lambda) = lambda.factor * max(lambda) and max(lambda) is the smallest value of lambda for which all coefficients are zero. The default depends on the relationship between \(N\) (the number of rows in the matrix of predictors) and \(p\) (the number of predictors). If \(N > p\), the default is 0.0001, close to zero. If \(N < p\), the default is 0.01. A very small value of lambda.factor will lead to a saturated fit. It has no effect if a user-defined lambda sequence is supplied.
lambda: a user-supplied lambda sequence. Typically, by leaving this option unspecified, users let the program compute its own lambda sequence based on nlambda and lambda.factor. Supplying a value of lambda overrides this. It is better to supply a decreasing sequence of lambda values than a single (small) value; if the supplied sequence is not decreasing, the program will sort it in decreasing order automatically. See the sketch after the argument descriptions.
lambda2: regularization parameter \(\lambda_2\) for the quadratic penalty on the coefficients.
pf: L1 penalty factor of length \(p\), used for adaptive LASSO or adaptive elastic net. Separate L1 penalty weights can be applied to each coefficient of \(\beta\) to allow differential L1 shrinkage. Can be 0 for some variables, which implies no L1 shrinkage and results in that variable always being included in the model. Default is 1 for all variables (and implicitly infinity for variables listed in exclude).
pf2: L2 penalty factor of length \(p\), used for adaptive LASSO or adaptive elastic net. Separate L2 penalty weights can be applied to each coefficient of \(\beta\) to allow differential L2 shrinkage. Can be 0 for some variables, which implies no L2 shrinkage. Default is 1 for all variables.
exclude: indices of variables to be excluded from the model. Default is none. Equivalent to an infinite penalty factor.
dfmax: limit the maximum number of variables in the model. Useful for very large \(p\) when a partial path is desired. Default is \(p + 1\).
pmax: limit the maximum number of variables ever to be nonzero. For example, once a coefficient \(\beta_j\) enters the model, it is counted toward pmax only once, no matter how many times it exits or re-enters the model along the path. Default is min(dfmax*1.2, p).
standardize: logical flag for variable standardization prior to fitting the model sequence. If TRUE, the x matrix is normalized so that each column is centered (i.e. \(\sum^N_{i=1}x_{ij}=0\)) and the sum of squares of each column satisfies \(\sum^N_{i=1}x_{ij}^2/N=1\). If x is standardized, the resulting coefficients are transformed back to the original scale. Default is FALSE.
intercept: logical flag indicating whether to include an intercept in the model.
eps: convergence threshold for coordinate majorization descent. Each inner coordinate majorization descent loop continues until the relative change in any coefficient, i.e. \(\max_j(\beta_j^{new}-\beta_j^{old})^2\), is less than eps. For the HHSVM, i.e. method = "hhsvm", the criterion is \(\frac{2}{\delta}\max_j(\beta_j^{new}-\beta_j^{old})^2\). For expectile regression, i.e. method = "er", it is \(2\max(1-\omega,\omega)\max_j(\beta_j^{new}-\beta_j^{old})^2\). Default value is 1e-8.
maxit: maximum number of outer-loop iterations allowed at a fixed lambda value. Default is 1e6. If models do not converge, consider increasing maxit.
delta: the parameter \(\delta\) in the HHSVM model. The value must be greater than 0. Default is 2.
omega: the parameter \(\omega\) in the expectile regression model. The value must be in (0,1). Default is 0.5.
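As referenced in the lambda entry above, here is a minimal sketch combining a user-defined decreasing lambda sequence with a few of the other arguments (the FHT data ships with the package; the lambda values are purely illustrative):
library(gcdnet)
data(FHT)
fit <- gcdnet(x = FHT$x, y = FHT$y, method = "hhsvm",
              lambda = c(0.5, 0.1, 0.05, 0.01), # sorted into decreasing order if not already
              lambda2 = 1,                      # elastic-net L2 parameter
              standardize = TRUE)               # center and scale columns of x
fit$lambda                                      # the sequence actually used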
Yi Yang, Yuwen Gu and Hui Zou
Maintainer: Yi Yang <yi.yang6@mcgill.ca>
Note that the objective function in gcdnet is $$Loss(y, X, \beta)/N + \lambda_1\Vert\beta\Vert_1 + 0.5\lambda_2\Vert\beta\Vert_2^2,$$ where the penalty is a combination of L1 and L2 terms. Users can specify the loss function: options include the Huberized squared hinge loss, the squared hinge loss, the least squares loss, the logistic regression loss, and the expectile regression loss. Users can also tweak the penalty by choosing different values of \(\lambda_2\) and the penalty factors.
For reasons of computational speed, if models are not converging or are running slowly, consider increasing eps, decreasing nlambda, or increasing lambda.factor before increasing maxit.
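For example, a sketch of these speed-related knobs (the specific values are illustrative, not recommendations):
library(gcdnet)
data(FHT)
fit <- gcdnet(x = FHT$x, y = FHT$y, method = "hhsvm",
              eps = 1e-6,           # looser convergence threshold
              nlambda = 50,         # fewer lambda values on the path
              lambda.factor = 0.05) # stop the path at a larger min(lambda)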
FAQ:
Question: “I couldn't figure out how to specify the options for the adaptive LASSO, the elastic net, and the adaptive elastic net. Could you please give me a quick hint?”
Answer: lambda2 is the regularization parameter for the L2 penalty. To use the LASSO, set lambda2 = 0. To use the elastic net, set lambda2 to a nonzero value.
pf is the L1 penalty factor of length \(p\) (\(p\) is the number of predictors). Separate L1 penalty weights can be applied to each coefficient to allow differential L1 shrinkage. Similarly, pf2 is the L2 penalty factor of length \(p\).
To use the adaptive LASSO, set lambda2 = 0 and also specify pf and pf2. To use the adaptive elastic net, set lambda2 to a nonzero value and specify pf and pf2.
For example:
library('gcdnet')

# Dataset with N = 100 observations and p = 10 predictors
x_log <- matrix(rnorm(100 * 10), 100, 10)
y_log <- sample(c(-1, 1), 100, replace = TRUE)

# LASSO
m <- gcdnet(x = x_log, y = y_log, lambda2 = 0, method = "logit")
plot(m)

# elastic net with lambda2 = 1
m <- gcdnet(x = x_log, y = y_log, lambda2 = 1, method = "logit")
plot(m)

# adaptive LASSO with penalty factor
# pf = 0.5 0.5 0.5 0.5 0.5 1.0 1.0 1.0 1.0 1.0
m <- gcdnet(x = x_log, y = y_log, lambda2 = 0, method = "logit",
            pf = c(rep(0.5, 5), rep(1, 5)))
plot(m)

# adaptive elastic net with lambda2 = 1 and penalty factors
# pf = c(rep(0.5, 5), rep(1, 5)), pf2 = 3 3 3 3 3 1 1 1 1 1
m <- gcdnet(x = x_log, y = y_log, lambda2 = 1, method = "logit",
            pf = c(rep(0.5, 5), rep(1, 5)),
            pf2 = c(rep(3, 5), rep(1, 5)))
plot(m)
Question: “What is the meaning of the parameter pf? The package documentation says pf is the penalty weight applied to each coefficient of beta?”
Answer: Yes, pf and pf2 are the L1 and L2 penalty factors of length \(p\) used for adaptive LASSO or adaptive elastic net. A weight of 0 means that the feature (variable) receives no shrinkage from that penalty and is therefore always included in the model; a weight of 1 means that the feature (variable) is penalized with the standard weight of 1.
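For instance, a small sketch (reusing x_log and y_log from the example above; the weights are illustrative) that removes all L1 shrinkage from the first predictor so it always stays in the model:
# pf[1] = 0: no L1 shrinkage on the first predictor
m <- gcdnet(x = x_log, y = y_log, lambda2 = 0, method = "logit",
            pf = c(0, rep(1, 9)))
plot(m)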
Question: “Does gcdnet deal with both continuous and categorical response variables?”
Answer: Yes, both are supported. You can use a continuous response variable with the least squares regression loss, or a categorical response with one of the losses for classification problems.
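For instance, a minimal sketch using the package's FHT data, which provides both a continuous response y_reg and a two-level response y:
library(gcdnet)
data(FHT)
m_reg <- gcdnet(x = FHT$x, y = FHT$y_reg, method = "ls")  # continuous response
m_cls <- gcdnet(x = FHT$x, y = FHT$y, method = "logit")   # two-level response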
Question: “Why does the predict function not work? predict should return the predicted probability of the positive class. Instead I get:”
Error in as.matrix(as.matrix(cbind2(1, newx)) %*% nbeta):
error in evaluating the argument 'x' in selecting a method for function 'as.matrix':
Error in t(.Call(Csparse_dense_crossprod, y, t(x))):
error in evaluating the argument 'x' in selecting a method for function 't':
Error: Cholmod error 'X and/or Y have wrong dimensions' at
file ../MatrixOps/cholmod_sdmult.c, line 90?
“Using the Arcene dataset and executing the following code will give the above error:”
library(gcdnet)
arc <- read.csv("arcene.csv", header = FALSE)
fit <- gcdnet(arc[, -10001], arc[, 10001], standardize = FALSE,
              method = "logit")
pred <- rnorm(10000)
predict(fit, pred, type = "link")
Answer: This is actually NOT a bug in gcdnet. When making predictions with a new matrix x, each observation of x should be arranged as a row of the matrix. In your code, "pred" is a vector, so you need to convert "pred" into a matrix. Try the following code:
pred <- rnorm(10000)
pred <- matrix(pred, 1, 10000)
predict(fit, pred, type = "link")
Yang, Y. and Zou, H. (2012). "An Efficient Algorithm for Computing the HHSVM and Its Generalizations." Journal of Computational and Graphical Statistics, 22, 396–415.
BugReport: https://github.com/emeryyi/gcdnet
Gu, Y. and Zou, H. (2016). "High-dimensional generalizations of asymmetric least squares regression and their applications." The Annals of Statistics, 44(6), 2661–2694.
plot.gcdnet
data(FHT)
# 1. solution paths for the LASSO penalized least squares.
# To use LASSO set lambda2 = 0.
m1 <- gcdnet(x = FHT$x, y = FHT$y_reg, lambda2 = 0, method = "ls")
plot(m1)
# 2. solution paths for the elastic net penalized HHSVM.
# lambda2 is the parameter controlling the L2 penalty.
m2 <- gcdnet(x = FHT$x, y = FHT$y, delta = 1, lambda2 = 1, method = "hhsvm")
plot(m2)
# 3. solution paths for the adaptive LASSO penalized SVM
# with the squared hinge loss. To use the adaptive LASSO,
# set lambda2 = 0 and meanwhile specify the L1 penalty weights.
p <- ncol(FHT$x)
# set the first three L1 penalty weights as 0.1 and the rest are 1
pf <- c(0.1, 0.1, 0.1, rep(1, p-3))
m3 <- gcdnet(x = FHT$x, y = FHT$y, pf = pf, lambda2 = 0, method = "sqsvm")
plot(m3)
# 4. solution paths for the adaptive elastic net penalized
# logistic regression.
p <- ncol(FHT$x)
# set the first three L1 penalty weights as 10 and the rest are 1.
pf <- c(10, 10, 10, rep(1, p-3))
# set the last three L2 penalty weights as 0.1 and the rest are 1.
pf2 <- c(rep(1, p-3), 0.1, 0.1, 0.1)
# set the L2 penalty parameter lambda2=0.01.
m4 <- gcdnet(x = FHT$x, y = FHT$y, pf = pf, pf2 = pf2,
lambda2 = 0.01, method = "logit")
plot(m4)
# 5. solution paths for the LASSO penalized expectile regression
# with the asymmetric least square parameter omega=0.9.
m5 <- gcdnet(x = FHT$x, y = FHT$y_reg, omega = 0.9,
lambda2 = 0, method = "er")
plot(m5)