
grpregOverlap(X, y, group,
              penalty = c("grLasso", "grMCP", "grSCAD", "gel", "cMCP", "gLasso", "gMCP"),
              family = c("gaussian", "binomial", "poisson", "cox"),
              nlambda = 100, lambda,
              lambda.min = {if (nrow(X) > ncol(X)) 1e-4 else .05},
              alpha = 1, eps = .001, max.iter = 1000,
              dfmax = ncol(X), gmax = length(group),
              gamma = ifelse(penalty == "grSCAD", 4, 3), tau = 1/3,
              group.multiplier, returnX = FALSE, returnOverlap = FALSE,
              warn = TRUE, ...)
grpregOverlap calls grpreg, which standardizes the data and includes an intercept by default.
For survival analysis, y is the time-to-event outcome: a two-column matrix or Surv object. The first column is the time on study (follow-up time); the second column is a binary variable, with 1 indicating that the event has occurred and 0 indicating (right) censoring. See grpreg and grpsurv for more details.
Unlike in grpreg, group here must be a list of vectors, each containing the integer indices or character names of the variables in that group. Variables that do not belong to any group will be discarded.
Specify grLasso, grMCP, or grSCAD for group selection, or specify gel or cMCP for bi-level selection, i.e., selecting important groups as well as important variables within those groups. See grpreg for more details.
If family is missing, it is set to 'gaussian'. Specify family = 'cox' for survival analysis (Cox models).
The number of lambda values. Default is 100.
A user-specified sequence of lambda values. Typically, this is left unspecified, and the function automatically computes a grid of lambda values that is spaced uniformly on the log scale over the relevant range of lambda values.
The smallest value for lambda, as a fraction of lambda.max. Default is .0001 if the number of observations is larger than the number of covariates and .05 otherwise.
As in grpreg, an L2 (ridge) penalty is allowed along with the group penalty. alpha controls the proportional weight of the regularization parameters of these two penalties. The regularization parameter of the group penalty is lambda*alpha, while that of the ridge penalty is lambda*(1-alpha). Default is 1: no L2 penalty.
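The descriptions of lambda, lambda.min, and alpha above can be illustrated with a few lines of base R. This is only a sketch: lambda_max and lambda_min_ratio are hypothetical stand-ins (the package derives lambda.max from the data), and the grid construction shown is one common way to obtain values that are uniform on the log scale, not necessarily the exact code used internally.

```r
# Hypothetical values for illustration only; lambda.max is normally
# computed from the data by the fitting routine.
lambda_max <- 2
lambda_min_ratio <- 0.05   # plays the role of lambda.min (a fraction of lambda.max)
nlambda <- 100

# A decreasing grid that is uniform on the log scale.
lambda_grid <- exp(seq(log(lambda_max),
                       log(lambda_min_ratio * lambda_max),
                       length.out = nlambda))

# alpha splits each lambda between the group penalty and the ridge penalty.
alpha <- 0.8
group_weight <- lambda_grid * alpha        # lambda * alpha
ridge_weight <- lambda_grid * (1 - alpha)  # lambda * (1 - alpha)

range(lambda_grid)
head(cbind(lambda = lambda_grid, group = group_weight, ridge = ridge_weight))
```

With alpha = 1 (the default), ridge_weight is zero everywhere and the whole of lambda goes to the group penalty.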
Convergence threshold, eps. Default is .001.
See grpreg for more details.
See grpreg for more details.
An object with S3 class "grpregOverlap" or "grpsurvOverlap" (for Cox models), which inherits "grpreg", with the following variables.
grpregOverlap takes the input design matrix X and the grouping information group, and expands X to a new, non-overlapping space. It then calls grpreg for model fitting based on the group descent algorithm. Unlike in grpreg, the interface for the group bridge-penalized method is not implemented.
The expanded design matrix is named X.latent. It is returned in the fitted object, provided returnX is TRUE. The latent coefficient (or norm) vector corresponds to the columns of X.latent. Note that when constructing X.latent, the columns of X corresponding to variables not included in group are removed automatically.
For a more detailed explanation of the penalties and algorithm, see grpreg.
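To make the expansion into the non-overlapping space concrete, here is a minimal base R sketch of the latent-variable idea: each group receives its own copy of the columns it contains, so a variable that appears in several groups is duplicated. The names latent_index, X_latent, and beta_latent are illustrative, and the column ordering (groups concatenated in list order) is an assumption; this is not the package's expandX code and it skips standardization.

```r
# Overlapping groups over 6 variables (same grouping as in the examples).
group <- list(gr1 = c(1, 2, 3), gr2 = c(1, 4), gr3 = c(2, 4, 5),
              gr4 = c(3, 5), gr5 = c(6))

set.seed(1)
X <- matrix(rnorm(6 * 10), ncol = 6)

# One latent column per (group, variable) pair.
latent_index <- unlist(group, use.names = FALSE)
X_latent <- X[, latent_index, drop = FALSE]
dim(X_latent)  # 10 rows, 11 latent columns

# Latent coefficients: only gr1 and gr4 are active.
beta_latent <- c(5, 5, 5, 0, 0, 0, 0, 0, 5, 5, 0)

# Coefficients on the original variables are sums over the latent copies.
beta <- as.vector(tapply(beta_latent, latent_index, sum))
beta  # 5 5 10 0 5 0

# Both parameterizations give the same linear predictor.
all.equal(as.vector(X %*% beta), as.vector(X_latent %*% beta_latent))
```

Variable 3 belongs to both gr1 and gr4, so its original-scale coefficient is the sum of its two latent copies (5 + 5 = 10).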
cv.grpregOverlap, cv.grpsurvOverlap, plot, select, grpreg, grpsurv.
## linear regression, a simulation demo.
set.seed(123)
group <- list(gr1 = c(1, 2, 3), gr2 = c(1, 4), gr3 = c(2, 4, 5),
              gr4 = c(3, 5), gr5 = c(6))
beta.latent.T <- c(5, 5, 5, 0, 0, 0, 0, 0, 5, 5, 0) # true latent coefficients.
# beta.T <- c(5, 5, 10, 0, 5, 0), true variables: 1, 2, 3, 5; true groups: 1, 4.
X <- matrix(rnorm(n = 6*100), ncol = 6)
X.latent <- expandX(X, group)
y <- X.latent %*% beta.latent.T + rnorm(100)
fit <- grpregOverlap(X, y, group, penalty = 'grLasso')
# fit <- grpregOverlap(X, y, group, penalty = 'grMCP')
# fit <- grpregOverlap(X, y, group, penalty = 'grSCAD')
head(coef(fit, latent = TRUE)) # compare to beta.latent.T
plot(fit, latent = TRUE)
head(coef(fit, latent = FALSE)) # compare to beta.T
plot(fit, latent = FALSE)
cvfit <- cv.grpregOverlap(X, y, group, penalty = 'grMCP')
plot(cvfit)
head(coef(cvfit))
summary(cvfit)
## logistic regression, real data, pathway selection
data(pathway.dat)
X <- pathway.dat$expression
group <- pathway.dat$pathways
y <- pathway.dat$mutation
fit <- grpregOverlap(X, y, group, penalty = 'grLasso', family = 'binomial')
plot(fit)
str(select(fit))
str(select(fit, criterion = "AIC", df = "active"))
## Not run:
# cvfit <- cv.grpregOverlap(X, y, group, penalty = 'grLasso', family = 'binomial')
# coef(cvfit)
# predict(cvfit, X, type='response')
# predict(cvfit, X, type = 'class')
# plot(cvfit)
# plot(cvfit, type = 'all')
# summary(cvfit)
# ## End(Not run)