gvcm.cat(formula, data, family = gaussian, method = c("lqa", "AIC", "BIC"),
tuning = list(lambda=TRUE, specific=FALSE, phi=0.5, grouped.fused=0.5,
elastic=0.5, vs=0.5, spl=0.5), weights, offset, start, control,
model = FALSE, x = FALSE, y = FALSE, plot=FALSE, ...)
pest(x, y, indices, family = gaussian,
tuning = list(lambda=TRUE, specific=FALSE, phi=0.5, grouped.fused=0.5,
elastic=0.5, vs=0.5, spl=0.5), weights, offset, start = NULL,
control = cat_control(), plot=FALSE, ...)
abc(x, y, indices, family = gaussian, tuning = c("AIC", "BIC"),
weights, offset, start, control = cat_control(), plot=FALSE, ...)
formula: a symbolic description of the model to be fitted; see details.

family: an object describing the error distribution and link function to be used in the model; this can be a character string naming a family function, a family function, or the result of a call to a family function; see family for details. Currently only gaussian, binomial, poisson and Gamma are working.

method: "lqa", "AIC" or "BIC". Method "lqa" induces penalized estimation; it employs a PIRLS algorithm (see Fan and Li, 2001; Oelker and Tutz, 2013). Methods "AIC" and "BIC" employ a forward selection strategy.

tuning: a list. lambda is the scalar, overall penalty parameter; if lambda is a vector of values, these values are cross-validated; if lambda = TRUE, lambda is cross-validated on a log scale between lambda.lower and lambda.upper; see cat_control. If lambda is a vector with the same length as the number of elements in the formula and specific equals a vector of proper length, the entries of specific are interpreted as specific tuning parameters for each entry of the formula. phi, grouped.fused, elastic, vs and spl are parameters that weigh the terms of some penalties; they must be from the open interval (0,1); the default 0.5 corresponds to equal weights.
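As a sketch of the cross-validation option, a candidate grid for lambda can be passed instead of a scalar. The grid and formula below are hypothetical; the gvcm.cat() call itself needs the package and is therefore only shown, not run:

```r
# A log-scale grid of candidate overall penalty parameters to be cross-validated.
lambda.grid <- exp(seq(log(0.01), log(100), length.out = 25))

# Tuning list with the default, equal penalty weights (phi = 0.5).
tune <- list(lambda = lambda.grid, phi = 0.5)

# Hypothetical fit (requires the gvcm.cat package and suitable `data`):
# m <- gvcm.cat(y ~ v(1, u) + v(x1, u), data, gaussian(), tuning = tune)
```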
start: starting values; used by method "lqa".
control: cat_control(); see cat_control.

model: for gvcm.cat: a logical value indicating whether the employed model frame shall be returned or not.

x, y: for gvcm.cat: logical values indicating whether the response vector and the model matrix used in the fitting process shall be returned or not. For functions pest and abc, y must be a response vector and x a properly coded design matrix.

plot: if TRUE, estimates needed to plot coefficient paths are computed.

indices: for pest and abc only: the index argument to be used; see function index.
gvcm.cat returns an object of class gvcm.cat, which inherits from class glm, which in turn inherits from class lm. Among others, an object of class gvcm.cat contains:

effective degrees of freedom: for method="lqa", estimated by the trace of the generalized hat matrix; for methods "AIC" and "BIC", estimated as in glm.fit by default.

family: the family object used.

deviance: the deviance; the corresponding null model includes a non-varying intercept only.

control: the control argument used.

na.action: information returned by model.frame on the special handling of NAs; currently always na.omit.

plotting information: if plot=TRUE, the first matrix contains estimates needed to plot coefficient paths; if lambda was cross-validated, the second matrix contains the cross-validation scores.

lambda: if lambda was cross-validated, the optimal value is returned.

reduction matrices: a matrix transforming x into its reduced version (needed, e.g., for refitting) and a matrix transforming coefficients into its reduced version.

formula: the formula supplied.

terms: the terms object used.

qr, R and effects: components relating to the final weighted linear fit.

Further components include coefficients.reduced, rank, index and method.
A typical formula has the form response ~ 1 + terms, where response is the response vector and terms is a series of terms which specifies a linear predictor. There are special terms for regularized effects:

v(x, u, n="L1", bj=TRUE): varying coefficients enter the formula as v(x,u), where u denotes the categorical effect modifier and x the modified covariate. A varying intercept is denoted by v(1,u). Varying coefficients with categorical effect modifiers are penalized as described in Oelker et al. (2012). The argument bj and the element phi in argument tuning allow for the described weights.
p(u, n="L1"): ordinal/nominal covariates u given as p(u) are penalized as described in Gertheiss and Tutz (2010). For numeric covariates, p(u) indicates a pure Lasso penalty.
grouped(u, ...): penalizes a group of covariates with the grouped Lasso penalty of Yuan and Lin (2006); so far, working for categorical covariates only.
sp(x, knots=20, n="L2"): implements a continuous covariate x non-parametrically as f(x); f(x) is represented by centered evaluations of basis functions (cubic B-splines with number of knots = knots); for n="L2", the curvature of f(x) is penalized by a Ridge penalty; see Eilers and Marx (1996).
SCAD(u): penalizes a covariate u with the SCAD penalty of Fan and Li (2001); for categorical covariates u, differences of coefficients are penalized by a SCAD penalty, see Gertheiss and Tutz (2010).
elastic(u): penalizes a covariate u with the elastic net penalty of Zou and Hastie (2005); for categorical covariates u, differences of coefficients are penalized by the elastic net penalty, see Gertheiss and Tutz (2010).
If the formula contains no (varying) intercept, gvcm.cat assumes a constant intercept; there is no way to avoid an intercept.

For specials p and v, there is the special argument n: if n="L1" (the default), the penalty contains the absolute values of the terms; if n="L2", the absolute values in the penalty are replaced by quadratic, Ridge-type terms; if n="L0", the absolute values in the penalty are replaced by an indicator for non-zero entries of the same terms.
For methods "AIC" and "BIC", the coefficients are not penalized but selected by a forward selection strategy whenever it makes sense; for the special v(x,u), the selection strategy is described in Oelker et al. (2012); the approach for the other specials corresponds to this idea.

For binomial families, the response can also be a success/failure rate or a two-column matrix with the columns giving the numbers of successes and failures.
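Both response conventions follow the standard glm() behavior, so they can be illustrated with base R alone; the toy data below are made up for the sketch:

```r
# Toy data: 30 observations, each with 10 Bernoulli trials.
set.seed(1)
d <- data.frame(x = rnorm(30),
                succ = rbinom(30, size = 10, prob = 0.4))
d$fail <- 10 - d$succ

# (1) response as a success rate, with the number of trials as weights
m.rate <- glm(succ / 10 ~ x, weights = rep(10, 30),
              family = binomial(), data = d)

# (2) response as a two-column matrix of successes and failures
m.matrix <- glm(cbind(succ, fail) ~ x, family = binomial(), data = d)

# both parameterizations yield the same fit
all.equal(coef(m.rate), coef(m.matrix))
```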
Function pest computes penalized estimates, that is, it implements method "lqa" (the PIRLS algorithm). Function abc implements the forward selection strategy employing AIC/BIC.

Categorical effect modifiers and penalized categorical covariates are dummy coded as required by the penalty. If x in v(x,u) is binary, it is effect coded (the first category refers to -1). Other covariates are coded as given by getOption.
There is a summary function: summary.gvcm.cat.

See also: index, cat_control, plot.gvcm.cat, predict.gvcm.cat, simulation.
## example for function simulation()
covariates <- list(x1 = list("unif", c(0, 2)),
                   x2 = list("unif", c(0, 2)),
                   x3 = list("unif", c(0, 2)),
                   u  = list("multinom", c(0.3, 0.4, 0.3), "nominal"))
true.f <- y ~ 1 + v(x1, u) + x2
true.coefs <- c(0.2, 0.3, 0.7, 0.7, -0.5)
data <- simulation(400, covariates, NULL, true.f, true.coefs, binomial(), seed = 456)

## example for function gvcm.cat()
f <- y ~ v(1, u) + v(x1, u) + v(x2, u)
m1 <- gvcm.cat(f, data, binomial(), plot = TRUE,
               control = cat_control(lambda.upper = 19))
summary(m1)

## example for function predict.gvcm.cat()
newdata <- simulation(200, covariates, NULL, true.f, true.coefs, binomial(), seed = 789)
prediction <- predict(m1, newdata)

## example for function plot.gvcm.cat()
plot(m1)
plot(m1, type = "score")
plot(m1, type = "coefs")