gvcm.cat (version 1.3)

gvcm.cat: Regularized Categorial Effects/Categorial Effect Modifiers in GLMs

Description

The function fits generalized linear models with categorial effects and categorial effect modifiers. The model is specified by giving a symbolic description of the linear predictor and a description of the error distribution. Estimation employs regularization and model selection strategies to fuse and/or select a covariate's categories. Two strategies are available: a penalty combining the fused and the pure Lasso, or a forward selection strategy employing AIC/BIC (see Oelker et al. 2012).
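
To make the idea of "a penalty combining the fused and the pure Lasso" concrete, here is a minimal base-R sketch. It is not the package's implementation: the helper name combined_penalty, the argument ordinal, and the exact way the weight phi mixes the two terms are assumptions made for illustration; the precise form used by gvcm.cat is given in Oelker et al. (2012).

```r
# Hypothetical helper (not part of gvcm.cat): an assumed form of a penalty
# that mixes a fused Lasso term (differences of coefficients) with a pure
# Lasso term (the coefficients themselves), weighted by phi in (0, 1).
combined_penalty <- function(beta, lambda, phi = 0.5, ordinal = FALSE) {
  stopifnot(phi > 0, phi < 1, lambda >= 0)
  if (length(beta) < 2) {
    fused <- 0
  } else if (ordinal) {
    # ordinal categories: only adjacent differences are penalized
    fused <- sum(abs(diff(beta)))
  } else {
    # nominal categories: all pairwise differences are penalized
    pairs <- combn(length(beta), 2)
    fused <- sum(abs(beta[pairs[1, ]] - beta[pairs[2, ]]))
  }
  lasso <- sum(abs(beta))
  lambda * (phi * fused + (1 - phi) * lasso)
}

beta <- c(0.5, 0.5, -1)
pen_nominal <- combined_penalty(beta, lambda = 2)
pen_ordinal <- combined_penalty(beta, lambda = 2, ordinal = TRUE)
```

The fused term shrinks coefficients of different categories towards each other (fusing categories), while the pure Lasso term shrinks them towards zero (removing the covariate's effect).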

Usage

gvcm.cat(formula, data, family = gaussian, method = "lqa", 
tuning = list(lambda=TRUE, phi=0.5), weights, control, 
model = FALSE, x = FALSE, y = FALSE, plot=FALSE, ...)

## S3 method for class 'default':
gvcm.cat(formula, data, family = gaussian, method = "lqa", 
tuning = list(lambda=TRUE, phi=0.5), weights, control, 
model = FALSE, x = FALSE, y = FALSE, plot=FALSE, ...)
 
pest(X, y, indices, family = gaussian, method = c("lqa","nlm"), 
tuning = list(lambda=TRUE, phi=0.5), weights, 
control = cat_control(), plot=FALSE, ...)

abc(X, y, indices, family = gaussian, method = c("AIC", "BIC"), 
weights, control = cat_control(), plot=FALSE, ...)

Arguments

formula
an object of class formula: a symbolic description of the model to be fitted; see Details
data
an optional data frame, containing the variables in the model.
family
a family object describing the error distribution and link function to be used in the model; this can be a character string naming a family function, a family function or the result of a call to a family function, see
method
fitting method; one of "lqa", "nlm", "AIC" or "BIC"; methods "lqa" and "nlm" induce penalized estimation; the default method "lqa" employs a PIRLS algorithm
tuning
a list; tuning parameters for penalized estimation; lambda is the overall penalty parameter; phi weights the penalty terms of varying coefficients and must lie in the open interval (0, 1); the default 0.5 corresponds to eq
weights
an optional weight vector
control
a list of parameters for controlling the fitting process; if empty, set to cat_control(); see cat_control
model
a logical value indicating whether the employed model frame shall be returned or not
x, y
for gvcm.cat: logical values indicating whether the model matrix (x) and the response vector (y) used in the fitting process shall be returned or not; for pest and abc: y must be a response vector
X
for pest and abc only: a properly coded design matrix
plot
logical; if TRUE, estimates for path-plotting are computed
indices
for pest and abc only: the index arguments to be used; see function index
...
further arguments passed to or from other methods
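
The three accepted forms of the family argument (a character string, a family function, or the result of calling one) can be illustrated with a small base-R sketch. The helper resolve_family is hypothetical and mirrors how base R's glm normalizes this argument; that gvcm.cat does the same internally is an assumption.

```r
# Hypothetical helper: turn any of the three accepted forms into a
# family object, following the approach used by stats::glm.
resolve_family <- function(family) {
  if (is.character(family)) {
    # a character string naming a family function, e.g. "binomial"
    family <- get(family, mode = "function", envir = parent.frame())
  }
  if (is.function(family)) {
    # a family function, e.g. binomial
    family <- family()
  }
  if (is.null(family$family)) {
    stop("'family' not recognized")
  }
  family
}

fam_chr  <- resolve_family("binomial")
fam_fun  <- resolve_family(binomial)
fam_call <- resolve_family(binomial(link = "logit"))
```

All three calls yield a binomial family object with a logit link.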

Value

gvcm.cat returns an object of class "gvcm.cat", which inherits from class glm, which in turn inherits from class lm. An object of class gvcm.cat contains:

  • coefficients: named vector of coefficients
  • coefficients.reduced: reduced vector of coefficients; selected coefficients/differences of coefficients are set to zero
  • coefficients.refitted: refitted vector of coefficients, i.e. the maximum likelihood estimate of the model containing the selected covariates only; same length as coefficients.reduced
  • coefficients.oml: maximum likelihood estimate of the full model
  • residuals: deviance residuals
  • fitted.values: fitted mean values
  • rank: degrees of freedom of the model; for method="lqa" estimated by the trace of the generalized hat matrix; for method="nlm" the estimate is the number of selected coefficients; for methods "AIC" and "BIC" estimated as by default in glm.fit
  • family: the family object used
  • linear.predictors: linear fit on the link scale
  • deviance: scaled deviance
  • aic: a version of Akaike's Information Criterion; minus twice the maximized log-likelihood plus twice the rank. For binomial and Poisson families the dispersion is fixed at one. For a gaussian family the dispersion is estimated from the residual deviance, and the number of parameters is the rank plus one.
  • null.deviance: the deviance of the null model, comparable with deviance; the null model includes a non-varying intercept only
  • iter: number of iterations
  • weights: working weights of the final iteration
  • df.residual: the residual degrees of freedom/degrees of freedom of the error; computed like rank
  • df.null: the residual degrees of freedom of the null model
  • converged: logical; does the PIRLS algorithm fulfill the given convergence conditions?
  • boundary: logical; is the fitted value on the boundary of the attainable values?
  • offset: the offset vector used
  • control: the value of the control argument used
  • method: same as input argument method
  • contrasts: the contrasts used
  • na.action: information returned by model.frame on the special handling of NAs; currently always na.omit
  • plot: if input plot=TRUE, a list containing two matrices for plotting
  • tuning: a list of the employed values of lambda and phi; if lambda and/or phi were cross-validated, these are the optimal values
  • indices: the index argument used; see function index
  • number.selectable.parameters: number of coefficients that could be selected
  • number.removed.parameters: number of coefficients actually removed
  • x.reduction: a matrix; transforms model frame x into its reduced version; e.g. needed for refitting
  • call: the matched call
  • formula: the formula supplied
  • terms: the terms object used
  • data: the data argument
  • x, y: if requested, the model matrix/the response vector
  • model: if requested, the model frame
  • xlevels: a record of the levels of the factors used in fitting
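
The definition of aic given in the list above (minus twice the maximized log-likelihood plus twice the number of parameters) can be checked with a short base-R sketch. It uses stats::glm on simulated data rather than gvcm.cat, so rank here is glm's usual rank; for gvcm.cat fits the rank is estimated as described under rank. All variable names are illustrative.

```r
set.seed(1)
n <- 200
x <- rnorm(n)

# Binomial family: dispersion fixed at one, number of parameters = rank
y_bin   <- rbinom(n, 1, plogis(0.5 + x))
fit_bin <- glm(y_bin ~ x, family = binomial())
aic_bin <- -2 * as.numeric(logLik(fit_bin)) + 2 * fit_bin$rank

# Gaussian family: dispersion is estimated, number of parameters = rank + 1
y_gau   <- 1 + 2 * x + rnorm(n)
fit_gau <- glm(y_gau ~ x, family = gaussian())
aic_gau <- -2 * as.numeric(logLik(fit_gau)) + 2 * (fit_gau$rank + 1)

# Both hand-computed values agree with the built-in AIC()
check_bin <- all.equal(aic_bin, AIC(fit_bin))
check_gau <- all.equal(aic_gau, AIC(fit_gau))
```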

Details

A typical formula has the form response ~ 1 + terms, where response is the response vector and terms is a series of terms which specifies a linear predictor. Varying coefficients enter the formula as v(x,u), where u denotes the categorial effect modifier and x the modified covariate. For methods "lqa" and "nlm", these coefficients are penalized as described in Oelker et al. 2012 (for weighting see phi in argument tuning); for methods "AIC" and "BIC", they are selected by a forward selection method as described in Oelker et al. 2012.

A varying intercept is denoted by v(1,u). If the formula contains no (varying) intercept, gvcm.cat assumes a constant intercept. There is no way to avoid an intercept.

Ordinal/nominal covariates u given as p(u) are penalized as described in Gertheiss and Tutz (2010) or selected by the same forward selection strategy as v(x,u). For numeric covariates, p(u) indicates a pure Lasso penalty.

For binomial families the response can also be a success/failure rate or a two-column matrix with the columns giving the numbers of successes and failures.

Function pest computes penalized estimates, that is, it implements methods "lqa" (PIRLS algorithm) and "nlm". Function abc implements the forward selection strategy employing AIC/BIC.

Categorial effect modifiers and penalized categorial covariates are dummy coded as required by the penalty. If x in v(x,u) is binary, it is effect coded (first category refers to -1). Other covariates are coded as given by getOption.
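
The formula syntax and the two coding schemes described above can be explored with base R alone. The following sketch only inspects a formula and builds small coded vectors by hand; the variable names (z, u_demo, x_demo) are made up for illustration, and how gvcm.cat constructs its design matrix internally is not shown here.

```r
# A formula with a varying intercept, a varying coefficient and a
# penalized covariate; v() and p() are never evaluated here, the
# formula is only parsed.
f <- y ~ v(1, u) + v(x1, u) + p(z)

term_labels  <- attr(terms(f), "term.labels")
is_varying   <- grepl("^v\\(", term_labels)  # terms of the form v(x, u)
is_penalized <- grepl("^p\\(", term_labels)  # terms of the form p(u)
vars_used    <- all.vars(f)                  # variables the data must supply

# Dummy coding of a categorial effect modifier with three levels
u_demo  <- factor(c("A", "B", "C", "A"))
u_dummy <- model.matrix(~ u_demo, data.frame(u_demo = u_demo),
                        contrasts.arg = list(u_demo = "contr.treatment"))

# Effect coding of a binary covariate: first category is coded -1
x_demo   <- factor(c("a", "b", "b", "a"))
x_effect <- ifelse(x_demo == levels(x_demo)[1], -1, 1)
```

Here term_labels lists the three terms of the formula, u_dummy has one indicator column per non-reference level of u_demo, and x_effect takes the value -1 for the first category and 1 for the second.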

References

Fan, J. and R. Li (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96(456), 1348-1360.

Gertheiss, J. and G. Tutz (2010). Sparse modeling of categorial explanatory variables. The Annals of Applied Statistics 4(4), 2150-2180.

Oelker, M.-R., J. Gertheiss and G. Tutz (2012). Regularization and Model Selection with Categorial Predictors and Effect Modifiers in Generalized Linear Models. Department of Statistics at the University of Munich: Technical Report 122.

Ulbricht, J. (2010). Variable Selection in Generalized Linear Models. Dissertation, Department of Statistics, University of Munich: Verlag Dr. Hut.

See Also

Functions index, cat_control, plot.gvcm.cat, predict.gvcm.cat, simulation

Examples

## continues the example of function simulation; `data` is the data set from that example
f <- y ~ v(1,u) + v(x1,u) + v(x2,u)
m1 <- gvcm.cat(f, data, binomial(), plot=TRUE)
summary(m1)
