grplassocat (version 1.0)

fit_grp: Function to fit a group lasso model to a standardized feature matrix

Description

Standardizes a feature matrix, including categorical features, and fits a group lasso model.

Usage

fit_grp(eqn, dat, lambda, model = LinReg(), nonpen = c(), standardize = TRUE, ...)

Arguments

eqn

formula of the penalized variables. The response must be on the left-hand side of ~. If interaction terms are included without their main effects, the package adds the main effects automatically.

dat

data.frame. Categorical features must be of type factor.

lambda

Penalty parameter (scalar)

model

an object of class grpl.model as defined in the package grplasso.

nonpen

formula of the nonpenalized features

standardize

logical. If TRUE, the design matrix of the continuous features is centered and standardized to unit norm.

...

additional arguments to be passed to the grplasso function in the package of the same name.
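The note on eqn about interaction terms and main effects can be illustrated with base R's formula machinery. This is a minimal sketch, not the package's internal code: it only shows which terms R itself derives from a formula, and the helper variable implied_main is a hypothetical name introduced here for illustration.

```r
# Base R only. terms() does not evaluate the variables, so no data are needed.
with_star  <- attr(terms(y ~ X1cut * X2cut), "term.labels")
only_inter <- attr(terms(y ~ X1cut:X2cut), "term.labels")

print(with_star)   # main effects plus the interaction
print(only_inter)  # the interaction term only

# Main effects implied by an interaction-only formula
# (illustrative; how fit_grp adds them internally is not shown in this page).
implied_main <- unique(unlist(strsplit(only_inter, ":", fixed = TRUE)))
print(implied_main)
```

Writing the formula with * makes the main effects explicit, so the terms that end up in the model do not depend on the automatic completion.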

Value

A data.frame containing the coefficients of the fitted group lasso model, re-scaled to the original scale of the data. Coefficients of interaction terms for which dat contains no observations are returned as NA.

Details

Design matrices of the categorical features and of interactions between categorical features are centered and standardized by column-wise scaling. Interactions between categorical and continuous features are standardized by a singular value decomposition. After a group lasso model is fitted to the standardized design matrix, the coefficients are re-scaled and centered back to the original scale of the data.
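The column-wise centering and scaling, and the mapping of coefficients back to the original scale, can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: it assumes treatment-coded dummy columns and scaling to unit Euclidean norm, and the coefficient values are hypothetical placeholders rather than fitted results.

```r
# Illustrative sketch only; not the internal code of grplassocat.
f <- factor(rep(c("a", "b", "c"), times = c(8, 7, 5)))

# Dummy-code the factor (treatment contrasts) and drop the intercept column.
X <- model.matrix(~ f)[, -1, drop = FALSE]

# Center each column, then scale it to unit Euclidean norm (assumed convention).
centers <- colMeans(X)
Xc <- sweep(X, 2, centers, "-")
norms <- sqrt(colSums(Xc^2))
Xs <- sweep(Xc, 2, norms, "/")

# Hypothetical coefficients on the standardized scale (placeholders).
intercept_std <- 0.5
beta_std <- c(2, -1)

# Map back to the original scale: divide by the column norms,
# and let the intercept absorb the centering.
beta_orig <- beta_std / norms
intercept_orig <- intercept_std - sum(centers * beta_orig)

# Both parameterizations produce the same fitted values.
fit_std  <- as.numeric(intercept_std + Xs %*% beta_std)
fit_orig <- as.numeric(intercept_orig + X %*% beta_orig)
print(all.equal(fit_std, fit_orig))
```

Because the two sets of fitted values agree, coefficients reported on the original scale can be applied directly to unstandardized data.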

References

Detmer, Felicitas J., and Martin Slawski. "A Note on Coding and Standardization of Categorical Variables in (Sparse) Group Lasso Regression." arXiv preprint arXiv:1805.06915 (2018).

Examples

# NOT RUN {
data(dattest)

#--- set data type of categorical features to factor ---
dattest$X1cut=as.factor(dattest$X1cut)
dattest$X2cut=as.factor(dattest$X2cut)
dattest$X3cut=as.factor(dattest$X3cut)

table(dattest[,c("X1cut", "X2cut", "X3cut")])

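#--- fit group lasso models ---
# all main effects and pairwise interactions penalized
coefs1=fit_grp(y~X1cut * X2cut +X1cut * X3cut +X2cut * X3cut, dattest, lambda=0.5, model=LinReg())
# same model, with X1cut passed as a nonpenalized feature
coefs2=fit_grp(y~X1cut * X2cut +X1cut * X3cut +X2cut * X3cut, dattest, lambda=0.5, model=LinReg(),
               nonpen=~X1cut)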
# }
