mpath (version 0.1-20)

zipath: Fit zero-inflated count data linear model with lasso (or elastic net), snet or mnet regularization

Description

Fit zero-inflated regression models for count data via penalized maximum likelihood.
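As a rough illustration of the kind of model being fitted, the sketch below writes out the zero-inflated Poisson probability mass function and the unpenalized log-likelihood for fixed parameters. The names `dzip`, `zip_loglik` and `pi0` are hypothetical and are not part of mpath; in a regression setting the mean `mu` and the zero-inflation probability `pi0` would depend on covariates, and a penalty term would be subtracted from the log-likelihood.

```r
# Zero-inflated Poisson probability mass function (illustrative helper, not from mpath).
# pi0: probability of a structural zero; mu: mean of the Poisson component.
dzip <- function(y, mu, pi0) {
  ifelse(y == 0,
         pi0 + (1 - pi0) * dpois(0, mu),  # a zero can come from either component
         (1 - pi0) * dpois(y, mu))        # positive counts come from the Poisson part
}

# Unpenalized log-likelihood of a sample for fixed mu and pi0.
zip_loglik <- function(y, mu, pi0) sum(log(dzip(y, mu, pi0)))

# Simulate a small zero-inflated sample and evaluate the log-likelihood.
set.seed(1)
n <- 200
is_structural_zero <- rbinom(n, 1, 0.3) == 1
y <- ifelse(is_structural_zero, 0, rpois(n, 2))
zip_loglik(y, mu = 2, pi0 = 0.3)

# The probabilities sum to one (up to floating-point rounding).
sum(dzip(0:100, mu = 2, pi0 = 0.3))
```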

Usage

zipath(formula, data, weights, subset, na.action, offset,
       standardize = TRUE, family = c("poisson", "negbin", "geometric"),
       link = c("logit", "probit", "cloglog", "cauchit", "log"),
       penalty = c("enet", "mnet", "snet"), start = NULL, model = TRUE,
       y = TRUE, x = FALSE, nlambda = 100, lambda.count = NULL, lambda.zero = NULL,
       penalty.factor.count = NULL, penalty.factor.zero = NULL,
       lambda.count.min.ratio = .0001, lambda.zero.min.ratio = .1,
       alpha.count = 1, alpha.zero = alpha.count, gamma.count = 3,
       gamma.zero = gamma.count, rescale = FALSE, init.theta, theta.fixed = FALSE,
       EM = TRUE, maxit.em = 200, convtype = c("count", "both"), maxit = 1000,
       maxit.theta = 1, reltol = 1e-5, eps.bino = 1e-5, shortlist = FALSE, trace = FALSE, ...)

Arguments

formula
symbolic description of the model, see details.
weights
optional numeric vector of weights.
data, subset, na.action
arguments controlling formula processing via model.frame.
offset
optional numeric vector with an a priori known component to be included in the linear predictor of the count model. See below for more information on offsets.
standardize
Logical flag for x variable standardization, prior to fitting the model sequence. The coefficients are always returned on the original scale. Default is standardize=TRUE.
family
character specification of count model family (a log link is always used).
link
character specification of link function in the binary zero-inflation model (a binomial family is always used).
model, y, x
logicals. If TRUE the corresponding components of the fit (model frame, response, model matrix) are returned.
penalty
penalty considered as one of enet, mnet, snet.
start
starting values for the parameters in the linear predictor.
nlambda
number of lambda values; the default is 100. The sequence may be truncated before nlambda is reached if a close to saturated model for the zero component is fitted.
lambda.count
A user-supplied lambda.count sequence. Typical usage is to have the program compute its own lambda.count and lambda.zero sequences based on nlambda, lambda.count.min.ratio and lambda.zero.min.ratio.
lambda.zero
A user-supplied lambda.zero sequence.
penalty.factor.count, penalty.factor.zero
These are numeric vectors with the same length as the predictor variables that multiply lambda.count and lambda.zero, respectively, to allow differential shrinkage of coefficients. Can be 0 for some variables, which implies no shrinkage, and th
lambda.count.min.ratio, lambda.zero.min.ratio
Smallest value for lambda.count and lambda.zero, respectively, as a fraction of lambda.max, the (data derived) entry value (i.e. the smallest value for which all coefficients are zero except the interce
alpha.count
The elastic net mixing parameter for the count part of the model.
alpha.zero
The elastic net mixing parameter for the zero part of the model.
gamma.count
The tuning parameter of the snet or mnet penalty for the count part of the model.
gamma.zero
The tuning parameter of the snet or mnet penalty for the zero part of the model.
rescale
logical value; if TRUE, adaptive rescaling is applied.
init.theta
The initial value of theta for family="negbin".
theta.fixed
Logical value only used for family="negbin". If TRUE, theta is not updated.
EM
Logical value; if TRUE, the EM algorithm is used. Other algorithms are not implemented.
convtype
Convergence type; the default checks convergence of the count component only, for faster computation.
maxit.em
Maximum number of EM algorithm iterations.
maxit
Maximum number of coordinate descent algorithm iterations.
maxit.theta
Maximum number of iterations for estimating the theta scaling parameter if family="negbin". The default value of maxit.theta may be increased, at the cost of slower computation.
eps.bino
a lower bound of probabilities to be claimed as zero, for computing weights and related values when family="binomial".
reltol
Convergence criterion; the default value 1e-5 may be reduced for a more accurate but slower fit.
shortlist
logical value; if TRUE, only a limited set of results is returned.
trace
If TRUE, progress of algorithm is reported
...
Other arguments which can be passed to glmreg.
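
The relationship between nlambda, a minimum ratio and the resulting lambda sequence can be sketched as follows. This is only an assumption about the general shape of such a sequence (log-spaced and decreasing, as is common for regularization paths); it is not taken from the mpath source. The helper `make_lambda_seq` and the value `lambda_max = 0.5` are hypothetical, since in practice lambda.max is derived from the data.

```r
# Hypothetical helper: a decreasing, log-spaced lambda sequence running from
# lambda_max down to lambda_max * min_ratio. Assumed spacing, not mpath's code.
make_lambda_seq <- function(lambda_max, min_ratio, nlambda = 100) {
  exp(seq(log(lambda_max), log(lambda_max * min_ratio), length.out = nlambda))
}

# Use the default ratios from the Usage section: .0001 (count) and .1 (zero).
lambda_count <- make_lambda_seq(lambda_max = 0.5, min_ratio = 1e-4, nlambda = 10)
lambda_zero  <- make_lambda_seq(lambda_max = 0.5, min_ratio = 0.1,  nlambda = 10)

range(lambda_count)
range(lambda_zero)
```

Sequences built this way could be supplied through the lambda.count and lambda.zero arguments instead of letting the program compute its own.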

Value

An object of class "zipath", i.e., a list with components including

  • coefficients: a list with elements "count" and "zero" containing the coefficients from the respective models,
  • residuals: a vector of raw residuals (observed - fitted),
  • fitted.values: a vector of fitted means,
  • weights: the case weights used,
  • terms: a list with elements "count", "zero" and "full" containing the terms objects for the respective models,
  • theta: estimate of the additional $\theta$ parameter of the negative binomial model (if a negative binomial regression is used),
  • loglik: log-likelihood of the fitted model,
  • family: character string describing the count distribution used,
  • link: character string describing the link of the zero-inflation model,
  • linkinv: the inverse link function corresponding to link,
  • converged: logical value, TRUE indicating successful convergence of zipath, FALSE indicating otherwise,
  • call: the original function call,
  • formula: the original formula,
  • levels: levels of the categorical regressors,
  • contrasts: a list with elements "count" and "zero" containing the contrasts corresponding to levels from the respective models,
  • model: the full model frame (if model = TRUE),
  • y: the response count vector (if y = TRUE),
  • x: a list with elements "count" and "zero" containing the model matrices from the respective models (if x = TRUE).

Details

The algorithm fits penalized zero-inflated count data regression models using the coordinate descent algorithm within the EM algorithm. The returned fitted model object is of class "zipath" and is similar to fitted "glm" and "zeroinfl" objects. For elements such as "coefficients", a list is returned with elements for the zero and count components, respectively. For details see below. A set of standard extractor functions for fitted model objects is available for objects of class "zipath", including methods for the generic functions print, coef, logLik, residuals and predict. See predict.zipath for more details on all methods.
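
To make the role of the EM algorithm concrete, here is a minimal sketch for the simplest possible case: a zero-inflated Poisson sample with no covariates and no penalty. It is not the zipath implementation. The helper `zip_em` is hypothetical, and its closed-form M-step stands in for the penalized coordinate descent updates described above.

```r
# Hypothetical helper: unpenalized, intercept-only EM for zero-inflated Poisson data.
zip_em <- function(y, maxit = 500, reltol = 1e-8) {
  pi0 <- 0.5 * mean(y == 0)  # crude starting value for the structural-zero probability
  mu  <- mean(y[y > 0])      # crude starting value for the Poisson mean
  converged <- FALSE
  for (iter in seq_len(maxit)) {
    # E-step: posterior probability that each observation is a structural zero
    z <- ifelse(y == 0, pi0 / (pi0 + (1 - pi0) * exp(-mu)), 0)
    # M-step: closed-form updates (a penalized fit would run coordinate descent here)
    pi0_new <- mean(z)
    mu_new  <- sum((1 - z) * y) / sum(1 - z)
    change  <- max(abs(pi0_new - pi0), abs(mu_new - mu))
    pi0 <- pi0_new
    mu  <- mu_new
    if (change < reltol) {
      converged <- TRUE
      break
    }
  }
  list(pi0 = pi0, mu = mu, iterations = iter, converged = converged)
}

# Simulated data: 30% structural zeros, Poisson mean 2 otherwise.
set.seed(123)
n <- 5000
y <- ifelse(rbinom(n, 1, 0.3) == 1, 0, rpois(n, 2))
fit <- zip_em(y)
fit$pi0
fit$mu
```

The E-step weights play the same part in a regression setting: observations likely to be structural zeros contribute less to the count model update and more to the zero model update.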

References

Zhu Wang, Shuangge Ma, Michael Zappitelli, Chirag Parikh, Ching-Yun Wang and Prasad Devarajan (2014) Penalized Count Data Regression with Application to Hospital Stay after Pediatric Cardiac Surgery, Statistical Methods in Medical Research. 2014 Apr 17. [Epub ahead of print]

Zhu Wang, Shuangge Ma, Ching-Yun Wang, Michael Zappitelli, Prasad Devarajan and Chirag R. Parikh (2014) EM for Regularized Zero Inflated Regression Models with Applications to Postoperative Morbidity after Cardiac Surgery in Children, Statistics in Medicine. 33(29):5192-208.

Zhu Wang, Shuangge Ma and Ching-Yun Wang (2015) Variable selection for zero-inflated and overdispersed data with application to health care demand in Germany, Biometrical Journal. 57(5):867-84.

See Also

glm, glmreg, glmregNB

Examples

## packages and data
library("mpath")
data("bioChemists", package = "pscl")

## without inflation
## ("art ~ ." is "art ~ fem + mar + kid5 + phd + ment")
fm_pois <- glmreg(art ~ ., data = bioChemists, family = "poisson")
coef(fm_pois)
fm_nb <- glmregNB(art ~ ., data = bioChemists)
coef(fm_nb)
## with simple inflation (no regressors for zero component)
fm_zip <- zipath(art ~ . | 1, data = bioChemists, nlambda=10)
summary(fm_zip)
fm_zinb <- zipath(art ~ . | 1, data = bioChemists, family = "negbin", nlambda=10)
summary(fm_zinb)
## inflation with regressors
## ("art ~ . | ." is "art ~ fem + mar + kid5 + phd + ment | fem + mar + kid5 + phd + ment")
fm_zip2 <- zipath(art ~ . | ., data = bioChemists, nlambda=10)
summary(fm_zip2)
fm_zinb2 <- zipath(art ~ . | ., data = bioChemists, family = "negbin", nlambda=10)
summary(fm_zinb2)
### non-penalized regression, compare with zeroinfl from the pscl package
fm_zinb3 <- zipath(art ~ . | ., data = bioChemists, family = "negbin", 
                   lambda.count=0, lambda.zero=0, reltol=1e-12)
summary(fm_zinb3)
fm_zinb4 <- pscl::zeroinfl(art ~ . | ., data = bioChemists, dist = "negbin")
summary(fm_zinb4)