Fits a generalized linear model with a link-linear model for the dispersion as well as for the mean.
dglm(formula=formula(data), dformula = ~ 1, family = gaussian, dlink = "log",
data = parent.frame(), subset = NULL, weights = NULL, contrasts = NULL,
method = "ml", mustart = NULL, betastart = NULL, etastart = NULL, phistart = NULL,
control = dglm.control(...), ykeep = TRUE, xkeep = FALSE, zkeep = FALSE, ...)dglm.constant(y, family, weights = 1)
an object of class dglm
is returned,
which inherits from glm
and lm
.
See dglm-class
for details.
a symbolic description of the model to be fit.
The details of model specification are found in dglm
.
a formula expression of the form
~ predictor
, the response being ignored.
This specifies the linear predictor for modelling the dispersion.
A term of the form offset(expression)
is allowed.
a description of the error distribution and link function to
be used in the model.
See glm
for more information.
link function for modelling the dispersion.
Any link function accepted by the quasi
family is allowed, including power(x)
. See details below.
an optional data frame containing the variables in the model.
See glm
for more information.
an optional vector specifying a subset of observations to be used in the fitting process.
an optional vector of weights to be used in the fitting process.
an optional list. See the contrasts.arg
of
model.matrix.default
.
the method used to estimate the dispersion parameters;
the default is "reml"
for restricted maximum likelihood
and the alternative is "ml"
for maximum likelihood.
Upper case and partial matches are allowed.
numeric vector giving starting values for the fitted values
or expected responses.
Must be of the same length as the response,
or of length 1 if a constant starting vector is desired.
Ignored if betastart
is supplied.
numeric vector giving starting values for the regression coefficients in the link-linear model for the mean.
numeric vector giving starting values for the linear predictor for the mean model.
numeric vector giving starting values for the dispersion parameters.
a list of iteration and algorithmic constants.
See dglm.control
for their names and default values.
These can also be set as arguments to dglm
itself.
logical flag: if TRUE
, the vector of responses is returned.
logical flag: if TRUE
, the model.matrix
for the mean
model is returned.
logical flag: if TRUE
, the model.matrix
for the
dispersion model is returned.
further arguments passed to or from other methods.
numeric response vector
Gordon Smyth, ported to R by Peter Dunn
Write \(\mu_i = \mbox{E}[y_i]\) for the expectation of the
\(i\)th response.
Then \(\mbox{Var}[Y_i] = \phi_i V(\mu_i)\) where \(V\)
is the variance function and \(\phi_i\) is the dispersion of the
\(i\)th response
(often denoted as the Greek character `phi').
We assume the link linear models
\(g(\mu_i) = \mathbf{x}_i^T \mathbf{b}\) and
\(h(\phi_i) = \mathbf{z}_i^T \mathbf{z}\),
where \(\mathbf{x}_i\) and \(\mathbf{z}_i\) are vectors of covariates,
and \(\mathbf{b}\) and \(\mathbf{a}\) are vectors of regression
cofficients affecting the mean and dispersion respectively.
The argument dlink
specifies \(h\).
See family
for how to specify \(g\).
The optional arguments mustart
, betastart
and phistart
specify starting values for \(\mu_i\), \(\mathbf{b}\)
and \(\phi_i\) respectively.
The parameters \(\mathbf{b}\) are estimated as for an ordinary glm. The parameters \(\mathbf{a}\) are estimated by way of a dual glm in which the deviance components of the ordinary glm appear as responses. The estimation procedure alternates between one iteration for the mean submodel and one iteration for the dispersion submodel until overall convergence.
The output from dglm
, out
say, consists of two glm
objects
(that for the dispersion submodel is out$dispersion.fit
) with a few more
components for the outer iteration and overall likelihood.
The summary
and anova
functions have special methods for dglm
objects.
Any generic function that has methods for glm
s or lm
s will work on
out
, giving information about the mean submodel.
Information about the dispersion submodel can be obtained by using
out$dispersion.fit
as argument rather than out itself.
In particular drop1(out,scale=1)
gives correct score statistics for
removing terms from the mean submodel,
while drop1(out$dispersion.fit,scale=2)
gives correct score
statistics for removing terms from the dispersion submodel.
The dispersion submodel is treated as a gamma family unless the original
reponses are gamma, in which case the dispersion submodel is digamma.
This is exact if the original glm family is gaussian
,
Gamma
or inverse.gaussian
. In other cases it can be
justified by the saddle-point approximation to the density of the responses.
The results will therefore be close to exact ML or REML when the dispersions
are small compared to the means. In all cases the dispersion submodel has prior
weights 1, and has its own dispersion parameter which is 2.
Smyth, G. K. (1989). Generalized linear models with varying dispersion. J. R. Statist. Soc. B, 51, 47--60. tools:::Rd_expr_doi("10.1111/j.2517-6161.1989.tb01747.x")
Smyth, G. K., and Verbyla, A. P. (1999). Adjusted likelihood methods for modelling dispersion in generalized linear models. Environmetrics, 10, 696-709. tools:::Rd_expr_doi("10.1002/(SICI)1099-095X(199911/12)10:6<695::aid-env385>3.0.CO;2-M") https://gksmyth.github.io/pubs/Ties98-Preprint.pdf695::aid-env385>
Smyth, G. K., and Verbyla, A. P. (1999). Double generalized linear models: approximate REML and diagnostics. In Statistical Modelling: Proceedings of the 14th International Workshop on Statistical Modelling, Graz, Austria, July 19-23, 1999, H. Friedl, A. Berghold, G. Kauermann (eds.), Technical University, Graz, Austria, pages 66-80. https://gksmyth.github.io/pubs/iwsm99-Preprint.pdf
dglm-class
, dglm.control
,
Digamma family
, Polygamma
.
See https://gksmyth.github.io/s/dglm.html for the original S-Plus code.
# Continuing the example from glm, but this time try
# fitting a Gamma double generalized linear model also.
clotting <- data.frame(
u = c(5,10,15,20,30,40,60,80,100),
lot1 = c(118,58,42,35,27,25,21,19,18),
lot2 = c(69,35,26,21,18,16,13,12,12))
# The same example as in glm: the dispersion is modelled as constant
# However, dglm uses ml not reml, so the results are slightly different:
out <- dglm(lot1 ~ log(u), ~1, data=clotting, family=Gamma)
summary(out)
# Try a double glm
out2 <- dglm(lot1 ~ log(u), ~u, data=clotting, family=Gamma)
summary(out2)
anova(out2)
# Summarize the mean model as for a glm
summary.glm(out2)
# Summarize the dispersion model as for a glm
summary(out2$dispersion.fit)
# Examine goodness of fit of dispersion model by plotting residuals
plot(fitted(out2$dispersion.fit),residuals(out2$dispersion.fit))
Run the code above in your browser using DataLab