family
Family Objects for Models
Family objects provide a convenient way to specify the details of the models used by functions such as glm. See the documentation for glm for the details on how such model fitting takes place.
- Keywords
- models
Usage
family(object, …)
binomial(link = "logit")
gaussian(link = "identity")
Gamma(link = "inverse")
inverse.gaussian(link = "1/mu^2")
poisson(link = "log")
quasi(link = "identity", variance = "constant")
quasibinomial(link = "logit")
quasipoisson(link = "log")
Arguments
- link
a specification for the model link function. This can be a name/expression, a literal character string, a length-one character vector, or an object of class "link-glm" (such as generated by make.link) provided it is not specified via one of the standard names given next.
The gaussian family accepts the links (as names) identity, log and inverse; the binomial family the links logit, probit, cauchit (corresponding to logistic, normal and Cauchy CDFs respectively), log and cloglog (complementary log-log); the Gamma family the links inverse, identity and log; the poisson family the links log, identity and sqrt; and the inverse.gaussian family the links 1/mu^2, inverse, identity and log.
The quasi family accepts the links logit, probit, cloglog, identity, inverse, log, 1/mu^2 and sqrt, and the function power can be used to create a power link function.
- variance
for all families other than quasi, the variance function is determined by the family. The quasi family will accept the literal character string (or unquoted as a name/expression) specifications "constant", "mu(1-mu)", "mu", "mu^2" and "mu^3", a length-one character vector taking one of those values, or a list containing components varfun, validmu, dev.resids, initialize and name.
- object
the function family accesses the family objects which are stored within objects created by modelling functions (e.g., glm).
- …
further arguments passed to methods.
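As a minimal sketch of the link and variance arguments described above, the following constructs a few families with non-default links. The object names (fam_probit and so on) are illustrative only; the calls themselves use only functions from the stats package.

```r
# Non-default links given as quoted strings (the recommended form).
fam_probit <- binomial(link = "probit")
fam_sqrt   <- poisson(link = "sqrt")

# A power link built with power(); lambda = 0.5 is a square-root link.
fam_power <- quasi(link = power(0.5), variance = "mu")

# Only quasi() takes a variance argument; here V(mu) = mu^2.
fam_mu2 <- quasi(link = "log", variance = "mu^2")

fam_probit$link        # name of the link stored in the family object
fam_power$linkfun(4)   # the power link applied to mu = 4
fam_mu2$variance(3)    # the variance function evaluated at mu = 3
```

Printing any of these objects shows the family and link names through the concise print method.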
Details
family is a generic function with methods for classes "glm" and "lm" (the latter returning gaussian()).
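The two methods can be illustrated with a short sketch. The built-in cars data set and the choice of a Gamma model with log link are assumptions made only for the illustration.

```r
# lm fit: the "lm" method of family() returns gaussian().
fit_lm <- lm(dist ~ speed, data = cars)
family(fit_lm)

# glm fit: family() returns the family object stored in the fit.
fit_glm <- glm(dist ~ speed, data = cars, family = Gamma(link = "log"))
fam <- family(fit_glm)
fam$family   # family name recorded in the fit
fam$link     # link name recorded in the fit
```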
For the binomial and quasibinomial families the response can be specified in one of three ways:
- As a factor: ‘success’ is interpreted as the factor not having the first level (and hence usually of having the second level).
- As a numerical vector with values between 0 and 1, interpreted as the proportion of successful cases (with the total number of cases given by the weights).
- As a two-column integer matrix: the first column gives the number of successes and the second the number of failures.
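The three response forms listed above can be compared on one small data set. This is a sketch with made-up counts (dose, n and succ are hypothetical values); all three fits should give the same coefficients up to convergence tolerance.

```r
# Hypothetical grouped data: successes out of n trials at four doses.
dose <- c(1, 2, 3, 4)
n    <- c(10, 10, 10, 10)
succ <- c(2, 4, 7, 9)

# 1. Two-column matrix: successes, then failures.
fit_mat <- glm(cbind(succ, n - succ) ~ dose, family = binomial())

# 2. Proportion of successes, with the number of cases as weights.
fit_prop <- glm(succ / n ~ dose, family = binomial(), weights = n)

# 3. Factor response, one row per case; the first level ("no") is failure.
long <- data.frame(
  dose = rep(rep(dose, 2), times = c(succ, n - succ)),
  y    = factor(rep(c("yes", "no"), times = c(sum(succ), sum(n - succ))),
                levels = c("no", "yes"))
)
fit_fac <- glm(y ~ dose, family = binomial(), data = long)

# Compare the coefficient estimates of the three fits.
rbind(matrix = coef(fit_mat), proportion = coef(fit_prop), factor = coef(fit_fac))
```

The deviances differ between the grouped and the one-row-per-case fits, because the saturated models differ, even though the coefficients agree.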
The quasibinomial and quasipoisson families differ from the binomial and poisson families only in that the dispersion parameter is not fixed at one, so they can model over-dispersion. For the binomial case see McCullagh and Nelder (1989, pp. 124-8). Although they show that there is (under some restrictions) a model with variance proportional to mean as in the quasi-binomial model, note that glm does not compute maximum-likelihood estimates in that model. The behaviour of S is closer to the quasi- variants.
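The following sketch illustrates the difference in dispersion, reusing the d.AD data from example(glm). The names fit_p and fit_q are illustrative. The coefficient estimates coincide; only the dispersion, and hence the standard errors, differ.

```r
d.AD <- data.frame(treatment = gl(3, 3),
                   outcome   = gl(3, 1, 9),
                   counts    = c(18, 17, 15, 20, 10, 20, 25, 13, 12))

fit_p <- glm(counts ~ outcome + treatment, data = d.AD, family = poisson())
fit_q <- glm(counts ~ outcome + treatment, data = d.AD, family = quasipoisson())

# Same point estimates under both families.
all.equal(coef(fit_p), coef(fit_q))

# Dispersion is fixed at one for poisson and estimated for quasipoisson.
summary(fit_p)$dispersion
summary(fit_q)$dispersion

# The quasipoisson estimate is the Pearson statistic over residual df.
sum(residuals(fit_q, type = "pearson")^2) / df.residual(fit_q)
```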
Value
An object of class "family" (which has a concise print method). This is a list with elements
- family
character: the family name.
- link
character: the link name.
- linkfun
function: the link.
- linkinv
function: the inverse of the link function.
- variance
function: the variance as a function of the mean.
- dev.resids
function giving the deviance for each observation as a function of (y, mu, wt), used by the residuals method when computing deviance residuals.
- aic
function giving the AIC value if appropriate (but NA for the quasi- families). More precisely, this function returns \(-2\ell + 2 s\), where \(\ell\) is the log-likelihood and \(s\) is the number of estimated scale parameters. Note that the penalty term for the location parameters (typically the “regression coefficients”) is added elsewhere, e.g., in glm.fit(), or AIC(), see the AIC example in glm. See logLik for the assumptions made about the dispersion parameter.
- mu.eta
function: derivative of the inverse-link function with respect to the linear predictor. If the inverse-link function is \(\mu = g^{-1}(\eta)\) where \(\eta\) is the value of the linear predictor, then this function returns \(d(g^{-1})/d\eta = d\mu/d\eta\).
- initialize
expression. This needs to set up whatever data objects are needed for the family as well as n (needed for AIC in the binomial family) and mustart (see glm).
- validmu
logical function. Returns TRUE if a mean vector mu is within the domain of variance.
- valideta
logical function. Returns TRUE if a linear predictor eta is within the domain of linkinv.
- simulate
(optional) function simulate(object, nsim) to be called by the "lm" method of simulate. It will normally return a matrix with nsim columns and one row for each fitted value, but it can also return a list of length nsim. Clearly this will be missing for ‘quasi-’ families.
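A short sketch of how these elements behave, using the poisson family. The vectors mu, y and wt are made-up values, and the argument names passed to aic (y, n, mu, wt, dev) are taken from the R sources rather than from this page, so treat that call as an assumption.

```r
fam <- poisson()
fam$family                      # family name
fam$link                        # link name

mu  <- c(0.5, 2, 10)
eta <- fam$linkfun(mu)          # log link: eta = log(mu)
all.equal(fam$linkinv(eta), mu) # linkinv undoes linkfun
fam$variance(mu)                # Poisson variance function: V(mu) = mu
fam$mu.eta(eta)                 # d mu / d eta, which is exp(eta) for the log link
fam$validmu(mu)                 # is mu in the domain of variance?
fam$valideta(eta)               # is eta in the domain of linkinv?

y  <- c(1, 3, 8)
wt <- rep(1, 3)
fam$dev.resids(y, mu, wt)       # per-observation deviance contributions

# With no estimated scale parameter (s = 0), aic is -2 * log-likelihood.
fam$aic(y = y, n = wt, mu = mu, wt = wt, dev = 0)
-2 * sum(dpois(y, mu, log = TRUE))
```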
Note
The link and variance arguments have rather awkward semantics for back-compatibility. The recommended way is to supply them as quoted character strings, but they can also be supplied unquoted (as names or expressions). Additionally, they can be supplied as a length-one character vector giving the name of one of the options, or as a list (for link, of class "link-glm"). The restrictions apply only to links given as names: when given as a character string all the links known to make.link are accepted.
This is potentially ambiguous: supplying link = logit could mean the unquoted name of a link or the value of object logit. It is interpreted if possible as the name of an allowed link, then as an object. (You can force the interpretation to always be the value of an object via logit[1].)
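The resolution order described in this note can be seen in a small sketch. The variables lnk and logit are hypothetical and exist only to show the ambiguity.

```r
# A quoted string and an unquoted name select the same standard link.
identical(binomial(link = "logit")$link, binomial(link = logit)$link)

# A character variable whose name is not a standard link is evaluated.
lnk <- "probit"
binomial(link = lnk)$link        # "probit"

# A variable named like a standard link is read as the link name ...
logit <- "cloglog"
binomial(link = logit)$link      # "logit"

# ... unless indexing forces it to be treated as an object.
binomial(link = logit[1])$link   # "cloglog"
```

Supplying the link as a quoted string avoids the ambiguity altogether.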
References
McCullagh, P. and Nelder, J. A. (1989) Generalized Linear Models. London: Chapman and Hall.
Dobson, A. J. (1983) An Introduction to Statistical Modelling. London: Chapman and Hall.
Cox, D. R. and Snell, E. J. (1981) Applied Statistics; Principles and Examples. London: Chapman and Hall.
Hastie, T. J. and Pregibon, D. (1992) Generalized linear models. Chapter 6 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.
See Also
For binomial coefficients, choose; the binomial and negative binomial distributions, Binomial, and NegBinomial.
Examples
library(stats)
# NOT RUN {
require(utils) # for str
nf <- gaussian() # Normal family
nf
str(nf)
gf <- Gamma()
gf
str(gf)
gf$linkinv
gf$variance(-3:4) #- == (.)^2
## Binomial with default 'logit' link: Check some properties visually:
bi <- binomial()
et <- seq(-10,10, by=1/8)
plot(et, bi$mu.eta(et), type="l")
## show that mu.eta() is derivative of linkinv() :
lines((et[-1]+et[-length(et)])/2, col=adjustcolor("red", 1/4),
      diff(bi$linkinv(et))/diff(et), type="l", lwd=4)
## which here is the logistic density:
lines(et, dlogis(et), lwd=3, col=adjustcolor("blue", 1/4))
stopifnot(exprs = {
    all.equal(bi$ mu.eta(et), dlogis(et))
    all.equal(bi$linkinv(et), plogis(et) -> m)
    all.equal(bi$linkfun(m ), qlogis(m)) # logit(.) == qlogis(.) !
})
## Data from example(glm) :
d.AD <- data.frame(treatment = gl(3,3),
                   outcome = gl(3,1,9),
                   counts = c(18,17,15, 20,10,20, 25,13,12))
glm.D93 <- glm(counts ~ outcome + treatment, d.AD, family = poisson())
## Quasipoisson: compare with above / example(glm) :
glm.qD93 <- glm(counts ~ outcome + treatment, d.AD, family = quasipoisson())
glm.qD93
anova (glm.qD93, test = "F")
summary(glm.qD93)
## for Poisson results (same as from 'glm.D93' !) use
anova (glm.qD93, dispersion = 1, test = "Chisq")
summary(glm.qD93, dispersion = 1)
## Example of user-specified link, a logit model for p^days
## See Shaffer, T. 2004. Auk 121(2): 526-540.
logexp <- function(days = 1)
{
    linkfun <- function(mu) qlogis(mu^(1/days))
    linkinv <- function(eta) plogis(eta)^days
    mu.eta <- function(eta) days * plogis(eta)^(days-1) *
        binomial()$mu.eta(eta)
    valideta <- function(eta) TRUE
    link <- paste0("logexp(", days, ")")
    structure(list(linkfun = linkfun, linkinv = linkinv,
                   mu.eta = mu.eta, valideta = valideta, name = link),
              class = "link-glm")
}
(bil3 <- binomial(logexp(3)))
## in practice this would be used with a vector of 'days', in
## which case use an offset of 0 in the corresponding formula
## to get the null deviance right.
## Binomial with identity link: often not a good idea, as both
## computationally and conceptually difficult:
binomial(link = "identity") ## is exactly the same as
binomial(link = make.link("identity"))
## tests of quasi
x <- rnorm(100)
y <- rpois(100, exp(1+x))
glm(y ~ x, family = quasi(variance = "mu", link = "log"))
# which is the same as
glm(y ~ x, family = poisson)
glm(y ~ x, family = quasi(variance = "mu^2", link = "log"))
# }
# NOT RUN {
glm(y ~ x, family = quasi(variance = "mu^3", link = "log")) # fails
# }
# NOT RUN {
y <- rbinom(100, 1, plogis(x))
# need to set a starting value for the next fit
glm(y ~ x, family = quasi(variance = "mu(1-mu)", link = "logit"), start = c(0,1))
# }