hnp: Half-Normal Plots with Simulation Envelopes

Description

Produces a (half-)normal plot from a fitted model object for a range of different models. Extendable to non-implemented model classes.

Usage

hnp(object, sim = 99, conf = 0.95, resid.type, maxit,
           halfnormal = T, scale = F, plot.sim = T, verb.sim = F,
           warn = F, how.many.out = F, print.on = F, paint.out = F,
           col.paint.out, newclass = F, diagfun, simfun, fitfun,
           data, ...)

Arguments

object

fitted model object.

sim

number of simulations. Default is 99.

conf

confidence level of the simulated envelope. Default is 0.95.

resid.type

type of residuals to be used ("deviance", "pearson", "response", "working", "simple"). Types "student" or "standard" may be used for aov, lm,

maxit

maximum number of iterations of the estimation algorithm. Defaults are 25 for glm, glm.nb, gamlss and vglm objects, 300 for glmmad

halfnormal

logical. If TRUE, a half-normal plot is produced. If FALSE, a normal plot is produced. Default is TRUE.

scale

logical. If TRUE, when using a numeric vector as object, uses a non-standard normal to simulate from, estimating mean and variance from the numeric vector. If FALSE, uses a standard normal distribution to simulate fr

plot.sim

logical. Should the (half-)normal plot be plotted? Default is TRUE.

verb.sim

logical. If TRUE, prints each step of the simulation procedure. Default is FALSE.

warn

logical. If TRUE, shows possible warning messages hidden in the simulation process. Default is FALSE.

how.many.out

logical. If TRUE, the number of points out of the envelope is printed. Default is FALSE.

print.on

logical. If TRUE, the number of points out of the envelope is printed on the plot. Default is FALSE.

paint.out

logical. If TRUE, points out of the simulation envelope are plotted in a different color. Default is FALSE.

col.paint.out

If paint.out=TRUE, sets the color of points out of the envelope. Default is "red".

newclass

logical. Must be TRUE so that user-defined functions may be provided to do the half-normal plot with a simulation envelope for a model class or a diagnostic measure which is not currently implemented in hnp. Default is FALS

diagfun

user-defined function used to obtain the desired diagnostic measures from the fitted model object (only needed when newclass=TRUE). Default is resid.

simfun

user-defined function used to simulate a random sample from the model estimated parameters (only needed when newclass=TRUE).

fitfun

only needed when newclass=TRUE. User-defined function used to re-fit the desired model to the simulated data.

data

data set of "data.frame" class used to fit the model (only needed for when newclass=TRUE).

...

extra graphical arguments passed to plot.hnp.

Value

hnp returns an object of class "hnp", which is a list containing the following components:
xquantiles of the (half-)normal distribution
lowerlower envelope band
medianmedian envelope band
upperupper envelope band
residualsdiagnostic measures in absolute value and in order
out.indexvector indicating which points are out of the envelope
col.paint.outcolor of points which are outside of the envelope (used if paint.out=TRUE)
how.many.outlogical. Equals TRUE if how.many.out=TRUE in the hnp call
totallength of the diagnostic measure vector
outnumber of points out of the envelope
print.onlogical. Equals TRUE if print.on=TRUE in the hnp call
paint.outlogical. Equals TRUE if paint.out=TRUE in the hnp call

encoding

UTF-8

Details

A relatively easy way to assess goodness-of-fit of a fitted model is to use (half-)normal plots of a model diagnostic, e.g., different types of residuals, Cook's distance, leverage. These plots are obtained by plotting the ordered absolute values of a model diagnostic versus the expected order statistic of a half-normal distribution, $$\Phi^{-1}(\frac{i+n-1/8}{2*n+1/2})$$ (for a half-normal plot) or the normal distribution, $$\Phi^{-1}(\frac{i+3/8}{n+1/4})$$ (for a normal plot). Atkinson (1985) proposed the addition of a simulated envelope, which is such that under the correct model the plot for the observed data is likely to fall within the envelope. The objective is not to provide a region of acceptance, but some sort of guidance to what kind of shape to expect. Obtaining the simulated envelope is simple and consists of (1) fitting a model; (2) extracting model diagnostics and calculating sorted absolute values; (3) simulating 99 (or more) response variables using the same model matrix, error distribution and fitted parameters; (4) fitting the same model to each simulated response variable and obtaining the same model diagnostics, again sorted absolute values; (5) computing the desired percentiles (e.g., 2.5 and 97.5) at each value of the expected order statistic to form the envelope. This function handles different model classes and more will be implemented as time goes by. So far, the following models are included: ll{ Continuous data: Normal: functions lm, aov and glm with family=gaussian Gamma: function glm with family=Gamma Inverse gaussian: function glm with family=inverse.gaussian Proportion data: Binomial: function glm with family=binomial Quasi-binomial: function glm with family=quasibinomial Beta-binomial: package VGAM - function vglm, with family=betabinomial; package aods3 - function aodml, with family="bb"; package gamlss - function gamlss, with family=BB; package glmmADMB - function glmmadmb, with family="betabinomial" Zero-inflated binomial: package VGAM - function vglm, with family=zibinomial; package gamlss - function gamlss, with family=ZIBI; package glmmADMB - function glmmadmb, with family="binomial" and zeroInfl=TRUE Zero-inflated beta-binomial: package gamlss - function gamlss, with family=ZIBB; package glmmADMB - function glmmadmb, with family="betabinomial" and zeroInfl=TRUE Multinomial: package nnet - function multinom Count data: Poisson: function glm with family=poisson Quasi-Poisson: function glm with family=quasipoisson Negative binomial: package MASS - function glm.nb; package aods3 - function aodml, with family="nb" and phi.scale="inverse" Zero-inflated Poisson: package pscl - function zeroinfl, with dist="poisson" Zero-inflated negative binomial: package pscl - function zeroinfl, with dist="negbin" Hurdle Poisson: package pscl - function hurdle, with dist="poisson" Hurdle negative binomial: package pscl - function hurdle, with dist="negbin" Mixed models: Linear mixed models: package lme4, function lmer Generalized linear mixed models: package lme4, function glmer with family=poisson or binomial } Users can also use a numeric vector as object and hnp will generate the (half-)normal plot with a simulated envelope using the standard normal distribution (scale=F) or $N(\mu, \sigma^2)$ (scale=T). Implementing a new model class is done by providing three functions to hnp: diagfun - to obtain model diagnostics, simfun - to simulate random variables and fitfun - to refit the model to simulated variables. The way these functions must be written is shown in the Examples section.

References

Atkinson, A. C. (1985) Plots, transformations and regression, Clarendon Press, Oxford. Demétrio{Demetrio}, C. G. B. and Hinde, J. (1997) Half-normal plots and overdispersion. GLIM Newsletter 27:19-26. Hinde, J. and Demétrio{Demetrio}, C. G. B. (1998) Overdispersion: models and estimation. Computational Statistics and Data Analysis 27:151-170. Demétrio{Demetrio}, C. G. B., Hinde, J. and Moral, R. A. (2014) Models for overdispersed data in entomology. In Godoy, W. A. C. and Ferreira, C. P. (Eds.) Ecological modelling applied to entomology. Springer.

Examples

Run this code

## Simple Poisson regression
set.seed(100)
counts <- c(rpois(5, 2), rpois(5, 4), rpois(5, 6), rpois(5, 8))
treatment <- gl(4, 5)
fit <- glm(counts ~ treatment, family=poisson)
anova(fit, test="Chisq")

## half-normal plot
hnp(fit)

## or save it in an object and then use the plot method
my.hnp <- hnp(fit, print.on=TRUE, plot=FALSE)
plot(my.hnp)

## changing graphical parameters
plot(my.hnp, lty=2, pch=4, cex=1.2)
plot(my.hnp, lty=c(2,3,2), pch=4, cex=1.2, col=c(2,2,2,1))
plot(my.hnp, main="Half-normal plot", xlab="Half-normal scores",
     ylab="Deviance residuals", legpos="bottomright")

## Using a numeric vector
my.vec <- rnorm(20, 4, 4)
hnp(my.vec) # using N(0,1)
hnp(my.vec, scale=TRUE) # using N(mu, sigma^2)

## Implementing new classes
## Users provide three functions - diagfun, simfun and fitfun,
## in the following way:
##
## diagfun <- function(obj) {
##   userfunction(obj, other_argumens)
##     # e.g., resid(obj, type="pearson")
##   }
##
## simfun <- function(n, obj) {
##   userfunction(n, other_arguments) # e.g., rpois(n, fitted(obj))
##   }
##
## when response is categorical (n>=3), "userfunction"
## (e.g., rmultinom) must not depend on n and n must be
## equal to 1 (unless "userfunction" works differently
## than rmultinom in a sense that it produces less than
## an entire simulated data set when n=1)
##
## fitfun <- function(data) {
##  userfunction(y. ~ linear_predictor, other_arguments, data=data)
##    # e.g., glm(y. ~ block + factor1 * factor2, family=poisson,
##    #           data=data)
##  }
##
## the explanatory variables used in the linear predictor must
## be present in the data.frame passed to the "data" argument
## in hnp.
##
## when response is binary:
## fitfun <- function(data) {
##  userfunction(cbind(y., m-y.) ~ linear_predictor,
##               other_arguments, data=data)
##    #e.g., glm(cbind(y., m-y.) ~ treatment - 1,
##    #          family=binomial, data=data)
##  }

## Example no. 1: Using Cook's distance as a diagnostic measure
y <- rpois(30, lambda=rep(c(.5, 1.5, 5), each=10))
tr <- gl(3, 10)
fit1 <- glm(y ~ tr, family=poisson)

# diagfun
d.fun <- function(obj) cooks.distance(obj)

# simfun
s.fun <- function(n, obj) {
  lam <- fitted(obj)
  rpois(n, lambda=lam)
}

# fitfun
f.fun <- function(data) glm(y. ~ tr, family=poisson, data=data)

# hnp call
hnp(fit1, newclass=TRUE, diagfun=d.fun, simfun=s.fun,
    fitfun=f.fun, data=data.frame(y, tr))

## Example no. 2: Implementing gamma model using package gamlss
# load package
require(gamlss)

# model fitting
y <- rGA(30, mu=rep(c(.5, 1.5, 5), each=10), sigma=.5)
tr <- gl(3, 10)
fit2 <- gamlss(y ~ tr, family=GA)

# diagfun
d.fun <- function(obj) resid(obj) # this is the default if no
                                  # diagfun is provided

# simfun
s.fun <- function(n, obj) {
  mu <- obj$mu.fv
  sig <- obj$sigma.fv
  rGA(n, mu=mu, sigma=sig)
}

# fitfun
f.fun <- function(data) gamlss(y. ~ tr, family=GA, data=data)

# hnp call
hnp(fit2, newclass=TRUE, diagfun=d.fun, simfun=s.fun,
    fitfun=f.fun, data=data.frame(y, tr))

## Example no. 3: Implementing binomial model in gamlss
# model fitting
y <- rBI(30, bd=50, mu=rep(c(.2, .5, .9), each=10))
m <- 50
tr <- gl(3, 10)
fit3 <- gamlss(cbind(y, m-y) ~ tr, family=BI)

# diagfun
d.fun <- function(obj) resid(obj)

# simfun
s.fun <- function(n, obj) {
  mu <- obj$mu.fv
  bd <- obj$bd
  rBI(n, bd=bd, mu=mu)
}

# fitfun
f.fun <- function(data) gamlss(cbind(y., m-y.) ~ tr,
                               family=BI, data=data)

# hnp call
hnp(fit3, newclass=TRUE, diagfun=d.fun, simfun=s.fun,
    fitfun=f.fun, data=data.frame(y, tr, m))

Run the code above in your browser using DataLab