pb-modcomp: Model comparison using parametric bootstrap methods.

Description

Model comparison of nested models using parametric bootstrap methods. Implemented for some commonly applied model types.

Usage

seqPBmodcomp(largeModel, smallModel, h = 20, nsim = 1000, cl = 1)
PBmodcomp(
  largeModel,
  smallModel,
  nsim = 1000,
  ref = NULL,
  seed = NULL,
  cl = NULL,
  details = 0
)
# S3 method for merMod
PBmodcomp(
  largeModel,
  smallModel,
  nsim = 1000,
  ref = NULL,
  seed = NULL,
  cl = NULL,
  details = 0
)
# S3 method for lm
PBmodcomp(
  largeModel,
  smallModel,
  nsim = 1000,
  ref = NULL,
  seed = NULL,
  cl = NULL,
  details = 0
)

Arguments

largeModel

A model object. Can be a linear mixed effects model or generalized linear mixed effects model (as fitted with lmer() and glmer() function in the lme4 package) or a linear normal model or a generalized linear model. The largeModel must be larger than smallModel (see below).

smallModel

A model of the same type as largeModel or a restriction matrix.

For sequential computing for bootstrap p-values: The number of extreme cases needed to generate before the sampling proces stops.

nsim

The number of simulations to form the reference distribution.

A vector identifying a cluster; used for calculating the reference distribution using several cores. See examples below.

ref

Vector containing samples from the reference distribution. If NULL, this vector will be generated using PBrefdist().

seed

A seed that will be passed to the simulation of new datasets.

details

The amount of output produced. Mainly relevant for debugging purposes.

Details

The model object must be fitted with maximum likelihood (i.e. with REML=FALSE). If the object is fitted with restricted maximum likelihood (i.e. with REML=TRUE) then the model is refitted with REML=FALSE before the p-values are calculated. Put differently, the user needs not worry about this issue.

Under the fitted hypothesis (i.e. under the fitted small model) nsim samples of the likelihood ratio test statistic (LRT) are generetated.

Then p-values are calculated as follows:

LRT: Assuming that LRT has a chi-square distribution.

PBtest: The fraction of simulated LRT-values that are larger or equal to the observed LRT value.

Bartlett: A Bartlett correction is of LRT is calculated from the mean of the simulated LRT-values

Gamma: The reference distribution of LRT is assumed to be a gamma distribution with mean and variance determined as the sample mean and sample variance of the simulated LRT-values.

F: The LRT divided by the number of degrees of freedom is assumed to be F-distributed, where the denominator degrees of freedom are determined by matching the first moment of the reference distribution.

References

Ulrich Halekoh, S<U+00F8>ren H<U+00F8>jsgaard (2014)., A Kenward-Roger Approximation and Parametric Bootstrap Methods for Tests in Linear Mixed Models - The R Package pbkrtest., Journal of Statistical Software, 58(10), 1-30., http://www.jstatsoft.org/v59/i09/

Examples

Run this code

# NOT RUN {
data(beets, package="pbkrtest")
head(beets)

## Linear mixed effects model:
sug   <- lmer(sugpct ~ block + sow + harvest + (1|block:harvest),
              data=beets, REML=FALSE)
sug.h <- update(sug, .~. -harvest)
sug.s <- update(sug, .~. -sow)

anova(sug, sug.h)
PBmodcomp(sug, sug.h, nsim=50, cl=1)
anova(sug, sug.h)
PBmodcomp(sug, sug.s, nsim=50, cl=1)

## Linear normal model:
sug <- lm(sugpct ~ block + sow + harvest, data=beets)
sug.h <- update(sug, .~. -harvest)
sug.s <- update(sug, .~. -sow)

anova(sug, sug.h)
PBmodcomp(sug, sug.h, nsim=50, cl=1)
anova(sug, sug.s)
PBmodcomp(sug, sug.s, nsim=50, cl=1)

## Generalized linear model
counts    <- c(18,17,15,20,10,20,25,13,12)
outcome   <- gl(3,1,9)
treatment <- gl(3,3)
d.AD      <- data.frame(treatment, outcome, counts)
head(d.AD)
glm.D93   <- glm(counts ~ outcome + treatment, family = poisson())
glm.D93.o <- update(glm.D93, .~. -outcome)
glm.D93.t <- update(glm.D93, .~. -treatment)

anova(glm.D93, glm.D93.o, test="Chisq")
PBmodcomp(glm.D93, glm.D93.o, nsim=50, cl=1)
anova(glm.D93, glm.D93.t, test="Chisq")
PBmodcomp(glm.D93, glm.D93.t, nsim=50, cl=1)

## Generalized linear mixed model (it takes a while to fit these)
# }
# NOT RUN {
(gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
              data = cbpp, family = binomial))
(gm2 <- update(gm1, .~.-period))
anova(gm1, gm2)
PBmodcomp(gm1, gm2, cl=2)
# }
# NOT RUN {

# }
# NOT RUN {
(fmLarge <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy))
## removing Days
(fmSmall <- lmer(Reaction ~ 1 + (Days|Subject), sleepstudy))
anova(fmLarge, fmSmall)
PBmodcomp(fmLarge, fmSmall, cl=1)

## The same test using a restriction matrix
L <- cbind(0,1)
PBmodcomp(fmLarge, L, cl=1)

## Vanilla
PBmodcomp(beet0, beet_no.harv, nsim=1000, cl=1)

## Simulate reference distribution separately:
refdist <- PBrefdist(beet0, beet_no.harv, nsim=1000)
PBmodcomp(beet0, beet_no.harv, ref=refdist, cl=1)

## Do computations with multiple processors:
## Number of cores:
(nc <- detectCores())
## Create clusters
cl <- makeCluster(rep("localhost", nc))

## Then do:
PBmodcomp(beet0, beet_no.harv, cl=cl)

## Or in two steps:
refdist <- PBrefdist(beet0, beet_no.harv, nsim=1000, cl=cl)
PBmodcomp(beet0, beet_no.harv, ref=refdist)

## It is recommended to stop the clusters before quitting R:
stopCluster(cl)
# }
# NOT RUN {

# }