mixed: Obtain p-values for a mixed-model from lmer().

Description

Fits and calculates p-values for all effects in a mixed model fitted with lmer. The default behavior calculates type 3 like p-values using the Kenward-Roger approximation for degrees-of-freedom implemented in KRmodcomp (for LMMs only), but also allows for parametric bootstrap (method = "PB"), or likelihood ratio tests (the latter two for LMMs and GLMMs). print, summary, and anova methods for the returned object of class "mixed" are available (the last two return the same data.frame).

Usage

mixed(formula, data, type = 3, method = c("KR", "PB", "LRT"),
  per.parameter = NULL, args.test = list(), test.intercept = FALSE,
  check.contrasts = TRUE, set.data.arg = TRUE, progress = TRUE,
  cl = NULL, ...)

Arguments

formula

a formula describing the full mixed-model to be fitted. As this formula is passed to lmer, it needs at least one random term.

data

data.frame containing the data. Should have all the variables present in fixed, random, and dv as columns.

type

type of test on which effects are based. Only type 3 tests (3 or "III") are correctly implemented (see Details).

method

character vector indicating which methods for obtaining p-values should be used. "KR" (the default) corresponds to the Kenward-Roger approximation for degrees of freedom (only working with linear mixed models). "PB" calculates p-

per.parameter

character vector specifying for which variable tests should be run for each parameter (instead for the overall effect). Can be useful e.g., for testing ordered factors. Relatively untested so results should be compared with a second run witho

args.test

list of arguments passed to the function calculating the p-values. See Details.

test.intercept

logical. Whether or not the intercept should also be fitted and tested for significance. Default is FALSE. Only relevant if type = 3.

check.contrasts

logical. Should contrasts be checked and (if necessary) changed to "contr.sum"? See Details.

set.data.arg

logical. Should the data argument in the slot call of the merMod object returned from lmer be set to the passed data argument? Otherwise the name will be data. Helpful if fitted objects are

progress

if TRUE, shows progress with a text progress bar and other status messages during fitting.

A vector identifying a cluster; used for distributing the estimation of the different models using several cores. See examples. If ckeck.contrasts, mixed sets the current contrasts (getOption("contrasts")) at the nodes. Note this

...

further arguments (such as weights) passed to lmer.

Value

An object of class "mixed" (i.e., a list) with the following elements:
1. anova.tablea data.frame containing the statistics returned fromKRmodcomp. Thestatcolumn in this data.frame gives the value of the test statistic, an F-value formethod = "KR"and a chi-square value for the other two methods.
2. full.modelthe"lmerMod"object returned from fitting the full mixed model.
3. restricted.modelsa list of"lmerMod"objects from fitting the restricted models (i.e., each model lacks the corresponding effect)
4. testsa list of objects returned by the function for obtaining the p-values.
5. typeThetypeargument used when calling this function.
6. methodThemethodargument used when calling this function.
Three identical methods exist for objects of class "mixed": print, summary, and anova. They all print a nice version of the anova.table element of the returned object (which is also invisibly returned). This methods omit some columns and nicely round the other columns. The following columns are always printed:
1. Effectname of effect
2. p.valueestimated p-value for the effect
For LMMs with method="KR" the following further columns are returned (note: the Kenward-Roger correction does two separate things: (1) it computes an effective number for the denominator df; (2) it scales the statistic by a calculated amount, see also http://stackoverflow.com/a/25612960/289572):
1. Fcomputed F statistic
2. ndfnumerator degrees of freedom (number of parameters used for the effect)
3. ddfdenominator degrees of freedom (effective residual degrees of freedom for testing the effect), computed from the Kenward-Roger correction usingpbkrtest::KRmodcomp
4. F.scalingscaling of F-statistic computing from Kenward-Roger approximation.
For models with method="LRT" the following further columns are returned:
1. df.largedegrees of freedom (i.e., estimated paramaters) for full model (i.e., model containing the corresponding effect)
2. df.smalldegrees of freedom (i.e., estimated paramaters) for restricted model (i.e., model without the corresponding effect)
3. chisq2 times the difference in likelihood (obtained withlogLik) between full and restricted model
4. dfdifference in degrees of freedom between full and restricted model (p-value is based on these df).
For models with method="PB" the following further column is returned:
1. stat2 times the difference in likelihood (obtained withlogLik) between full and restricted model (i.e., a chi-square value).

encoding

UTF-8

Details

For an introduction to mixed-modeling for experimental designs see Barr, Levy, Scheepers, & Tily (2013; I highly recommend reading this paper if you use this function), arguments for using the Kenward-Roger approximation for obtaining p-values are given by Judd, Westfall, and Kenny (2012). Further introductions to mixed-modeling for experimental designs are given by Baayen and colleagues (Baayen, 2008; Baayen, Davidson & Bates, 2008; Baayen & Milin, 2010). Specific recommendations on which random effects structure to specify for confirmatory tests can be found in Barr and colleagues (2013) and Barr (2013). p-values are per default calculated via methods from pbkrtest. When method = "KR" (the default), the Kenward-Roger approximation for degrees-of-freedom is calculated using KRmodcomp, which is only applicable to linear-mixed models. The test statistic in the output is a F-value (F). method = "PB" calculates p-values using parametric bootstrap using PBmodcomp. This can be used for linear and also generalized linear mixed models (GLMM) by specifying a family argument to mixed. Note that you should specify further arguments to PBmodcomp via args.test, especially nsim (the number of simulations to form the reference distribution) or cl (for using multiple cores). For other arguments see PBmodcomp. Note that REML (argument to [g]lmer) will be set to FALSE if method is PB. method = "LRT" calculates p-values via likelihood ratio tests implemented in the anova method for "merMod" objects. This is recommended by Barr et al. (2013; which did not test the other methods implemented here). Using likelihood ratio tests is only recommended for models with many levels for the random effects (> 50), but can be pretty helpful in case the other methods fail (due to memory and/or time limitations). The http://glmm.wikidot.com/faq{lme4 faq} also recommends the other methods over likelihood ratio tests. Type 3 tests are obtained by comparing a model in which only the tested effect is excluded with the full model (containing all effects). This corresponds to the (type 3) Wald tests given by car::Anova for "lmerMod" models. The submodels in which the tested effect is excluded are obtained by manually creating a model matrix which si then fitted in "lme4". This is done to avoid R's "feature" to not allow this behavior. Type 2 tests are truly sequential. They are obtained by comparing a model in which the tested effect and all higher oder effect (e.g., all three-way interactions for testing a two-way interaction) are excluded with a model in which only effects up to the order of the tested effect are present and all higher order effects absent. In other words, there are multiple full models, one for each order of effects. Consequently, the results for lower order effects are identical of whether or not higher order effects are part of the model or not. This latter feature is not consistent with classical ANOVA type 2 tests but a consequence of the sequential tests (and https://stat.ethz.ch/pipermail/r-sig-mixed-models/2012q3/018992.html{I didn't find a better way} of implementing the Type 2 tests). This does not correspond to the (type 2) Wald test reported by car::Anova. If you want type 2 Wald tests instead of truly sequential typde 2 tests, use car::Anova with test = "F". Note that the order in which the effects are entered into the formula does not matter (in contrast to type 1 tests). If check.contrasts = TRUE, contrasts will be set to "contr.sum" for all factors in the formula if default contrasts are not equal to "contr.sum" or attrib(factor, "contrasts") != "contr.sum". Furthermore, the current contrasts (obtained via getOption("contrasts")) will be set at the cluster nodes if cl is not NULL.

References

Baayen, R. H. (2008). Analyzing linguistic data: a practical introduction to statistics using R. Cambridge, UK; New York: Cambridge University Press. Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390-412. doi:10.1016/j.jml.2007.12.005 Baayen, R. H., & Milin, P. (2010). Analyzing Reaction Times. International Journal of Psychological Research, 3(2), 12-28. Barr, D. J. (2013). Random effects structure for testing interactions in linear mixed-effects models. Frontiers in Quantitative Psychology and Measurement, 328. doi:10.3389/fpsyg.2013.00328 Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255-278. doi:10.1016/j.jml.2012.11.001 Dalal, D. K., & Zickar, M. J. (2012). Some Common Myths About Centering Predictor Variables in Moderated Multiple Regression and Polynomial Regression. Organizational Research Methods, 15(3), 339-362. doi:10.1177/1094428111430540 Judd, C. M., Westfall, J., & Kenny, D. A. (2012). Treating stimuli as a random factor in social psychology: A new and comprehensive solution to a pervasive but largely ignored problem. Journal of Personality and Social Psychology, 103(1), 54-69. doi:10.1037/a0028347 Maxwell, S. E., & Delaney, H. D. (2004). Designing experiments and analyzing data: a model-comparisons perspective. Mahwah, N.J.: Lawrence Erlbaum Associates.

Examples

Run this code

### replicate results from Table 15.4 (Maxwell & Delaney, 2004, p. 789)
data(md_15.1)
# random intercept plus slope
(t15.4a <- mixed(iq ~ timecat + (1+time|id),data=md_15.1))

# to also replicate exact parameters use treatment.contrasts and the last level as base level:
contrasts(md_15.1$timecat) <- contr.treatment(4, base = 4)
(t15.4b <- mixed(iq ~ timecat + (1+time|id),data=md_15.1, check.contrasts=FALSE))
summary(t15.4a$full.model)  # gives "wrong" parameters extimates
summary(t15.4b$full.model)  # identical parameters estimates

# for more examples from chapter 15 see ?md_15.1

### replicate results from Table 16.3 (Maxwell & Delaney, 2004, p. 837)
data(md_16.1)

# original results need treatment contrasts:
(mixed1_orig <- mixed(severity ~ sex + (1|id), md_16.1, check.contrasts=FALSE))
summary(mixed1_orig$full.model)

# p-value stays the same with afex default contrasts (contr.sum),
# but estimates and t-values for the fixed effects parameters change.
(mixed1 <- mixed(severity ~ sex + (1|id), md_16.1))
summary(mixed1$full.model)


# data for next examples (Maxwell & Delaney, Table 16.4)
data(md_16.4)
str(md_16.4)

### replicate results from Table 16.6 (Maxwell & Delaney, 2004, p. 845)
# Note that (1|room:cond) is needed because room is nested within cond.
# p-value (almost) holds.
(mixed2 <- mixed(induct ~ cond + (1|room:cond), md_16.4))
# (differences are dut to the use of Kenward-Roger approximation here,
# whereas M&W's p-values are based on uncorrected df.)

# again, to obtain identical parameter and t-values, use treatment contrasts:
summary(mixed2$full.model) # not identical

# prepare new data.frame with contrasts:
md_16.4b <- within(md_16.4, cond <- C(cond, contr.treatment, base = 2))
str(md_16.4b)

# p-value stays identical:
(mixed2_orig <- mixed(induct ~ cond + (1|room:cond), md_16.4b, check.contrasts=FALSE))
summary(mixed2_orig$full.model) # replicates parameters


### replicate results from Table 16.7 (Maxwell & Delaney, 2004, p. 851)
# F-values (almost) hold, p-values (especially for skill) are off
(mixed3 <- mixed(induct ~ cond + skill + (1|room:cond), md_16.4))

# however, parameters are perfectly recovered when using the original contrasts:
mixed3_orig <- mixed(induct ~ cond + skill + (1|room:cond), md_16.4b, check.contrasts=FALSE)
summary(mixed3_orig$full.model)



### replicate results from Table 16.10 (Maxwell & Delaney, 2004, p. 862)
# for this we need to center cog:
md_16.4b$cog <- scale(md_16.4b$cog, scale=FALSE)

# F-values and p-values are relatively off:
(mixed4 <- mixed(induct ~ cond*cog + (cog|room:cond), md_16.4b))
# contrast has a relatively important influence on cog
(mixed4_orig <- mixed(induct ~ cond*cog + (cog|room:cond), md_16.4b, check.contrasts=FALSE))

# parameters are again almost perfectly recovered:
summary(mixed4_orig$full.model)



# use the obk.long data (not reasonable, no random slopes)
data(obk.long)
mixed(value ~ treatment * phase + (1|id), obk.long)

# Examples for using the per.parammeter argument:
data(obk.long, package = "afex")
obk.long$hour <- ordered(obk.long$hour)

# tests only the main effect parameters of hour individually per parameter.
mixed(value ~ treatment*phase*hour +(1|id), per.parameter = "^hour$", data = obk.long)

# tests all parameters including hour individually
mixed(value ~ treatment*phase*hour +(1|id), per.parameter = "hour", data = obk.long)

# tests all parameters individually
mixed(value ~ treatment*phase*hour +(1|id), per.parameter = ".", data = obk.long)

# example data from package languageR:
# Lexical decision latencies elicited from 21 subjects for 79 English concrete nouns, 
# with variables linked to subject or word. 
data(lexdec, package = "languageR")

# using the simplest model
m1 <- mixed(RT ~ Correct + Trial + PrevType * meanWeight + 
    Frequency + NativeLanguage * Length + (1|Subject) + (1|Word), data = lexdec)

m1
##                  Effect  stat ndf     ddf F.scaling p.value
## 1               Correct  8.15   1 1627.73      1.00    .004
## 2                 Trial  7.57   1 1592.43      1.00    .006
## 3              PrevType  0.17   1 1605.39      1.00     .68
## 4            meanWeight 14.85   1   75.39      1.00   .0002
## 5             Frequency 56.53   1   76.08      1.00  <.0001
## 6        NativeLanguage  0.70   1   27.11      1.00     .41
## 7                Length  8.70   1   75.83      1.00    .004
## 8   PrevType:meanWeight  6.18   1 1601.18      1.00     .01
## 9 NativeLanguage:Length 14.24   1 1555.49      1.00   .0002

# Fitting a GLMM using parametric bootstrap:
require("mlmRev") # for the data, see ?Contraception

gm1 <- mixed(use ~ age + I(age^2) + urban + livch + (1 | district), method = "PB",
 family = binomial, data = Contraception, args.test = list(nsim = 10))

#######################
### using multicore ###
#######################

require(parallel)
(nc <- detectCores()) # number of cores
cl <- makeCluster(rep("localhost", nc)) # make cluster
# to keep track of what the function is doindg redirect output to outfile:
# cl <- makeCluster(rep("localhost", nc), outfile = "cl.log.txt")

## There are two ways to use multicore:

# 1. Obtain fits with multicore:
mixed(value ~ treatment*phase*hour +(1|id), data = obk.long, method = "LRT", cl = cl)

# 2. Obtain PB samples via multicore: 
mixed(use ~ age + I(age^2) + urban + livch + (1 | district), family = binomial,
 method = "PB", data = Contraception, args.test = list(nsim = 10, cl = cl))

## Both ways can be combined:
mixed(use ~ age + I(age^2) + urban + livch + (1 | district), family = binomial, 
 method = "PB", data = Contraception, args.test = list(nsim = 10, cl = cl), cl = cl)

stopCluster(cl)

Run the code above in your browser using DataLab