pdredge: Automated model selection using parallel computation

Description

Parallelized version of dredge.

Usage

pdredge(global.model, cluster = NA, beta = FALSE, evaluate = TRUE, rank = "AICc", 
    fixed = NULL, m.max = NA, m.min = 0, subset, marg.ex = NULL, trace = FALSE, 
    varying, extra, ct.args = NULL, check = FALSE, ...)

Arguments

global.model, beta, evaluate, rank

see dredge.

fixed, m.max, m.min, subset, marg.ex, varying, extra, ct.args, ...

see dredge.

trace

displays the generated calls, but may not work as expected since the models are evaluated in batches rather than one by one.

cluster

either a valid cluster object, or NA for a single threaded execution.

check

logical, whether to evaluate the global.model in the cluster and compare with the original one using all.equal.

Value

See dredge.

encoding

utf-8

Details

All the dependencies for fitting the global.model, including the data and any objects the modelling function will use must be exported into the cluster worker nodes (e.g. via clusterExport). The required packages must be also loaded thereinto (e.g. via clusterEvalQ(..., library(package)), before the cluster is used by pdredge.

pdredge tries to check whether all the variables and functions used in the call to global.model are present in the cluster nodes' .GlobalEnv before proceeding further, and also, if check is TRUE, it will compare the global.model updated at the cluster nodes with the one given as argument. This additional test will, however, slow down the execution.

Use of pdredge should be considered mainly with large datasets and complex models, for which dredge takes a long time to complete. Otherwise there may be no perceptible improvement or even the parallel version may perform worse than a single-threaded one.

Examples

Run this code

# Normally this should be simply "require(parallel) || require(snow)",
# but here we resort to an (ugly) trick to avoid MuMIn's dependency on one of
# these packages and still pass R-check:
if(MuMIn:::.parallelPkgCheck(quiet = TRUE)) {

# One of these packages is required:
require(parallel) || require(snow)

# From example(Beetle)
data(Beetle)

Beetle100 <- Beetle[sample(nrow(Beetle), 100, replace = TRUE),]

fm1 <- glm(Prop ~ dose + I(dose^2) + log(dose) + I(log(dose)^2),
    data = Beetle100, family = binomial)

msubset <- expression(xor(dose, `log(dose)`) & (dose | !`I(dose^2)`)
    & (`log(dose)` | !`I(log(dose)^2)`))
varying.link <- list(family = alist(logit = binomial("logit"),
    probit = binomial("probit"), cloglog = binomial("cloglog") ))

# Set up the cluster
clusterType <- if(length(.find.package("snow", quiet = TRUE))) "SOCK" else "PSOCK"
clust <- try(makeCluster(getOption("cl.cores", 2), type = clusterType))
if(inherits(clust, "cluster")) {
clusterExport(clust, "Beetle100")

# noticeable gain only when data has about 3000 rows (Windows 2-core machine)
print(system.time(dredge(fm1, subset = msubset, varying = varying.link)))
print(system.time(pdredge(fm1, cluster = FALSE, subset = msubset,
    varying = varying.link)))
print(system.time(pdd <- pdredge(fm1, cluster = clust, subset = msubset,
    varying = varying.link)))

print(pdd)

# Time consuming example with 'unmarked' model, based on example(pcount).
# Having enough patience you can run this with 'demo(pdredge.pcount)'.
library(unmarked)
data(mallard)
mallardUMF <- unmarkedFramePCount(mallard.y, siteCovs = mallard.site,
    obsCovs = mallard.obs)
(ufm.mallard <- pcount(~ ivel + date + I(date^2) ~ length + elev + forest,
    mallardUMF, K = 30))
clusterEvalQ(clust, library(unmarked))
clusterExport(clust, "mallardUMF")

# 'stats4' is needed for AIC to work with unmarkedFit objects but is not
# loaded automatically with 'unmarked'.
require(stats4)
invisible(clusterCall(clust, "library", "stats4", character.only = TRUE))

#system.time(print(pdd1 <- pdredge(ufm.mallard,
#   subset = `p(date)` | !`p(I(date^2))`, rank = AIC)))

system.time(print(pdd2 <- pdredge(ufm.mallard, clust,
    subset = `p(date)` | !`p(I(date^2))`, rank = AIC, extra = "adjR^2")))


# best models and null model
subset(pdd2, delta < 2 | df == min(df))

# Compare with the model selection table from unmarked
# the statistics should be identical:
models <- pget.models(pdd2, clust, delta < 2 | df == min(df))

modSel(fitList(fits = structure(models, names = model.names(models,
    labels = getAllTerms(ufm.mallard)))), nullmod = "(Null)")

stopCluster(clust)
} else # if(! inherits(clust, "cluster"))
message("Could not set up the cluster")
}

Run the code above in your browser using DataLab