The data contain frog population sizes in different ponds, together with some characteristics of the ponds. The data are simulated, so the "true" model is known. They can be used to try out different methods of model selection.
data(pondfrog)
A data frame with 130 observations on the following 9 variables.
frog: a numeric vector
fish: a numeric vector
vegdensity: a numeric vector
ph: a numeric vector
surfacearea: a numeric vector
waterdepth: a numeric vector
region: a factor with levels north south
height: a numeric vector
temp: a numeric vector
The R code used to produce the pondfrog data is:
set.seed(196453)
n <- 130  # sample size
height <- sample(150:1500, n)
region <- sample(c("south", "north"), n, replace=TRUE, prob=c(0.2, 0.8))
waterdepth <- sample(seq(0.3, 5.5, by=0.01), n)
surfacearea <- sample(seq(3, 150), n)
temp <- 20 - 0.01*height + 0.5*as.numeric(region=="south") -
  0.005*waterdepth + 0.1*sqrt(surfacearea) + rnorm(n, 0, 1.5)
ph <- 7.5 - 0.8*as.numeric(region=="south") + rnorm(n, 0, 0.2)
vegdensity.logitp <- -3.5 + 0.3*ph + 0.2*temp + rnorm(n, 0, 1)
vegdensity.p <- plogis(vegdensity.logitp)
vegdensity <- rbinom(n, 1, prob=vegdensity.p)
fish.logitp <- -4 + 0.3*ph + 0.2*waterdepth + rnorm(n, 0, 1)
fish.p <- plogis(fish.logitp)
fish <- rbinom(n, 1, prob=fish.p)
frog.mu <- exp(3.5 + 0.2*(temp-mean(temp)) + 0.2*(ph-mean(ph)) +
  0.1*(ph-mean(ph))^2 - 0.3*(waterdepth-mean(waterdepth)) -
  0.5*fish + 0.5*fish*vegdensity)
frog <- rpois(n, lambda=frog.mu)
dat <- data.frame(frog=frog, fish=fish, vegdensity=vegdensity, ph=ph,
  surfacearea=surfacearea, waterdepth=waterdepth, region=region,
  height=height, temp=temp)
Thus, the "true" model for the number of frogs (frog) is a Poisson model with log-link function and the following linear predictor:
3.5 + 0.2*(temp-mean(temp)) +0.2*(ph-mean(ph)) + 0.1*(ph-mean(ph))^2 - 0.3*(waterdepth-mean(waterdepth)) - 0.5 * fish + 0.5*fish*vegdensity
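As an illustration of the linear predictor above, the following sketch regenerates the data with the simulation code from this page and fits the data-generating model with glm(). The centred helper columns (temp.c, ph.c, waterdepth.c) and the object names fit_true and comparison are introduced here for illustration only and are not part of the data set. Because the random number generator behind sample() changed in R 3.6.0, the regenerated values may differ from the shipped data; the estimates should nevertheless lie near the true coefficients.

```r
# Regenerate the data (simulation code taken from this help page)
set.seed(196453)
n <- 130
height <- sample(150:1500, n)
region <- sample(c("south", "north"), n, replace = TRUE, prob = c(0.2, 0.8))
waterdepth <- sample(seq(0.3, 5.5, by = 0.01), n)
surfacearea <- sample(seq(3, 150), n)
temp <- 20 - 0.01 * height + 0.5 * as.numeric(region == "south") -
  0.005 * waterdepth + 0.1 * sqrt(surfacearea) + rnorm(n, 0, 1.5)
ph <- 7.5 - 0.8 * as.numeric(region == "south") + rnorm(n, 0, 0.2)
vegdensity.logitp <- -3.5 + 0.3 * ph + 0.2 * temp + rnorm(n, 0, 1)
vegdensity <- rbinom(n, 1, prob = plogis(vegdensity.logitp))
fish.logitp <- -4 + 0.3 * ph + 0.2 * waterdepth + rnorm(n, 0, 1)
fish <- rbinom(n, 1, prob = plogis(fish.logitp))
frog.mu <- exp(3.5 + 0.2 * (temp - mean(temp)) + 0.2 * (ph - mean(ph)) +
  0.1 * (ph - mean(ph))^2 - 0.3 * (waterdepth - mean(waterdepth)) -
  0.5 * fish + 0.5 * fish * vegdensity)
frog <- rpois(n, lambda = frog.mu)
dat <- data.frame(frog, fish, vegdensity, ph, waterdepth, temp)

# Hypothetical helper columns: centre the covariates as in the true predictor
dat$temp.c <- dat$temp - mean(dat$temp)
dat$ph.c <- dat$ph - mean(dat$ph)
dat$waterdepth.c <- dat$waterdepth - mean(dat$waterdepth)

# Poisson GLM with log link; fish:vegdensity adds only the product term,
# matching the true model, which has no main effect of vegdensity
fit_true <- glm(
  frog ~ temp.c + ph.c + I(ph.c^2) + waterdepth.c + fish + fish:vegdensity,
  family = poisson(link = "log"), data = dat
)

# Put the true coefficients next to the estimates
true_coef <- c(3.5, 0.2, 0.2, 0.1, -0.3, -0.5, 0.5)
comparison <- cbind(true = true_coef, estimate = coef(fit_true))
print(round(comparison, 2))
```

Centring the covariates keeps the intercept and the ph coefficients directly comparable with the values used in the simulation; with uncentred covariates the fitted values would be the same, but the intercept and the linear ph and quadratic terms would be parametrised differently.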
# NOT RUN {
data(pondfrog)
pairs(pondfrog)
# }
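Since the data are meant for experimenting with model selection, a possible starting point is to compare a few candidate Poisson models by AIC. The sketch below regenerates the data with the simulation code from this page so that it runs without the package; the candidate set (m_null, m_main, m_true) and the object name aic_table are assumptions chosen for illustration, not a recommended selection procedure.

```r
# Regenerate the data (simulation code taken from this help page)
set.seed(196453)
n <- 130
height <- sample(150:1500, n)
region <- sample(c("south", "north"), n, replace = TRUE, prob = c(0.2, 0.8))
waterdepth <- sample(seq(0.3, 5.5, by = 0.01), n)
surfacearea <- sample(seq(3, 150), n)
temp <- 20 - 0.01 * height + 0.5 * as.numeric(region == "south") -
  0.005 * waterdepth + 0.1 * sqrt(surfacearea) + rnorm(n, 0, 1.5)
ph <- 7.5 - 0.8 * as.numeric(region == "south") + rnorm(n, 0, 0.2)
vegdensity.logitp <- -3.5 + 0.3 * ph + 0.2 * temp + rnorm(n, 0, 1)
vegdensity <- rbinom(n, 1, prob = plogis(vegdensity.logitp))
fish.logitp <- -4 + 0.3 * ph + 0.2 * waterdepth + rnorm(n, 0, 1)
fish <- rbinom(n, 1, prob = plogis(fish.logitp))
frog.mu <- exp(3.5 + 0.2 * (temp - mean(temp)) + 0.2 * (ph - mean(ph)) +
  0.1 * (ph - mean(ph))^2 - 0.3 * (waterdepth - mean(waterdepth)) -
  0.5 * fish + 0.5 * fish * vegdensity)
frog <- rpois(n, lambda = frog.mu)
dat <- data.frame(frog, fish, vegdensity, ph, waterdepth, temp)

# Candidate models (an illustrative, hypothetical set)
m_null <- glm(frog ~ 1, family = poisson, data = dat)
m_main <- glm(frog ~ temp + ph + waterdepth + fish + vegdensity,
              family = poisson, data = dat)
m_true <- glm(frog ~ temp + ph + I(ph^2) + waterdepth + fish + fish:vegdensity,
              family = poisson, data = dat)

# AIC() with several fitted models returns a data frame with one row per model
aic_table <- AIC(m_null, m_main, m_true)
print(aic_table[order(aic_table$AIC), ])
```

Because the true model is known, the ranking produced by any selection criterion can be checked against the data-generating structure, which is the purpose these data were simulated for.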