check2D: Checking GAM residuals along two covariates

Description

This function extracts the residuals of a fitted GAM model, and plots them according to the values of two covariates. Then several visual residuals diagnostics can be plotted by adding layers.

Usage

check2D(
  o,
  x1,
  x2,
  type = "auto",
  maxpo = 10000,
  na.rm = TRUE,
  trans = NULL,
  useSim = TRUE
)

Value

The function will return an object of class c("plotSmooth", "gg"), unless arguments x1 and/or x2 are lists. If they are lists of the same length, then the function will return an object of class c("plotGam", "gg") containing a checking plot for each pair of variables. If x1 is a list and x2 is not specified, the function will return an object of class c("plotGam", "gg") containing a plot for each unique combination of the variables in x1.

Arguments

o: an object of class gamViz.
x1: it can be either a) a single character, b) a numeric vector or c) a list of characters. In case a) it should be the name of one of the variables in the dataframe used to fit o. In case b) its length should be equal to the length of o$y. In case c) it should be a list of names of variables in the dataframe used to fit o.
x2: same as x2, but this will appear on the y-axis.
type: the type of residuals to be used. See residuals.gamViz. If "type == y" then the raw observations will be used.
maxpo: maximum number of residuals points that will be used by layers such as l_rug(). If number of datapoints > maxpo, then a subsample of maxpo points will be taken.
na.rm: if TRUE missing cases in x or y will be dropped out
trans: function used to transform the observed and simulated residuals or responses. It must take a vector of as input, and must return a vector of the same length.
useSim: if FALSE then the simulated responses contained in object o will not be used by this function or by any of the layers that can be used with its output.

Examples

Run this code

if (FALSE) {
library(mgcViz);
#### Example 1: Rosenbrock function
# Simulate data
n <- 1e4
X <- data.frame("x1"=rnorm(n, 0.5, 0.5), "x2"=rnorm(n, 1.5, 1))
X$y <- (1-X$x1)^2 + 100*(X$x2 - X$x1^2)^2 + rnorm(n, 0, 2)
b <- bam(y ~ te(x1, x2, k = 5), data = X, discrete = TRUE)
b <- getViz(b, nsim = 50)

# Plot joint density of observed covariate x1 and x2
check2D(b, x1 = "x1", x2 = "x2") + l_rug() + l_dens(type="joint", alpha=0.6) + l_points()

# Look at how mean of residuals varies across x1 and x2
check2D(b, x1 = "x1", x2 = "x2") + l_gridCheck2D() + l_points()

# Can't see much in previous plot, let's zoom in central area, where most
# data is. Here we can clearly see that the mean model is mispecified
check2D(b, x1 = "x1", x2 = "x2") + l_gridCheck2D(bw = c(0.05, 0.1)) +
                                   xlim(-1, 1) + ylim(0, 3)
# Fit can be improved by increasing k in the bam() call

#### Example 2: checking along factor variables
# Simulate data where variance changes along factor variable "fac"
n <- 1e4
X <- data.frame("x1"=rnorm(n, 0.5, 0.5), "x2"=rnorm(n, 1.5, 1))
X$fac <- as.factor( sample(letters, n, replace = TRUE) )
X$fac2 <- as.factor( sample(c("F1", "F2", "F3", "F4", "F5"), n, replace = TRUE) )
X$y <- (1-X$x1)^2 + 5*(X$x2 - X$x1^2)^2 + 0.1*as.numeric(X$fac) * rnorm(n, 0, 2)
b <- bam(y ~ te(x1, x2, k = 5) + fac + fac2, data = X, discrete = TRUE)
b <- getViz(b, nsim = 50)

# Check standard deviation of residuals along covariates "x1" and "fac"
a <- check2D(b, x1 = "x2", x2 = "fac")
a + l_gridCheck2D(gridFun = sd) + l_rug() + l_points() 

# Points and rug are jittered by default, but we can over-ride this
a + l_rug(position = position_jitter(width = 0, height = 0)) + 
  l_points(position = position_jitter(width = 0, height = 0)) 

# Check standard deviation of residuals along the two factor variables
a <- check2D(b, x1 = "fac", x2 = "fac2")
a + l_gridCheck2D(gridFun = sd, bw = c(1, 4)) + l_rug() + l_points() 
}

Run the code above in your browser using DataLab