PPC.residuals: Calculate Residuals from MCMC Output of Occupancy Models

Description

Posterior Predictive Check (PPC) function that calculates Freeman-Tukey (FT) residuals, Pearson"s Chi-squared residuals, or deviance from MCMC output of occupancy models. This function compares observed data to simulated data from the posterior distribution to assess model fit.

Usage

PPC.residuals(
  y,
  p,
  psi,
  model = c("Occupancy", "RN"),
  type = c("FT", "PearChi2", "Deviance"),
  K = NULL,
  z.cond = TRUE,
  zhat = NULL,
  nmax = 20,
  return.residuals = TRUE,
  return.z = TRUE
)

Value

If return.residuals=TRUE (default), returns a list containing:

res.obs - residuals for observed data
res.new - residuals for newly generated data
BP - Bayesian p-value

If return.residuals=FALSE, returns only the Bayesian p-value.

Arguments

y: Observations as either a site vector or site by occasion matrix. For matrix format, use NA for unsampled occasions.
p: Array of posterior samples for detection probability (p). Dimensions should be iterations by sites (by occasion optionally). For RN models, p should represent individual-level detection (not conditional on local abundance).
psi: Array of posterior samples for occupancy probability (psi). Dimensions should be iterations by sites. For RN models, psi should represent expected abundance
model: Character indicating model type: either "Occupancy" or "RN" (Royle-Nichols).
type: Type of residual to calculate: "FT" (Freeman Tukey), "PearChi2" (Pearson Chi-squared), or "Deviance" (not technically a residual).
K: Number of occasions as either a scalar or site vector. Calculated automatically if y is a matrix.
z.cond: Logical. If TRUE, new data is conditioned on estimated z (testing only detection model fit). If FALSE, generates new z for each posterior sample (testing complete model).
zhat: Optional matrix with same dimensions as psi containing estimates of z from the same model. If not provided, will be generated internally.
nmax: Maximum site-level abundance (default = 20). Only used if model="RN". Higher values increase computation time. Warning given if set too low.
return.residuals: Logical. If TRUE (default), returns residuals along with Bayesian p-value.
return.z: Logical. If TRUE, returns z values conditional on y, and unconditional z's if z.cond = FALSE. Note: if zhat is provided, the returned conditional-on-y z values will be identical to those provided.

Warning

This is a beta version of the function. While it has been tested extensively, not all possible data configurations may have been captured in testing. This is particularly true for:

Deviance calculations (type = "Deviance")
Royle-Nichols models (model = "RN")

If you encounter issues with the function, please contact the package developers.

Author

Rahel Sollmann

Details

This function helps assess model fit for occupancy models using various types of residuals:

Types of Residuals

Freeman-Tukey (FT): $$R_j = (\sqrt{y_j} - \sqrt{E(y_j)})^2$$ Measures the squared difference between the square root of observed detections and the square root of expected detections at each site.
Pearson Chi-squared: $$R_j = \left(\frac{y_j - E(y_j)}{\sqrt{Var(y_j)}}\right)^2$$ Measures the squared difference between observed and expected detections, standardized by the theoretical variance calculated from the model parameters.
Deviance: $$R_j = -2\log[y_j|\theta_j, K_j]$$ Measures the contribution of each site to the overall model likelihood, quantifying the discrepancy between observed data and model predictions based on likelihood ratios

Where:

$y_j$ is the number of detections of the species at site j, out of $K_j$ repeated surveys
$E(y_j) = K_j p_j z_j$, with $p_j$ = species detection probability and $z_j$ = occupancy state (1 if occupied, 0 otherwise)
$Var(y_j) = p_j z_j (1 - p_j z_j) K_j$
For Royle-Nichols occupancy models, the term $p_j z_j$ is replaced with $1 - (1 - r_j)^{N_j}$, where $r_j$ = individual detection probability and $N_j$ = local abundance
For Deviance, $\theta_j$ is either occupancy and species detection probability at site j ($\psi_j, p_j$) for regular occupancy models, or expected abundance and individual detection probability ($\lambda_j, r_j$) for Royle-Nichols occupancy models

Bayesian P-values

The function calculates Bayesian p-values as a measure of model fit. These values:

Range from 0 to 1
Values close to 0.5 suggest good model fit
Values close to 0 or 1 suggest poor fit
Are calculated by comparing observed residuals to residuals from simulated data

Conditional vs Unconditional Assessment

The z.cond parameter allows for two types of model assessment:

z.cond = TRUE: Tests only the detection component of the model, fixing occupancy/abundance to estimates from the model, rather than generating them anew
z.cond = FALSE: Tests the complete model, including both occupancy and detection components

References

Sollmann, Rahel. Occupancy models and the "good fit, bad prediction" dilemma. Ecology (submitted)