residuals: Extract Residuals for a Box-Cox Symmetric Regression Fit

Description

Extracts the residuals from a fitted Box-Cox symmetric or zero-adjusted Box-Cox symmetric regression.

Usage

# S3 method for BCSreg
residuals(object, approach = c("combined", "separated"), ...)

Value

If a Box-Cox symmetric regression is fitted to the data, it returns a numeric vector containing the quantile residuals (Dunn and Smyth, 1996).

If the model is a zero-adjusted Box-Cox symmetric regression:

For approach = "combined", it returns a numeric vector with "combined" quantile residuals. See Details.
For approach = "separated", it returns a list with two components: continuous (quantile residuals for strictly positive responses) and discrete (standardized Pearson residuals for the discrete component).

Arguments

object: an object of class "BCSreg", a result of a call to BCSreg.
approach: a character string indicating the approach for calculating residuals when a zero-adjusted regression is fitted. Should be either "combined" (default) for combined residuals or "separated" for separate residuals. Ignored if the model is not zero-adjusted.
...: further arguments passed to or from other methods.

Author

Francisco F. de Queiroz <felipeq@ime.usp.br>

Rodrigo M. R. de Medeiros <rodrigo.matheus@ufrn.br>

Details

For a Box-Cox symmetric regression fit, the residuals are the quantile residuals (Dunn and Smyth, 1996), defined by \(r_i^q = \Phi^{-1}(\widehat{F}(y_i))\), where \(\widehat{F}(\cdot)\) is the fitted cumulative distribution function and \(\Phi(\cdot)\) is the cumulative distribution function of the standard normal distribution.
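
As a rough illustration of the definition, the following base R sketch computes quantile residuals by hand. It assumes a log-normal model as a stand-in for the fitted distribution; `mu_hat` and `sigma_hat` are illustrative estimates and are not produced by BCSreg.

```r
# Minimal sketch of quantile residuals using base R only.
# Assumption: a log-normal model stands in for the fitted distribution;
# mu_hat and sigma_hat are illustrative estimates, not BCSreg output.
set.seed(1)
y <- rlnorm(200, meanlog = 1, sdlog = 0.5)

mu_hat    <- mean(log(y))                        # ML estimate of meanlog
sigma_hat <- sqrt(mean((log(y) - mu_hat)^2))     # ML estimate of sdlog

F_hat <- plnorm(y, meanlog = mu_hat, sdlog = sigma_hat)  # fitted CDF at y_i
rq    <- qnorm(F_hat)                            # r_i^q = Phi^{-1}(F_hat(y_i))

# In the log-normal case this reduces to the standardized log response
all.equal(rq, (log(y) - mu_hat) / sigma_hat)
```

If the model is adequate, `rq` should behave approximately like a standard normal sample, which is why a normal probability plot is the usual diagnostic.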

For zero-adjusted Box-Cox symmetric regressions, two approaches are available:

Combined approach: Returns a single vector of residuals defined as \( r_i^q = \begin{cases} \Phi^{-1}(u_i), & y_i = 0, \\ \Phi^{-1}\left[\widehat{F}^{(0)}(y_i)\right], & y_i > 0, \end{cases} \) where \(u_i\) is a random variable uniformly distributed in \((0, \widehat{\alpha}_i]\), \(\widehat{\alpha}_i\) is the fitted probability that \(y_i = 0\), and \(\widehat{F}^{(0)}\) is the fitted cumulative distribution function of the mixed response.
Separated approach: Returns a list containing:
- Quantile residuals for the positive (continuous) component.
- Standardized Pearson residuals for the discrete component, defined by \( r_i^p = \frac{\mathbb{I}(y_i = 0) - \widehat{\alpha}_i} {\sqrt{\widehat{\alpha}_i(1-\widehat{\alpha}_i)(1-\widehat{h}_{ii})}}, \) where \(\widehat{h}_{ii}\) is the \(i\)th diagonal element of the "hat matrix" resulting from a fit of a generalized linear model with a binary response given by \(\mathbb{I}(y_i = 0)\), with \(\mathbb{I}\) denoting the indicator function.
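
The combined residuals described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the positive part is taken to be log-normal, the values `alpha_hat`, `mu_hat`, and `sigma_hat` are made up, and the mixed-response CDF is written in the usual zero-adjusted form \(\alpha + (1 - \alpha) F(y)\) for \(y > 0\). It does not reproduce the internals of BCSreg.

```r
# Sketch of the combined (randomized) quantile residuals in base R.
# Assumptions: alpha_hat, mu_hat, and sigma_hat are illustrative values and
# the positive part is log-normal; this is not the BCSreg implementation.
set.seed(2)
n         <- 300
alpha_hat <- 0.2   # probability of a zero response
mu_hat    <- 1     # meanlog of the positive part
sigma_hat <- 0.5   # sdlog of the positive part

is_zero <- rbinom(n, 1, alpha_hat) == 1
y <- ifelse(is_zero, 0, rlnorm(n, mu_hat, sigma_hat))

# CDF of the mixed response for y > 0: alpha + (1 - alpha) * F(y)
F0_hat <- alpha_hat + (1 - alpha_hat) * plnorm(y, mu_hat, sigma_hat)

# u_i ~ Uniform(0, alpha_hat) for zeros; F0_hat(y_i) for positive responses
u  <- ifelse(y == 0, runif(n, 0, alpha_hat), F0_hat)
rq <- qnorm(u)
```

Because the residuals at zero depend on random draws, repeated calls give slightly different values for those observations unless a seed is set.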

See more details in Medeiros and Queiroz (2025).
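
The standardized Pearson residuals of the separated approach can be checked numerically with a plain binomial GLM. The sketch below uses simulated data; the covariate `x`, the coefficients, and the logit link are assumptions made for illustration only.

```r
# Sketch of standardized Pearson residuals for the zero indicator.
# Assumption: a logistic GLM on simulated data stands in for the
# discrete component; x and the coefficients are hypothetical.
set.seed(3)
n <- 200
x <- rnorm(n)
z <- rbinom(n, 1, plogis(-1 + 0.8 * x))   # z_i = I(y_i = 0)

glm_fit   <- glm(z ~ x, family = binomial())
alpha_hat <- fitted(glm_fit)              # estimated P(y_i = 0)
h         <- hatvalues(glm_fit)           # diagonal of the GLM hat matrix

rp <- (z - alpha_hat) / sqrt(alpha_hat * (1 - alpha_hat) * (1 - h))

# Agrees with the built-in standardized Pearson residuals
all.equal(unname(rp), unname(rstandard(glm_fit, type = "pearson")))
```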

References

Dunn, P. K. and Smyth, G. K. (1996). Randomized quantile residuals. Journal of Computational and Graphical Statistics, 5, 236-244.

Medeiros, R. M. R., and Queiroz, F. F. (2025). Flexible modeling of nonnegative continuous data: Box-Cox symmetric regression and its zero-adjusted extension.

Examples

# BCS regression for strictly positive response variables

## Data set: raycatch (for description, run ?raycatch)
hist(raycatch$cpue, xlab = "Catch per unit effort")
plot(cpue ~ tide_phase, raycatch, pch = 16,
     xlab = "Tide phase", ylab = "Catch per unit effort")
plot(cpue ~ location, raycatch, pch = 16,
     xlab = "Location", ylab = "Catch per unit effort")
plot(cpue ~ max_temp, raycatch, pch = 16,
     xlab = "Maximum temperature", ylab = "Catch per unit effort")

## BCS fit
fit <- BCSreg(cpue ~ location + tide_phase + max_temp |
                location + tide_phase + max_temp, raycatch)

## Quantile residuals
rq <- residuals(fit)
rq

## Normal probability plot
qqnorm(rq, pch = "+", cex = 0.8)
qqline(rq, col = "dodgerblue", lwd = 2)

# Zero-adjusted BCS (ZABCS) regression for nonnegative response variables

## Data set: renewables2015 (for description, run ?renewables2015)
plot(ecdf(renewables2015$renew_elec_output), cex = 0.3, main = "Empirical CDF")
abline(h = mean(renewables2015$renew_elec_output == 0), col = "grey", lty = 3)
text(1250, 0.155, "prop. of zeros: ~0.12", col = "blue")

plot(renew_elec_output ~ adj_sav_edu, renewables2015, pch = 16,
     xlab = "Education expenditure (percent of GNI)",
     ylab = "Renewable electricity output (in TWh)")
plot(renew_elec_output ~ agri_land, renewables2015, pch = 16,
     xlab = "Natural logarithm of total agricultural land area",
     ylab = "Renewable electricity output (in TWh)")

## Zero-adjusted BCS fit
fit0 <- BCSreg(renew_elec_output ~ adj_sav_edu + agri_land |
                 adj_sav_edu + agri_land | adj_sav_edu + agri_land, renewables2015)

## Combined approach (default)
rq <- residuals(fit0)
rq

### Normal probability plot
qqnorm(rq, pch = "+", cex = 0.8)
qqline(rq, col = "dodgerblue", lwd = 2)

## Separated approach
res <- residuals(fit0, approach = "separated")
str(res)

### Normal probability plots

# Continuous part
qqnorm(res$continuous, pch = "+", cex = 0.8)
qqline(res$continuous, col = "dodgerblue", lwd = 2)

# Discrete part (Pearson's standardized residuals do not have a normal distribution.)
qqnorm(res$discrete, pch = "+", cex = 0.8)
qqline(res$discrete, col = "dodgerblue", lwd = 2)
