cumulative: Ordinal Regression with Cumulative Probabilities

Description

Fits a cumulative link regression model to a (preferably ordered) factor response.

Usage

cumulative(link = "logitlink", parallel = FALSE,
    reverse = FALSE, multiple.responses = FALSE,
    ynames = FALSE, Thresh = NULL, Trev = reverse,
    Tref = if (Trev) "M" else 1, Intercept = NULL,
    whitespace = FALSE)

Value

An object of class "vglmff"

(see vglmff-class). The object is used by modelling functions such as vglm, and vgam.

Arguments

link: Link function applied to the $J$ cumulative probabilities. See Links for more choices, e.g., for the cumulative probitlink/clogloglink/... models.

parallel: A logical or formula specifying which terms have equal/unequal coefficients. See below for more information about the parallelism assumption. The default results in what some people call the generalized ordered logit model to be fitted. If parallel = TRUE then it does not apply to the intercept.

reverse

Logical. By default, the cumulative probabilities used are $P(Y\leq 1)$, $P(Y\leq 2)$, ..., $P(Y\leq J)$. If reverse is TRUE then $P(Y\geq 2)$, $P(Y\geq 3)$, ..., $P(Y\geq J+1)$ are used.

ynames

See multinomial for information.

multiple.responses

Logical. Multiple responses? If TRUE then the input should be a matrix with values $1,2,\dots,L$, where $L=J+1$ is the number of levels. Each column of the matrix is a response, i.e., multiple responses. A suitable matrix can be obtained from Cut.

Thresh: Character. The choices concern constraint matrices applied to the intercepts. They can be constrained to be equally-spaced (equidistant) etc. See CommonVGAMffArguments and constraints for general information. Basically, the choice is pasted to the end of "CM." and that function is called. This means users can easily write their own CM.-type function.
Trev, Tref: Support arguments for Thresh for equally-spaced intercepts. The logical argument Trev is applied first to give the direction (i.e., ascending or descending) before row Tref (ultimately numeric) of the first (intercept) constraint matrix is set to the reference level. See constraints for information.
Intercept: Another support argument for Thresh: does the constraint matrix include an intercept? The NULL default means to use the default of the "CM." function.
whitespace: See CommonVGAMffArguments for information.

Author

Thomas W. Yee

Warning

No check is made to verify that the response is ordinal if the response is a matrix; see ordered.

Boersch-Supan (2021) looks at sparse data and the numerical problems that result; see sratio.

Details

In this help file the response $Y$ is assumed to be a factor with ordered values $1,2,\dots,J+1$. Hence $M$ is the number of linear/additive predictors $\eta_j$; for cumulative() one has $M=J$.

This VGAM family function fits the class of cumulative link models to (hopefully) an ordinal response. By default, the non-parallel cumulative logit model is fitted, i.e., $$\eta_j = logit(P[Y \leq j])$$ where $j=1,2,\dots,M$ and the $\eta_j$ are not constrained to be parallel. This is also known as the non-proportional odds model. If the logit link is replaced by a complementary log-log link (clogloglink) then this is known as the proportional-hazards model.

In almost all the literature, the constraint matrices associated with this family of models are known. For example, setting parallel = TRUE will make all constraint matrices (except for the intercept) equal to a vector of $M$ 1's. If the constraint matrices are equal, unknown and to be estimated, then this can be achieved by fitting the model as a reduced-rank vector generalized linear model (RR-VGLM; see rrvglm). Currently, reduced-rank vector generalized additive models (RR-VGAMs) have not been implemented here.

References

Agresti, A. (2013). Categorical Data Analysis, 3rd ed. Hoboken, NJ, USA: Wiley.

Agresti, A. (2010). Analysis of Ordinal Categorical Data, 2nd ed. Hoboken, NJ, USA: Wiley.

McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. London: Chapman & Hall.

Pujol-Rigol, S., Fernandez, D. and Casals, M. (2025). A systematic review and comparative study of R packages for ordinal response regression models. WIREs Computational Statistics, 17, e70025.

Tutz, G. (2012). Regression for Categorical Data, Cambridge: Cambridge University Press.

Tutz, G. and Berger, M. (2022). Sparser ordinal regression models based on parametric and additive location-shift approaches. International Statistical Review, 90, 306--327. tools:::Rd_expr_doi("10.1111/insr.12484").

Yee, T. W. (2010). The VGAM package for categorical data analysis. Journal of Statistical Software, 32, 1--34. tools:::Rd_expr_doi("10.18637/jss.v032.i10").

Yee, T. W. and Wild, C. J. (1996). Vector generalized additive models. Journal of the Royal Statistical Society, Series B, Methodological, 58, 481--493.

Examples

Run this code

# Proportional odds model (p.179) of McCullagh and Nelder (1989)
pneumo <- transform(pneumo, let = log(exposure.time))
(fit <- vglm(cbind(normal, mild, severe) ~ let,
             cumulative(parallel = TRUE, reverse = TRUE), pneumo))
depvar(fit)  # Sample proportions (good technique)
fit@y        # Sample proportions (bad technique)
weights(fit, type = "prior")  # Number of observations
coef(fit, matrix = TRUE)
constraints(fit)  # Constraint matrices
apply(fitted(fit), 1, which.max)  # Classification
apply(predict(fit, newdata = pneumo, type = "response"),
      1, which.max)  # Classification
R2latvar(fit)

# Check that the model is linear in let ----------------------
fit2 <- vgam(cbind(normal, mild, severe) ~ s(let, df = 2),
             cumulative(reverse = TRUE), data = pneumo)
if (FALSE) {
 plot(fit2, se = TRUE, overlay = TRUE, lcol = 1:2, scol = 1:2) }

# Check the proportional odds assumption with a LRT -----
(fit3 <- vglm(cbind(normal, mild, severe) ~ let,
              cumulative(parallel = FALSE, reverse = TRUE), pneumo))
pchisq(2 * (logLik(fit3) - logLik(fit)), df =
       length(coef(fit3)) - length(coef(fit)), lower.tail = FALSE)
lrtest(fit3, fit)  # More elegant

# A factor() version of fit ----------------------------
# This is in long format (cf. wide format above)
Nobs <- round(depvar(fit) * c(weights(fit, type = "prior")))
sumNobs <- colSums(Nobs)  # apply(Nobs, 2, sum)

pneumo.long <-
  data.frame(symptoms = ordered(rep(rep(colnames(Nobs), nrow(Nobs)),
                                        times = c(t(Nobs))),
                                levels = colnames(Nobs)),
             let = rep(rep(with(pneumo, let), each = ncol(Nobs)),
                       times = c(t(Nobs))))
with(pneumo.long, table(let, symptoms))  # Should be same as pneumo


(fit.long1 <- vglm(symptoms ~ let, data = pneumo.long, trace = TRUE,
                   cumulative(parallel = TRUE, reverse = TRUE)))
coef(fit.long1, matrix = TRUE)  # cf. coef(fit, matrix = TRUE)
# Could try using mustart if fit.long1 failed to converge.
mymustart <- matrix(sumNobs / sum(sumNobs),
                    nrow(pneumo.long), ncol(Nobs), byrow = TRUE)
fit.long2 <- vglm(symptoms ~ let, mustart = mymustart,
                  cumulative(parallel = TRUE, reverse = TRUE),
                  data = pneumo.long, trace = TRUE)
coef(fit.long2, matrix = TRUE)  # cf. coef(fit, matrix = TRUE)

Run the code above in your browser using DataLab