VGAM (version 1.1-6)

# cumulative: Ordinal Regression with Cumulative Probabilities

## Description

Fits a cumulative link regression model to a (preferably ordered) factor response.

## Usage

cumulative(link = "logitlink", parallel = FALSE, reverse = FALSE,
multiple.responses = FALSE, whitespace = FALSE)

## Arguments

Link function applied to the $$J$$ cumulative probabilities. See Links for more choices, e.g., for the cumulative probitlink/clogloglink/cauchitlink/… models.

parallel

A logical or formula specifying which terms have equal/unequal coefficients. See below for more information about the parallelism assumption. The default results in what some people call the generalized ordered logit model to be fitted. If parallel = TRUE then it does not apply to the intercept.

The partial proportional odds model can be fitted by assigning this argument something like parallel = TRUE ~ -1 + x3 + x5 so that there is one regression coefficient for x3 and x5. Equivalently, setting parallel = FALSE ~ 1 + x2 + x4 means $$M$$ regression coefficients for the intercept and x2 and x4. It is important that the intercept is never parallel.

reverse

Logical. By default, the cumulative probabilities used are $$P(Y\leq 1)$$, $$P(Y\leq 2)$$, …, $$P(Y\leq J)$$. If reverse is TRUE then $$P(Y\geq 2)$$, $$P(Y\geq 3)$$, …, $$P(Y\geq J+1)$$ are used.

This should be set to TRUE for link= gordlink, pordlink, nbordlink. For these links the cutpoints must be an increasing sequence; if reverse = FALSE for then the cutpoints must be an decreasing sequence.

multiple.responses

Logical. Multiple responses? If TRUE then the input should be a matrix with values $$1,2,\dots,L$$, where $$L=J+1$$ is the number of levels. Each column of the matrix is a response, i.e., multiple responses. A suitable matrix can be obtained from Cut.

whitespace

See CommonVGAMffArguments for information.

## Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm, and vgam.

## Warning

No check is made to verify that the response is ordinal if the response is a matrix; see ordered.

## Details

In this help file the response $$Y$$ is assumed to be a factor with ordered values $$1,2,\dots,J+1$$. Hence $$M$$ is the number of linear/additive predictors $$\eta_j$$; for cumulative() one has $$M=J$$.

This VGAM family function fits the class of cumulative link models to (hopefully) an ordinal response. By default, the non-parallel cumulative logit model is fitted, i.e., $$\eta_j = logit(P[Y \leq j])$$ where $$j=1,2,\dots,M$$ and the $$\eta_j$$ are not constrained to be parallel. This is also known as the non-proportional odds model. If the logit link is replaced by a complementary log-log link (clogloglink) then this is known as the proportional-hazards model.

In almost all the literature, the constraint matrices associated with this family of models are known. For example, setting parallel = TRUE will make all constraint matrices (except for the intercept) equal to a vector of $$M$$ 1's. If the constraint matrices are equal, unknown and to be estimated, then this can be achieved by fitting the model as a reduced-rank vector generalized linear model (RR-VGLM; see rrvglm). Currently, reduced-rank vector generalized additive models (RR-VGAMs) have not been implemented here.

## References

Agresti, A. (2013). Categorical Data Analysis, 3rd ed. Hoboken, NJ, USA: Wiley.

Agresti, A. (2010). Analysis of Ordinal Categorical Data, 2nd ed. Hoboken, NJ, USA: Wiley.

Dobson, A. J. and Barnett, A. (2008). An Introduction to Generalized Linear Models, 3rd ed. Boca Raton, FL, USA: Chapman & Hall/CRC Press.

McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. London: Chapman & Hall.

Simonoff, J. S. (2003). Analyzing Categorical Data, New York: Springer-Verlag.

Yee, T. W. (2010). The VGAM package for categorical data analysis. Journal of Statistical Software, 32, 1--34. 10.18637/jss.v032.i10.

Yee, T. W. and Wild, C. J. (1996). Vector generalized additive models. Journal of the Royal Statistical Society, Series B, Methodological, 58, 481--493.

propodds, R2latvar, ordsup, prplot, margeff, acat, cratio, sratio, multinomial, pneumo, Links, hdeff.vglm, logitlink, probitlink, clogloglink, cauchitlink, gordlink, pordlink, nbordlink, logistic1.

## Examples

Run this code
# NOT RUN {
# Fit the proportional odds model, p.179, in McCullagh and Nelder (1989)
pneumo <- transform(pneumo, let = log(exposure.time))
(fit <- vglm(cbind(normal, mild, severe) ~ let,
cumulative(parallel = TRUE, reverse = TRUE), data = pneumo))
depvar(fit)  # Sample proportions (good technique)
fit@y        # Sample proportions (bad technique)
weights(fit, type = "prior")  # Number of observations
coef(fit, matrix = TRUE)
constraints(fit)  # Constraint matrices
apply(fitted(fit), 1, which.max)  # Classification
apply(predict(fit, newdata = pneumo, type = "response"),
1, which.max)  # Classification
R2latvar(fit)

# Check that the model is linear in let ----------------------
fit2 <- vgam(cbind(normal, mild, severe) ~ s(let, df = 2),
cumulative(reverse = TRUE), data = pneumo)
# }
# NOT RUN {
plot(fit2, se = TRUE, overlay = TRUE, lcol = 1:2, scol = 1:2)
# }
# NOT RUN {
# Check the proportional odds assumption with a LRT ----------
(fit3 <- vglm(cbind(normal, mild, severe) ~ let,
cumulative(parallel = FALSE, reverse = TRUE), data = pneumo))
pchisq(2 * (logLik(fit3) - logLik(fit)),
df = length(coef(fit3)) - length(coef(fit)), lower.tail = FALSE)
lrtest(fit3, fit)  # More elegant

# A factor() version of fit ----------------------------------
# This is in long format (cf. wide format above)
Nobs <- round(depvar(fit) * c(weights(fit, type = "prior")))
sumNobs <- colSums(Nobs)  # apply(Nobs, 2, sum)

pneumo.long <-
data.frame(symptoms = ordered(rep(rep(colnames(Nobs), nrow(Nobs)),
times = c(t(Nobs))),
levels = colnames(Nobs)),
let = rep(rep(with(pneumo, let), each = ncol(Nobs)),
times = c(t(Nobs))))
with(pneumo.long, table(let, symptoms))  # Should be same as pneumo

(fit.long1 <- vglm(symptoms ~ let, data = pneumo.long, trace = TRUE,
cumulative(parallel = TRUE, reverse = TRUE)))
coef(fit.long1, matrix = TRUE)  # Should be as coef(fit, matrix = TRUE)
# Could try using mustart if fit.long1 failed to converge.
mymustart <- matrix(sumNobs / sum(sumNobs),
nrow(pneumo.long), ncol(Nobs), byrow = TRUE)
fit.long2 <- vglm(symptoms ~ let, mustart = mymustart,
cumulative(parallel = TRUE, reverse = TRUE),
data = pneumo.long, trace = TRUE)
coef(fit.long2, matrix = TRUE)  # Should be as coef(fit, matrix = TRUE)
# }


Run the code above in your browser using DataCamp Workspace