cumulative: Ordinal Regression with Cumulative Probabilities

Description

Fits a cumulative link regression model to a (preferably ordered) factor response.

Usage

cumulative(link = "logitlink", parallel = FALSE, reverse = FALSE,
           multiple.responses = FALSE, whitespace = FALSE)

Arguments

link

Link function applied to the $J$ cumulative probabilities. See Links for more choices, e.g., for the cumulative probitlink/clogloglink/cauchitlink/… models.

parallel

A logical or formula specifying which terms have equal/unequal coefficients. See below for more information about the parallelism assumption. The default results in what some people call the generalized ordered logit model to be fitted. If parallel = TRUE then it does not apply to the intercept.

The partial proportional odds model can be fitted by assigning this argument something like parallel = TRUE ~ -1 + x3 + x5 so that there is one regression coefficient for x3 and x5. Equivalently, setting parallel = FALSE ~ 1 + x2 + x4 means $M$ regression coefficients for the intercept and x2 and x4. It is important that the intercept is never parallel.

reverse

Logical. By default, the cumulative probabilities used are $P(Y\leq 1)$, $P(Y\leq 2)$, …, $P(Y\leq J)$. If reverse is TRUE then $P(Y\geq 2)$, $P(Y\geq 3)$, …, $P(Y\geq J+1)$ are used.

This should be set to TRUE for link= gordlink, pordlink, nbordlink. For these links the cutpoints must be an increasing sequence; if reverse = FALSE for then the cutpoints must be an decreasing sequence.

multiple.responses

Logical. Multiple responses? If TRUE then the input should be a matrix with values $1,2,\dots,L$, where $L=J+1$ is the number of levels. Each column of the matrix is a response, i.e., multiple responses. A suitable matrix can be obtained from Cut.

whitespace

See CommonVGAMffArguments for information.

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm, and vgam.

Warning

No check is made to verify that the response is ordinal if the response is a matrix; see ordered.

Details

In this help file the response $Y$ is assumed to be a factor with ordered values $1,2,\dots,J+1$. Hence $M$ is the number of linear/additive predictors $\eta_j$; for cumulative() one has $M=J$.

This VGAM family function fits the class of cumulative link models to (hopefully) an ordinal response. By default, the non-parallel cumulative logit model is fitted, i.e., $$\eta_j = logit(P[Y \leq j])$$ where $j=1,2,\dots,M$ and the $\eta_j$ are not constrained to be parallel. This is also known as the non-proportional odds model. If the logit link is replaced by a complementary log-log link (clogloglink) then this is known as the proportional-hazards model.

In almost all the literature, the constraint matrices associated with this family of models are known. For example, setting parallel = TRUE will make all constraint matrices (except for the intercept) equal to a vector of $M$ 1's. If the constraint matrices are equal, unknown and to be estimated, then this can be achieved by fitting the model as a reduced-rank vector generalized linear model (RR-VGLM; see rrvglm). Currently, reduced-rank vector generalized additive models (RR-VGAMs) have not been implemented here.

References

Agresti, A. (2013). Categorical Data Analysis, 3rd ed. Hoboken, NJ, USA: Wiley.

Agresti, A. (2010). Analysis of Ordinal Categorical Data, 2nd ed. Hoboken, NJ, USA: Wiley.

Dobson, A. J. and Barnett, A. (2008). An Introduction to Generalized Linear Models, 3rd ed. Boca Raton, FL, USA: Chapman & Hall/CRC Press.

McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. London: Chapman & Hall.

Simonoff, J. S. (2003). Analyzing Categorical Data, New York: Springer-Verlag.

Yee, T. W. (2010). The VGAM package for categorical data analysis. Journal of Statistical Software, 32, 1--34. 10.18637/jss.v032.i10.

Yee, T. W. and Wild, C. J. (1996). Vector generalized additive models. Journal of the Royal Statistical Society, Series B, Methodological, 58, 481--493.

Examples

Run this code

# NOT RUN {
# Fit the proportional odds model, p.179, in McCullagh and Nelder (1989)
pneumo <- transform(pneumo, let = log(exposure.time))
(fit <- vglm(cbind(normal, mild, severe) ~ let,
             cumulative(parallel = TRUE, reverse = TRUE), data = pneumo))
depvar(fit)  # Sample proportions (good technique)
fit@y        # Sample proportions (bad technique)
weights(fit, type = "prior")  # Number of observations
coef(fit, matrix = TRUE)
constraints(fit)  # Constraint matrices
apply(fitted(fit), 1, which.max)  # Classification
apply(predict(fit, newdata = pneumo, type = "response"),
      1, which.max)  # Classification
R2latvar(fit)

# Check that the model is linear in let ----------------------
fit2 <- vgam(cbind(normal, mild, severe) ~ s(let, df = 2),
             cumulative(reverse = TRUE), data = pneumo)
# }
# NOT RUN {
 plot(fit2, se = TRUE, overlay = TRUE, lcol = 1:2, scol = 1:2) 
# }
# NOT RUN {
# Check the proportional odds assumption with a LRT ----------
(fit3 <- vglm(cbind(normal, mild, severe) ~ let,
              cumulative(parallel = FALSE, reverse = TRUE), data = pneumo))
pchisq(2 * (logLik(fit3) - logLik(fit)),
       df = length(coef(fit3)) - length(coef(fit)), lower.tail = FALSE)
lrtest(fit3, fit)  # More elegant

# A factor() version of fit ----------------------------------
# This is in long format (cf. wide format above)
Nobs <- round(depvar(fit) * c(weights(fit, type = "prior")))
sumNobs <- colSums(Nobs)  # apply(Nobs, 2, sum)

pneumo.long <-
  data.frame(symptoms = ordered(rep(rep(colnames(Nobs), nrow(Nobs)),
                                        times = c(t(Nobs))),
                                levels = colnames(Nobs)),
             let = rep(rep(with(pneumo, let), each = ncol(Nobs)),
                       times = c(t(Nobs))))
with(pneumo.long, table(let, symptoms))  # Should be same as pneumo


(fit.long1 <- vglm(symptoms ~ let, data = pneumo.long, trace = TRUE,
                   cumulative(parallel = TRUE, reverse = TRUE)))
coef(fit.long1, matrix = TRUE)  # Should be as coef(fit, matrix = TRUE)
# Could try using mustart if fit.long1 failed to converge.
mymustart <- matrix(sumNobs / sum(sumNobs),
                    nrow(pneumo.long), ncol(Nobs), byrow = TRUE)
fit.long2 <- vglm(symptoms ~ let, mustart = mymustart,
                  cumulative(parallel = TRUE, reverse = TRUE),
                  data = pneumo.long, trace = TRUE)
coef(fit.long2, matrix = TRUE)  # Should be as coef(fit, matrix = TRUE)
# }

Run the code above in your browser using DataLab