A reduced-rank vector generalized linear model (RR-VGLM) is fitted. RR-VGLMs are VGLMs in which some of the constraint matrices are estimated rather than known. In this documentation, \(M\) is the number of linear predictors.
rrvglm(formula, family = stop("argument 'family' needs to be assigned"),
data = list(), weights = NULL, subset = NULL,
na.action = na.fail, etastart = NULL, mustart = NULL,
coefstart = NULL, control = rrvglm.control(...), offset = NULL,
method = "rrvglm.fit", model = FALSE, x.arg = TRUE, y.arg = TRUE,
contrasts = NULL, constraints = NULL, extra = NULL,
qr.arg = FALSE, smart = TRUE, ...)
formula, family, weights: See vglm.
data: an optional data frame containing the variables in the model. By default the variables are taken from environment(formula), typically the environment from which rrvglm is called.
subset, na.action: See vglm.
etastart, mustart, coefstart: See vglm.
control: a list of parameters for controlling the fitting process. See rrvglm.control for details.
offset, model, contrasts: See vglm.
method: the method to be used in fitting the model. The default (and presently only) method, rrvglm.fit, uses iteratively reweighted least squares (IRLS).
x.arg, y.arg: logical values indicating whether the model matrix and response vector/matrix used in the fitting process should be assigned in the x and y slots. Note that the model matrix is the LM model matrix; to get the VGLM model matrix, type model.matrix(vglmfit), where vglmfit is a vglm object.
constraints: See vglm.
extra, smart, qr.arg: See vglm.
...: further arguments passed into rrvglm.control.
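As a minimal sketch of the last two items (the simulated data, coefficient values and object names below are made up and are not part of the package's own examples): an argument such as Rank is not a formal argument of rrvglm() itself; it is swept up by ... and handed to rrvglm.control(), so the call below is equivalent to supplying control = rrvglm.control(Rank = 1) explicitly.

library(VGAM)
set.seed(123)
d <- data.frame(x2 = runif(500), x3 = runif(500))
d <- transform(d,   # true coefficient matrix for (x2, x3) has rank 1
               y1 = rpois(500, exp(1 + 1.0 * x2 - 0.50 * x3)),
               y2 = rpois(500, exp(1 + 0.5 * x2 - 0.25 * x3)),
               y3 = rpois(500, exp(1 - 0.5 * x2 + 0.25 * x3)))
fit <- rrvglm(cbind(y1, y2, y3) ~ x2 + x3, poissonff, data = d, Rank = 1)
Coef(fit)@A   # estimated A (M x R); the corner constraint fixes its first row
Coef(fit)@C   # estimated C (p2 x R)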
An object of class "rrvglm", which has the same slots as a "vglm" object. The only difference is that some of the constraint matrices are estimated rather than known; otherwise VGAM stores the models the same way internally. The slots of "vglm" objects are described in vglm-class.
The central formula is given by $$\eta = B_1^T x_1 + A \nu$$ where \(x_1\) is a vector (usually just a 1 for an intercept), \(x_2\) is another vector of explanatory variables, and \(\nu = C^T x_2\) is an \(R\)-vector of latent variables. Here, \(\eta\) is a vector of linear predictors, e.g., the \(m\)th element is \(\eta_m = \log(E[Y_m])\) for the \(m\)th Poisson response. The matrices \(B_1\), \(A\) and \(C\) are estimated from the data, i.e., contain the regression coefficients. For ecologists, the central formula represents a constrained linear ordination (CLO) since it is linear in the latent variables. It means that the response is a monotonically increasing or decreasing function of the latent variables.
For identifiability it is common to enforce corner constraints on \(A\): by default, the top \(R\) by \(R\) submatrix is fixed to be the order-\(R\) identity matrix and the remainder of \(A\) is estimated.
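To make the dimensions concrete, here is a small base-R sketch (an illustration only, not VGAM code; the values of M, p2 and R are arbitrary) that builds the x2 contribution to eta from A and C under the default corner constraints.

M <- 4; p2 <- 6; R <- 2             # linear predictors, x2 variables, rank
set.seed(1)
A <- matrix(rnorm(M * R),  M,  R)   # M  x R
C <- matrix(rnorm(p2 * R), p2, R)   # p2 x R
A[1:R, 1:R] <- diag(R)              # corner constraints: top R x R block of A is I_R
x2   <- rnorm(p2)                   # the x2 vector for one observation
nu   <- drop(crossprod(C, x2))      # latent variables nu = C^T x2 (length R)
eta2 <- drop(A %*% nu)              # x2's contribution to eta: A nu = A C^T x2
# Parameter count: an unrestricted coefficient block for x2 has M * p2 entries,
# whereas the rank-R form has R * (M + p2 - R) after the corner constraints.
c(full = M * p2, reduced = R * (M + p2 - R))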
The underlying algorithm of RR-VGLMs is iteratively reweighted least squares (IRLS) with an optimizing algorithm applied within each IRLS iteration (e.g., alternating algorithm).
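The alternating idea is easiest to see in the Gaussian, identity-link case, where each IRLS step reduces to ordinary least squares. The toy sketch below is an illustration only (it is not how rrvglm.fit is implemented, and the simulated data and object names are made up): with C fixed, A is obtained by regressing Y on the latent variables; with A fixed, C is obtained from a vectorized regression.

set.seed(2)
n <- 200; M <- 3; p2 <- 5; R <- 1
X2 <- matrix(rnorm(n * p2), n, p2)
A.true <- matrix(rnorm(M * R), M, R)
C.true <- matrix(rnorm(p2 * R), p2, R)
Y <- X2 %*% C.true %*% t(A.true) + matrix(rnorm(n * M, sd = 0.1), n, M)
C.hat <- matrix(rnorm(p2 * R), p2, R)    # random start for C
for (iter in 1:25) {
  nu <- X2 %*% C.hat                     # latent variables, n x R
  A.hat <- t(coef(lm(Y ~ nu - 1)))       # C fixed: regress Y on nu to get A
  Z <- kronecker(A.hat, X2)              # A fixed: vec(Y) = (A %x% X2) vec(C)
  C.hat <- matrix(coef(lm(as.vector(Y) ~ Z - 1)), p2, R)
}
# A and C are only identified up to an invertible R x R matrix, so compare
# fitted signals rather than the matrices themselves; this should be small.
max(abs(X2 %*% C.hat %*% t(A.hat) - X2 %*% C.true %*% t(A.true)))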
In theory, any VGAM family function that works for vglm and vgam should work for rrvglm too.
The function that actually does the work is rrvglm.fit; it is vglm.fit with some extra code.
Yee, T. W. and Hastie, T. J. (2003) Reduced-rank vector generalized linear models. Statistical Modelling, 3, 15--41.
Yee, T. W. (2004) A new technique for maximum-likelihood canonical Gaussian ordination. Ecological Monographs, 74, 685--701.
Anderson, J. A. (1984) Regression and ordered categorical variables. Journal of the Royal Statistical Society, Series B, Methodological, 46, 1--30.
Yee, T. W. (2014) Reduced-rank vector generalized linear models with two linear predictors. Computational Statistics and Data Analysis, 71, 889--902.
rrvglm.control, lvplot.rrvglm (same as biplot.rrvglm), rrvglm-class, grc, cqo, vglmff-class, vglm, vglm-class, smartpred, rrvglm.fit.
Special family functions include negbinomial, zipoisson and zinegbinomial (see Yee (2014) and COZIGAM).
Methods functions include Coef.rrvglm, calibrate.rrvglm, summary.rrvglm, etc.
Data include crashi.
# Example 1: RR negative binomial with Var(Y) = mu + delta1 * mu^delta2
nn <- 1000       # Number of observations
delta1 <- 3.0    # Specify this
delta2 <- 1.5    # Specify this; should be greater than unity
a21 <- 2 - delta2
mydata <- data.frame(x2 = runif(nn), x3 = runif(nn))
mydata <- transform(mydata, mu = exp(2 + 3 * x2 + 0 * x3))
mydata <- transform(mydata, y2 = rnbinom(nn, mu = mu, size = (1/delta1) * mu^a21))
plot(y2 ~ x2, data = mydata, pch = "+", col = 'blue', las = 1,
     main = paste("Var(Y) = mu + ", delta1, " * mu^", delta2, sep = ""))
rrnb2 <- rrvglm(y2 ~ x2 + x3, negbinomial(zero = NULL), data = mydata, trace = TRUE)

a21.hat <- (Coef(rrnb2)@A)["loge(size)", 1]
beta11.hat <- Coef(rrnb2)@B1["(Intercept)", "loge(mu)"]
beta21.hat <- Coef(rrnb2)@B1["(Intercept)", "loge(size)"]
(delta1.hat <- exp(a21.hat * beta11.hat - beta21.hat))
(delta2.hat <- 2 - a21.hat)
# exp(a21.hat * predict(rrnb2)[1,1] - predict(rrnb2)[1,2])  # delta1.hat
summary(rrnb2)

# Obtain a 95 percent confidence interval for delta2:
se.a21.hat <- sqrt(vcov(rrnb2)["I(latvar.mat)", "I(latvar.mat)"])
ci.a21 <- a21.hat + c(-1, 1) * 1.96 * se.a21.hat
(ci.delta2 <- 2 - rev(ci.a21))  # The 95 percent confidence interval
Confint.rrnb(rrnb2)  # Quick way to get it

# Plot the abundances and fitted values against the latent variable
plot(y2 ~ latvar(rrnb2), data = mydata, col = "blue",
     xlab = "Latent variable", las = 1)
ooo <- order(latvar(rrnb2))
lines(fitted(rrnb2)[ooo] ~ latvar(rrnb2)[ooo], col = "orange")

# Example 2: stereotype model (reduced-rank multinomial logit model)
data(car.all)
scar <- subset(car.all,
               is.element(Country, c("Germany", "USA", "Japan", "Korea")))
fcols <- c(13, 14, 18:20, 22:26, 29:31, 33, 34, 36)  # These are factors
scar[, -fcols] <- scale(scar[, -fcols])  # Standardize all numerical vars
ones <- matrix(1, 3, 1)
clist <- list("(Intercept)" = diag(3), Width = ones, Weight = ones,
              Disp. = diag(3), Tank = diag(3), Price = diag(3),
              Frt.Leg.Room = diag(3))
set.seed(111)
fit <- rrvglm(Country ~ Width + Weight + Disp. + Tank + Price + Frt.Leg.Room,
              multinomial, data = scar, Rank = 2, trace = TRUE,
              constraints = clist, noRRR = ~ 1 + Width + Weight,
              Uncor = TRUE, Corner = FALSE, Bestof = 2)
fit@misc$deviance  # A history of the fits
Coef(fit)
biplot(fit, chull = TRUE, scores = TRUE, clty = 2, Ccex = 2,
       ccol = "blue", scol = "orange", Ccol = "darkgreen", Clwd = 2,
       main = "1=Germany, 2=Japan, 3=Korea, 4=USA")