gllvm: Generalized Linear Latent Variable Models

Description

Fits generalized linear latent variable models for multivariate data. The models can be fitted using Laplace approximation method or variational approximation method.

Usage

gllvm(y = NULL, X = NULL, TR = NULL, data = NULL, formula = NULL,
  num.lv = 2, family, method = "VA", row.eff = FALSE, offset = NULL,
  sd.errors = TRUE, Lambda.struc = "unstructured", diag.iter = 5,
  trace = FALSE, plot = FALSE, la.link.bin = "logit", n.init = 1,
  Power = 1.5, reltol = 1e-06, seed = NULL, max.iter = 200,
  maxit = 1000, start.fit = NULL, starting.val = "res", TMB = TRUE,
  optimizer = "optim", Lambda.start = 0.1, jitter.var = 0)

Arguments

(n x m) matrix of responses.

matrix or data.frame of environmental covariates.

matrix of trait covariates.

data

data in long format, that is, matrix of responses, environmental and trait covariates and row index named as <U+2019>id<U+2019>. When used, model needs to be defined using formula. This is alternative data input for y, X and TR.

formula

an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted.

num.lv

number of latent variables, d, in gllvm model. Non-negative integer, less than number of response variables (m). Defaults to 2.

family

distribution function for responses. Options are "poisson" (with log link), "negative.binomial" (with log link), "binomial" (with logit/probit link when method = "LA" and probit link when method = "VA"), zero inflated poisson ("ZIP") and Tweedie ("tweedie") (with log link, only with "LA"-method), "ordinal" (only with "VA"-method).

method

model can be fitted using Laplace approximation method (method = "LA") or variational approximation method (method = "VA"). Defaults to "VA".

row.eff

FALSE, TRUE or "random", Indicating whether row effects are included in the model as a fixed or as a random effects. Defaults to FALSE.

offset

vector or matrix of offset terms.

sd.errors

logical. If TRUE (default) standard errors for parameter estimates are calculated.

Lambda.struc

covariance structure of VA distributions for latent variables when method = "VA", "unstructured" or "diagonal".

diag.iter

non-negative integer which is used to speed up the updating of variational (covariance) parameters in VA method. Defaults to 5.

trace

logical, if TRUE in each iteration step information on current step will be printed. Defaults to FALSE. Only with TMB = FALSE.

plot

logical, if TRUE ordination plots will be printed in each iteration step when TMB = FALSE. Defaults to FALSE.

la.link.bin

link function for binomial family if method = "LA". Options are "logit" and "probit.

n.init

number of initial runs. Uses multiple runs and picks up the one giving highest log-likelihood value. Defaults to 1.

Power

fixed power parameter in Tweedie model. Scalar from interval (1,2). Defaults to 1.5.

reltol

convergence criteria for log-likelihood, defaults to 1e-6.

seed

a single seed value, defaults to NULL.

max.iter

maximum number of iterations when TMB = FALSE, defaults to 200.

maxit

maximum number of iterations within optim function, defaults to 1000.

start.fit

object of class 'gllvm' which can be given as starting parameters for count data (poisson, NB, or ZIP).

starting.val

starting values can be generated by fitting model without latent variables, and applying factorial analysis to residuals to get starting values for latent variables and their coefficients (starting.val = "res"). Another options are to use zeros as a starting values (starting.val = "zero") or initialize starting values for latent variables with (n x num.lv) matrix. Defaults to "res", which is recommended.

TMB

logical, if TRUE model will be fitted using Template Model Builder (TMB). TMB is always used if method = "LA". Defaults to TRUE.

optimizer

if TMB=TRUE, log-likelihood can be optimized using "optim" (default) or "nlminb".

Lambda.start

starting values for variances in VA distributions for latent variables in variational approximation method. Defaults to 0.1.

jitter.var

jitter variance for starting values of latent variables. Defaults to 0, meaning no jittering.

Value

An object of class "gllvm" includes the following components:

call

function call

logL

log likelihood

lvs

latent variables

params

list of parameters

$theta coefficients related to latent variables
$beta0 column specific intercepts
$Xcoef coefficients related to environmental covariates X
$B coefficients in fourth corner model
$row.params row-specific intercepts
$phi dispersion parameters $\phi$ for negative binomial or Tweedie family, or probability of zero inflation for ZIP family
$inv.phi dispersion parameters $1/\phi$ for negative binomial

Power

power parameter $\nu$ for Tweedie family

list of standard errors of parameters

Details

Fits generalized linear latent variable models as in Hui et al. (2015 and 2017) and Niku et al. (2017). Method can be used with two types of latent variable models depending on covariates. If only site related environmental covariates are used, the expectation of response $Y_{ij}$ is determined by

$$g(\mu_{ij}) = \eta_{ij} = \alpha_i + \beta_{0j} + x_i'\beta_j + u_i'\theta_j,$$

where $g(.)$ is a known link function, $u_i$ are $d$-variate latent variables ($d$<<$m$), $\alpha_i$ is an optional row effect at site $i$, $\beta_{0j}$ is an intercept term for species $j$, $\beta_j$ and $\theta_j$ are column specific coefficients related to covariates and the latent variables, respectively.

The alternative model is fourth corner model (Brown et al., 2014, Warton et al., 2015) which will be fitted if also trait covariates are included. The expectation of response $Y_{ij}$ is

$$g(\mu_{ij}) = \alpha_i + \beta_{0j} + x_i'\beta_x + TR_j'\beta_t + vec(B)*kronecker(TR_j,X_i) + u_i'\theta_j$$

where g(.), $u_i$, $\beta_{0j}$ and $\theta_j$ are defined as above. Vectors $\beta_x$ and $\beta_t$ are the main effects or coefficients related to environmental and trait covariates, respectively, matrix $B$ includes interaction terms. The interaction/fourth corner terms are optional as well as are the main effects of trait covariates.

The method is sensitive for the choices of initial values of the latent variables. Therefore it is recommendable to use multiple runs and pick up the one giving the highest log-likelihood value. However, sometimes this is computationally too demanding, and default option starting.val = "res" usually gives reasonably good results.

Models are implemented using TMB (Kristensen et al., 2015) applied to variational approximation (Hui et al., 2017) and Laplace approximation (Niku et al., 2017).

An exception is ordinal family which is not implemented with TMB and therefore also row.eff = "random" does not work. With ordinal family response classes must start from 0 or 1.

Distributions

Mean and variance for distributions are defined as follows.

For count data family = "poisson": Expectation $E[Y_{ij}] = \mu_{ij}$, variance $V(\mu_{ij}) = \mu_{ij}$, or
family = "negative.binomial": Expectation $E[Y_{ij}] = \mu_{ij}$, variance $V(\mu_{ij}) = \mu_{ij}+\phi_j*\mu_{ij}^2$, or
family = "ZIP": Expectation $E[Y_{ij}] = (1-p)\mu_{ij}$, variance $V(\mu_{ij}) = \mu_{ij}(1-p)(1+\mu_{ij}p)$.
For binary data family = "binomial": Expectation $E[Y_{ij}] = \mu_{ij}$, variance $V(\mu_{ij}) = \mu_{ij}(1-\mu_{ij})$.
For biomass data family = "tweedie": Expectation $E[Y_{ij}] = \mu_{ij}$, variance $V(\mu_{ij}) = \phi_j*\mu_{ij}^\nu$, where $\nu$ is a power parameter of Tweedie distribution. See details Dunn and Smyth (2005).
For ordinal data family = "ordinal": See Hui et.al. (2016).

References

Brown, A. M., Warton, D. I., Andrew, N. R., Binns, M., Cassis, G., and Gibb, H. (2014). The fourth-corner solution - using predictive models to understand how species traits interact with the environment. Methods in Ecology and Evolution, 5:344-352.

Dunn, P. K. and Smyth, G. K. (2005). Series evaluation of tweedie exponential dispersion model densities. Statistics and Computing, 15:267-280.

Hui, F. K. C., Taskinen, S., Pledger, S., Foster, S. D., and Warton, D. I. (2015). Model-based approaches to unconstrained ordination. Methods in Ecology and Evolution, 6:399-411.

Hui, F. K. C., Warton, D., Ormerod, J., Haapaniemi, V., and Taskinen, S. (2017). Variational approximations for generalized linear latent variable models. Journal of Computational and Graphical Statistics. Journal of Computational and Graphical Statistics, 26:35-43.

Kasper Kristensen, Anders Nielsen, Casper W. Berg, Hans Skaug, Bradley M. Bell (2016). TMB: Automatic Differentiation and Laplace Approximation. Journal of Statistical Software, 70(5), 1-21.

Niku, J., Warton, D. I., Hui, F. K. C., and Taskinen, S. (2017). Generalized linear latent variable models for multivariate count and biomass data in ecology. Journal of Agricultural, Biological, and Environmental Statistics, 22:498-522

Warton, D. I., Guillaume Blanchet, F., O'Hara, R. B., Ovaskainen, O., Taskinen, S., Walker, S. C. and Hui, F. K. C. (2015). So many variables: Joint modeling in community ecology. Trends in Ecology & Evolution, 30:766-779.

Examples

Run this code

# NOT RUN {
## Load a dataset from the mvabund package
data(antTraits)
y <- as.matrix(antTraits$abund)
X <- as.matrix(antTraits$env)
TR <- antTraits$traits
# Fit model with environmental covariates Bare.ground and Shrub.cover
fit <- gllvm(y, X, formula = ~ Bare.ground + Shrub.cover,
            family = "poisson")
ordiplot(fit)
coefplot(fit)

# }
# NOT RUN {
## Example 1: Fit model with two latent variables
# Using variational approximation:
fitv0 <- gllvm(y, family = "negative.binomial", method = "VA")
ordiplot(fitv0)
plot(fitv0, mfrow = c(2,2))
summary(fitv0)
confint(fitv0)
# Using Laplace approximation: (this line may take about 30 sec to run)
fitl0 <- gllvm(y, family = "negative.binomial", method = "LA")
ordiplot(fitl0)

# Poisson family:
fit.p <- gllvm(y, family = "poisson", method = "LA")
ordiplot(fit.p)
# Use poisson model as a starting parameters for ZIP-model, this line may take few minutes to run
fit.z <- gllvm(y, family = "ZIP", method = "LA", start.fit = fit.p)
ordiplot(fit.z)


## Example 2: gllvm with environmental variables
# Fit model with two latent variables and all environmental covariates,
fitvX <- gllvm(formula = y ~ X, family = "negative.binomial")
ordiplot(fitvX, biplot = TRUE)
coefplot(fitvX)
# Fit model with environmental covariates Bare.ground and Shrub.cover
fitvX2 <- gllvm(y, X, formula = ~ Bare.ground + Shrub.cover,
 family = "negative.binomial")
ordiplot(fitvX2)
coefplot(fitvX2)
# Use 5 initial runs and pick the best one
fitvX_5 <- gllvm(y, X, formula = ~ Bare.ground + Shrub.cover,
 family = "negative.binomial", n.init = 5, jitter.var = 0.1)
ordiplot(fitvX_5)
coefplot(fitvX_5)

## Example 3: Data in long format
# Reshape data to long format:
datalong <- reshape(data.frame(cbind(y,X)), direction = "long",
                   varying = colnames(y), v.names = "y")
head(datalong)
fitvLong <- gllvm(data = datalong, formula = y ~ Bare.ground + Shrub.cover,
               family = "negative.binomial")

## Example 4: Fourth corner model
# Fit fourth corner model with two latent variables
fitF1 <- gllvm(y = y, X = X, TR = TR, family = "negative.binomial")
coefplot(fitF1)
# Specify model using formula
fitF2 <- gllvm(y = y, X = X, TR = TR,
 formula = ~ Bare.ground + Canopy.cover*(Pilosity + Webers.length),
 family = "negative.binomial")
ordiplot(fitF2)
coefplot(fitF2)

## Example 5: Fit Tweedie model
# Load coral data
data(tikus)
ycoral <- tikus$abund
# Let's consider only years 1981 and 1983
ycoral <- ycoral[((tikus$x$time == 81) + (tikus$x$time == 83)) > 0,]
# Exclude species which have observed at less than 4 sites
ycoral <- ycoral[(colSums(ycoral > 0) > 3)]
# Fit Tweedie model for coral data (this line may take few minutes to run)
fit.twe <- gllvm(y = ycoral, family = "tweedie", method = "LA")
ordiplot(fit.twe)

## Example 6: Random row effects
fitRand <- gllvm(y, family = "negative.binomial", row.eff = "random")
ordiplot(fitRand,biplot=TRUE)
# }

Run the code above in your browser using DataLab