calc.marglogLik: Marginal log-likelihood for an boral model

Description

Calculates the marginal log-likelihood for a set of parameter estimates from an boral model, whereby the latent variables are integrated out. The integration is performed using Monte Carlo integration.

Usage

calc.marglogLik(y, X = NULL, family, trial.size = NULL, lv.coefs, 
     X.coefs = NULL, site.coefs = NULL, num.lv, 
     X.mc = NULL, cutoffs = NULL, powerparam = NULL)

Arguments

The response matrix that the boral model was fitted to.

The model matrix used in the boral model. Defaults to NULL, in which case it is assumed no model matrix was used.

family

Either a single element, or a vector of length equal to the number of columns in $y$. The former assumes all columns of $y$ come from this distribution. The latter option allows for different distributions for each column of $y$. Elements can be one of "b

trial.size

Either equal to NULL, a single element, or a vector of length equal to the number of columns in $y$. If a single element, then all columns assumed to be binomially distributed will have trial size set to this. If a vector, different trial sizes are allowe

lv.coefs

The column-specific intercept, coefficient estimates relating to the latent variables, and dispersion parameters from the boral model.

X.coefs

The coefficients estimates relating to the model matrix X from the boral model. Defaults to NULL, in which it is assumed there are no covariates in the model.

site.coefs

Row effects estimates for the boral model. Defaults to NULL, in which case it is assumed there are no row effects in the model.

num.lv

The number of latent variables used in the boral model. For boral models with no latent variables, please use calc.logLik.lv0 to calculate the marginal log-likelihood.

X.mc

A matrix used for performing the Monte Carlo integration. Defaults to NULL, in which case a matrix is generated within the function.

cutoffs

Common cutoff estimates from the boral model when any of the columns of $y$ are ordinal responses. Defaults to NULL.

powerparam

Common power parameter from the boral model when any of the columns of $y$ are tweedie responses. Defaults to NULL.

Value

A list with the following components:
logLikValue of the marginal log-likelihood.
logLik.compA vector of the log-likelihood values for each row of $y$, such that sum(logLik.comp) = logLik.

Details

For an $n x p$ response matrix $y$, suppose we fit an boral model with $q$ latent varables. If we denote the latent varibles by $b_i; i = 1,\ldots,n$, then the marginal log-likelihood is given by (with parameters where appropriate)

$$\log(f) = \sum_{i=1}^n \log ( \int \prod_{j=1}^p f(y_{ij} | \alpha_i, \tau_k, \theta_{0j}, \bm{\theta}_j, \bm{z}_i, \bm{x}_i, \bm{\beta}_j, \phi_j) f(\bm{z}_i) d\bm{z}_i),$$

where $f(y_{ij}|\cdot)$ is the assumed distribution for column $j$, $\alpha_i$ is the row effect, $\tau_k$ are the cutoffs for proportional odds regression, $\theta_{0j}$ is the column-specific intercepts, $\bm{z}'_i$ are the latent variables, $\bm{\theta}_j$ are the column-specific coefficients relating to the latent variables, $\bm{x}'_i$ is row $i$ of the model matrix, $\bm{\beta}_j$ are the column-specific coefficients relating to the model matrix of covariates, $\phi_j$ are column-specific dispersion parameters.

The quantity $f(\bm{z}_i)$ denotes the distribution of the latent variable, which is assumed to be standard multivariate Gaussian for boral models.

The key difference between this and the conditional likelihood (see calc.condlogLik) is that the marginal log-likelihood treats the latent variables as "random effects" and integrates over them, whereas the conditional log-likelihood treats the latent variables as "fixed effects".

Monte Carlo integration is used for calculating the marginal log-likelihood. If X.mc = NULL, the function automatically generates a matrix as X.mc <- cbind(1, rmvnorm(5000, rep(0,num.lv))). If there is need to apply this function numerous times, we recommend a matrix be inserted into X.mc to speed up computation.

Examples

Run this code

library(mvabund) ## Load a dataset from the mvabund package
data(spider)
y <- spider$abun
n <- nrow(y); p <- ncol(y); 
    
## Example 1 - model with 2 latent variables, site effects, 
## 	and no environmental covariates
spider.fit.nb <- boral(y, family = "negative.binomial", num.lv = 2, 
     site.eff = TRUE, save.model = TRUE, calc.ics = FALSE)

## Extract all MCMC samples
fit.mcmc <- as.mcmc(spider.fit.nb$jags.model)[[1]]

## Find the posterior medians
coef.mat <- matrix(apply(fit.mcmc[,grep("all.params",colnames(fit.mcmc))],
     2,median),nrow=p)
site.coef.median <- apply(fit.mcmc[,grep("site.params", colnames(fit.mcmc))],
     2,median)
     
## Caculate the marginal log-likelihood at the posterior median
calc.marglogLik(y, family = "negative.binomial",
	lv.coefs = coef.mat, site.coefs = site.coef.median, num.lv = 2, X.mc = NULL)

	
## Example 2 - model with one latent variable, no site effects, 
## 	and environmental covariates
spider.fit.nb2 <- boral(y, X = spider$x, family = "negative.binomial", 
     num.lv = 1, site.eff = FALSE, save.model = TRUE, calc.ics = FALSE)

## Extract all MCMC samples
fit.mcmc <- as.mcmc(spider.fit.nb2$jags.model)[[1]]

## Find the posterior medians
coef.mat <- matrix(apply(fit.mcmc[,grep("all.params",colnames(fit.mcmc))],
     2,median),nrow=p)
X.coef.mat <- matrix(apply(fit.mcmc[,grep("X.params",colnames(fit.mcmc))],
	2,median),nrow=p)

## Caculate the log-likelihood at the posterior median
calc.marglogLik(y, X = spider$x, family = "negative.binomial", 
	lv.coefs = coef.mat, X.coefs = X.coef.mat, num.lv = 1)

Run the code above in your browser using DataLab