bcpglm: Bayesian Compound Poisson Generalized Linear Model

Description

This function handles Bayesian Tweedie compound Poisson generalized linear models and generates posterior simulations using Markov Chain Monte Carlo methods.

Usage

bcpglm(formula, link = "log", data, inits = NULL, 
  weights, offset, subset, na.action, contrasts = NULL, 
  n.chains = 3, n.iter = 2000, n.burnin = floor(n.iter/2), 
  n.thin = max(1, floor(n.chains * (n.iter - n.burnin)/n.sims)), 
  n.sims = 1000, n.report = 1000, prior.beta.mean, prior.beta.var, 
  bound.phi=100, bound.p = c(1.01, 1.99),method="dtweedie", 
  tune.iter=4000, n.tune=10, tune.weight=0.25,...)

Arguments

formula

an object of class formula. See also in glm.

link

a specification for the model link function. This can be either a literal character string or a numeric number. If it is a character string, it must be one of "log", "identity", "sqrt" or "inverse". If it is numeric, it is the same as the link.power

inits

a list of initial values to be used for each chain. It must be of length n.chains. For each element, it is a named list with three components 'beta', 'phi' and 'p'. If not supplied, the function will generate initial values automatically, whi

weights

an optional vector of weights. Should be NULL or a numeric vector. When it is numeric, it must be positive. Zero weights are not allowed in bcpglm.

offset

this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. One or more offset terms can be included in the for

data, subset, na.action, contrasts

same as those in glm and cpglm.

n.chains

an integer indicating the number of Markov chains (default: 3).

n.iter

number of total iterations per chain (including burn in; default: 2000)

n.burnin

length of burn in, i.e. number of iterations to discard at the beginning. Default is n.iter/2, that is, discarding the first half of the simulations.

n.thin

thinning rate. Must be a positive integer. Set n.thin > 1 to save memory and computation time if n.iter is large. Default is max(1, floor(n.chains*(n.iter-n.burnin) / 1000)) which will only thin if there are at least 2000 simulations.

n.sims

The approximate number of simulations to keep after thinning.

n.report

if greater than zero, fitting information will be printed out after each n.report iterations.

prior.beta.mean

a vector of prior means for the model coefficients. Default is a vector of zeros.

prior.beta.var

a vector of prior variance for the model coefficients. Default is a vector of 10000's.

bound.phi

upper bound of the uniform prior for the dispersion parameter phi. The default is 100.

bound.p

a vector of lower and upper bound for the index parameter $p$. The default is c(1.01,1.99).

method

determines how the MCMC is implemented. If method="dtweedie" (the default), then full conditionals are computed using numerical methods to approximate the tweedie density. If method="latent", then the latent Poisson variables are

tune.iter

number of iterations used for tuning the variance for the Normal proposal distribution. These iterations will not be used in the final output. Default is 4000, and set it to be zero if the tuning process is not desired.

n.tune

if positive, the tune.iter iterations will be divided into n.tune loops. Default is 10.

tune.weight

the weight to be given to the old covariance matrix as opposed to the covariance matrix estimated from simulated samples. Default is 0.25.

...

not used.

Value

bcpglm returns an object of class "bcpglm". See bcpglm-class for details of the return values as well as various methods available for this class.

eqn

$\phi$

code

coda

Details

The function fits the Tweedie compound Poisson generalized linear model within the Bayesian framework, where posterior distributions of model parameters are estimated using Monte Carlo Markov Chains [MCMC]. Two methods are provided here: one relies on the capability to approximate the Tweedie density (i.e., the normalizing constant involving $\phi$ and $p$), the other relies on the use of latent Poisson variables to avoid direct evaluation of the Tweedie density (method="latent"). The first method is more general while the second method is much faster when it works. Since this is a Bayesian model, prior distributions have to be specified for all parameters in the model. Here, Normal distributions are used for the model coefficients ($\beta$), a Uniform distribution for the dispersion parameter ($\phi$) and a Uniform distribution for the index parameter ($p$). Prior means and variances for the model coefficients can be supplied using the argument prior.beta.mean and prior.beta.var, respectively. The prior distribution for $\phi$ is uniform on (0, bound.phi). And the bounds of the Uniform for $p$ can be specified in the argument bound.p. See details in Section 'Arguments'. In implementing the MCMC, a Gibbs sampler is constructed as follows:

Ifmethod="latent", simulation of the latent Poisson variables relies on rejection sampling, where zero-truncated Poisson distributions are used as proposals.

Model coefficients are updated block-wise using the Metropolis-Hastings algorithm, where a random-walk multivariate Normal proposal distribution is used. The variance-covariance matrix used in the multivariate Normal is the variance-covariance matrix estimated from a preliminary cpglm fit with the specified model structure. Updates of the index parameter and the dispersion parameter rely on univariate Metropolis-Hastings, where a truncated Normal distribution is used as a proposal.

Examples

Run this code

# fit the fineroot data with Bayesian models
# first use the latent variable approach
set.seed(10)
fit1 <- bcpglm(RLD~ factor(Zone)*factor(Stock), data=fineroot,
               n.iter=11000, n.burnin=1000,
               n.thin=10, n.report=5000, method="latent")
gelman.diag(fit1$sims.list)
# diagnostic plots                             
acfplot(fit1$sims.list,lag.max=20)
xyplot(fit1$sims.list)                              
densityplot(fit1$sims.list)               

# summary results
summary(fit1)                             

# now try direct density evaluation (This is much slower)
set.seed(10)
fit2 <- bcpglm(RLD~ factor(Zone)*factor(Stock), data=fineroot,
               n.iter=11000, n.burnin=1000,
               n.thin=10, n.report=5000)
gelman.diag(fit2$sims.list)
summary(fit2)                             


# now fit the Bayesian model to an insurance loss triangle 
# (see Peters et al. 2009)
fit3 <- bcpglm(increLoss~ factor(year)+factor(lag), data=insLoss,
               tune.iter=5000, n.tune = 10,
               n.iter=11000, n.burnin=1000,
               n.thin=10, n.report=5000, bound.p=c(1.1,1.95))
gelman.diag(fit3$sims.list)
summary(fit3)

Run the code above in your browser using DataLab