deforestation: Binary logistic regression with variable time-intervals between observations for modelling deforestation

Description

The deforestation() function estimates the parameters of a Binary logistic regression model with variable time-intervals between observations in a hierarchical Bayesian framework. To estimate the posterior distribution of the parameters, an adaptive Metropolis algorithm is used. The user supplies data and priors and a sample from the posterior distribution is returned as an MCMC object, which can be subsequently analyzed with functions provided in the coda package.

Usage

deforestation(formula, interval=1, data, burnin=1000, mcmc=1000, thin=1, verbose=1, seed=NA, tune=1, beta.start=NA, mubeta=0, Vbeta=1.0E6)

Arguments

formula

A two-sided linear formula of the form 'y~x1+...+xp' describing the model, with the response on the left of a '~' operator and the p fixed terms, separated by '+' operators, on the right. Response variable y must be 0 or 1 (Bernoulli trials).

interval

A numeric scalar or a vector of length equal to the number of observations. interval specifies the time interval between land cover observations. Default to 1.

data

A data frame containing the variables of the model.

burnin

The number of burnin iterations for the sampler.

mcmc

The number of Gibbs iterations for the sampler. Total number of Gibbs iterations is equal to burnin+mcmc. burnin+mcmc must be divisible by 10 and superior or equal to 100 so that the progress bar can be displayed.

thin

The thinning interval used in the simulation. The number of Gibbs iterations must be divisible by this value.

verbose

A switch (0,1) which determines whether or not the progress of the sampler is printed to the screen. Default is 1: a progress bar is printed, indicating the step (in %) reached by the Gibbs sampler.

seed

The seed for the random number generator. If NA, the Mersenne twister generator is used with default seed 12345; if an integer is passed it is used to seed the Mersenne twister.

tune

Metropolis tuning parameter. Must be a positive scalar. The tuning parameter is updated during the burning period to approach an acceptance rate of 0.44.

beta.start

The starting values for the $beta$ vector. This can either be a scalar or a p-length vector. The default value of NA will use the OLS $beta$ estimate of the corresponding Binary logistic regression. If this is a scalar, that value will serve as the starting value for all of the betas.

mubeta

The prior mean of $beta$. This can either be a scalar or a p-length vector. If this takes a scalar value, then that value will serve as the prior mean for all of the betas. The default value of 0 will use a vector of zeros for an uninformative prior.

Vbeta

The prior variance of $beta$. This can either be a scalar or a square p-dimension matrix. If this takes a scalar value, then that value times an identity matrix serves as the prior variance of beta. Default value of 1.0E6 will use a diagonal matrix with very large variance for an uninformative flat prior.

Value

mcmc: An MCMC object that contains the posterior sample. This object can be summarized by functions provided by the coda package.
deviance: The posterior mean of the deviance $D$, with $D=-2log(prod_i P(y_i|theta_i'))$.
tune: The optimized value for the tuning parameter. This value can be used for potential future runs.

Details

The deforestation() function estimates the parameters of a logistic regression model with variable time-intervals between observations. The estimation is done in a hierarchical Bayesian framework using an adaptive Metropolis algorithm. Function is developped in C++ code using the Scythe statistical library (Pemstein et al. 2007) to maximize efficiency. The model takes the following form: $$y_i \sim \mathcal{B}ernoulli(\theta'_i)$$

With $theta_i' = 1-(1-theta_i)^I_i$, where $I_i$ stands for the time-interval for observation $i$. Thus, $theta_i$ is a probability by unit of time (an annual rate for example if the unit of the time-interval is in year).

Using the logit link function denoted $phi$, we set: $$\phi(\theta_i) = X_i \beta$$ By default, we assume a multivariate Normal prior on $beta$: $$\beta \sim \mathcal{N}(\mu_{\beta},V_{\beta})$$

For the Metropolis algorithm, the proposal distribution is centered at the current value of $beta$ and has variance-covariance $V = T (V_beta^(-1) + C^(-1))^(-1) T$, where $T$ is the diagonal positive definite matrix formed from the tune, $V_beta$ is the prior variance, and $C$ is the large sample variance-covariance matrix of the MLEs without considering the variation of the time-interval between observations. This last calculation is done via an initial call to glm.

References

Daniel Pemstein, Kevin M. Quinn, and Andrew D. Martin. 2007. Scythe Statistical Library 1.0. http://scythe.wustl.edu. Martyn Plummer, Nicky Best, Kate Cowles, and Karen Vines. 2002. Output Analysis and Diagnostics for MCMC (CODA). http://www-fis.iarc.fr/coda/.

Ghislain Vieilledent, Clovis Grinand and Romuald Vaudry. 2013. Forecasting deforestation and carbon emissions in tropical developing countries facing demographic expansion: a case study in Madagascar. Ecology and Evolution. DOI: 10.1002/ece3.550

Examples

Run this code


## Not run: 
# #=====================================================================
# # Logistic regression with variable time-interval between observations 
# #=====================================================================
# 
# #= Library
# library(phcfM)
# 
# #== Generating data
# 
# # Random seed
# set.seed(1234)
# 
# # Constants
# nobs <- 3000
# 
# # Covariates
# X1 <- runif(n=nobs,min=-10,max=10)
# X2 <- runif(n=nobs,min=-10,max=10)
# X <- cbind(rep(1,nobs),X1,X2)
# I <- runif(n=nobs,min=1,max=5) # Time-interval
# 
# # Target beta parameters
# beta.target <- matrix(c(0.3,0.2,0.1),ncol=1)
# 
# # Response
# theta <- vector()
# theta_prim <- vector()
# Y <- vector()
# for (n in 1:nobs) {
#   theta[n] <- inv.logit(X[n,]%*%beta.target)
#   theta_prim[n] <- 1-(1-theta[n])^I[n]
#   Y[n] <- rbinom(n=1,size=1,prob=theta_prim[n])
# }
# 
# # Data-set
# Data <- as.data.frame(cbind(Y,I,theta_prim,theta,X1,X2))
# plot(Data$X1,Data$theta)
# plot(Data$X2,Data$theta)
# 
# #== Call to deforestation()
# model <- deforestation(formula=Y~X1+X2, interval=Data$I, data=Data, burnin=1000, mcmc=1000,
#                        thin=1, verbose=1, seed=NA, tune=1, beta.start=NA, mubeta=0,
#                        Vbeta=1.0E6)
# 
# #== MCMC analysis
# 
# # Graphics
# plot(model$mcmc)
# 
# # Parameter estimates
# str(model)
# summary(model$mcmc)
# model$deviance
# model$tune
# 
# ## We obtain good parameter estimates 
# ##
# ##                    Mean       SD  Naive SE Time-series SE
# ## beta.(Intercept) 0.3362 0.054534 0.0017245       0.006589
# ## beta.X1          0.2027 0.009345 0.0002955       0.001154
# ## beta.X2          0.1037 0.007269 0.0002299       0.000838
# 
# #== GLM resolution if time-interval is not taken into account
# 
# model.glm <- glm(Y~X1+X2,data=Data,family="binomial")
# summary(model.glm)
# 
# ## In this case, the parameter estimates are biased
# ##             Estimate Std. Error z value Pr(>|z|)    
# ## (Intercept) 2.000761   0.072069   27.76   <2e-16 ***
# ## X1          0.247514   0.012060   20.52   <2e-16 ***
# ## X2          0.123137   0.009803   12.56   <2e-16 ***
# ## End(Not run)

Run the code above in your browser using DataLab