linreg.stan: linreg.stan

Description

Fits one of three models of linear regression to a dataset in a Bayesian framework. Bayesian sampling is conducted in the Stan software via the 'rstan' package. Users supply vectors of observations and, unless an "ols" model is chosen, a covariance matrix. Users can alter parameters for model-parameter prior distributions and Bayesian sampling settings. See details.

Usage

linreg.stan(des, y, model="ols", covmat=NULL, iter=2000, 
  chains=4, cores=4, coef.u=0, coef.sd=10, physig2.u=-1, physig2.sd=1, 
  randsig2.loc=0, randsig2.sc=2.5, ...)

Value

A list containing two elements: (1) the posterior distribution of model parameters, and (2) the log-likelihood of the posteriors for use in downstream analyses (e.g. the calculation of model fitting metrics like loo or waic)

Arguments

des: A vector of predictor variable observations OR, in the case of multiple predictors, a matrix in which each column is a vector of observations of a given predictor. Function will add a column of 1s to make this a design matrix whose first column corresponds to the model intercept (unless such a column already exists).
y: A vector of response variable observations.
model: Choice of "ols", "pgls", or "pgls.mm"; defaults to ols.
covmat: Covariance matrix for model residuals (a lineage-pair covariance matrix if analyzing lineage-pair data or a phylogenetic vcv matrix if analyzing species data).
iter: Number of iterations to run on each chain; defaults to 2000 (more are often necessary).
chains: Number of Markov chains to run; defaults to 4.
cores: Number of cores to use.
coef.u: Mean of the Gaussian prior for each preditor variable coefficient; defaults to 0.
coef.sd: SD of the Gaussian prior for each preditor variable coefficient; defaults to 10.
physig2.u: Mean of the prior distribution (lognormal) for the scale of the phylogenetic component of residual covariance; defaults to -1.
physig2.sd: SD of the prior distribution (lognormal) for the scale of the phylogenetic component of residual covariance; defaults to 1.
randsig2.loc: Location parameter of the prior distribution (cauchy) for the scale of the independent component of residual covariance ; defaults to 0.
randsig2.sc: Scale parameter of the prior distribution (cauchy) for the scale of the independent component of residual covariance ; defaults to 2.5.
...: additional arguments passed to rstan::sampling(), including control parmeters (see rstan::sampling() documentation)

Details

The models that can be chosen are:

"ols": basic ordinary least-squares regression.

Y = XB + epsilon; where epsilon~N(0,randsig2*I) and randsig2 is the scaling parameter for identity matrix I.
"pgls": phylogenetic generalized least-squares regression in which residuals covary according to covariance matrix C. This is a lineage-pair covariance matrix (if analyzing lineage-pair data) or a phylogenetic vcv matrix (if analyzing species data).

Y = XB + epsilon; where epsilon~N(0,physig2*C) and physig2 is the scaling parameter for covariance matrix C.
"pgls.mm" a mixed-effects version of pgls in which the residuals have both uncorrelated and correlated components, where the latter are structured according to covariance matrix C.

Y = XB + epsilon + u; where epsilon~N(0,randsig2*I) and u~N(0,physig2*C).

NOTE: For models 2 and 3, the user must supply a covariance matrix.

Prior Distributions for Regression Model Parameters: The underlying stan models assume the following prior distributions for model parameters

Regression coefficients: Gaussian prior (users can set prior mean and sd).
physig2: lognormal prior (users can set prior mean and sd).
randsig2: cauchy prior (users can set location and scale parameters of prior).

References

Anderson, S. A. S., et al. In review. The comparative analysis of lineage-pair data.

Examples

Run this code

# \donttest{
## Fit regression models with and without covariance matrix
# Note: data were simulated with Coef[1]=1 (intercept), Coef[2]=0.8 (slope)
# Load a data simulated with a non-independent response observations
data(data3)
# Also load the lineage-pair covariance matrix that arose from those simulations
data(sim.cov.pairs)
# Fit an OLS model
result1 = linreg.stan(des=data3[,3], y=data3[,4], cores=2)
# Fit an pgls model
result2 = linreg.stan(des=data3[,3], y=data3[,4], model="pgls", covmat=sim.cov.pairs, cores=2)

# Compare posterior parameter estimates
result1[[1]]
result2[[1]]

# Compare the fit of the two models via loo and waic
loo1 = loo::loo(result1[[2]])
loo2 = loo::loo(result2[[2]])
waic1 = loo::waic(result1[[2]])
waic2 = loo::waic(result2[[2]])
loo1
loo2
waic1
waic2
loo::loo_compare(loo1, loo2)
loo::loo_compare(waic1, waic2)

# Extend the comparison by fitting a pgls.mm model
result3 = linreg.stan(des=data3[,3], y=data3[,4], model="pgls.mm", 
 covmat=sim.cov.pairs, cores=2)

# Compare posterior parameter estimates
result1[[1]]
result2[[1]]
result3[[1]]

# Compare the fit of the three models via loo and waic
loo3 = loo::loo(result3[[2]])
waic3 = loo::waic(result3[[2]])
loo::loo_compare(loo1, loo2, loo3)
loo::loo_compare(waic1, waic2, waic3)
# }

Run the code above in your browser using DataLab