BGLR (version 1.0.4)

BLR: Bayesian Linear Regression

Description

The BLR (`Bayesian Linear Regression') function was designed to fit parametric regression models using different types of shrinkage methods. An earlier version of this program was presented in de los Campos et al. (2009).

Usage

BLR(y, XF, XR, XL, GF, prior, nIter, burnIn, thin, thin2, saveAt, minAbsBeta, weights)

Arguments

y
(numeric, $n$) the data-vector (NAs allowed).
XF
(numeric, $n x pF$) incidence matrix for $bF$, may be NULL.
XR
(numeric, $n x pR$) incidence matrix for $bR$, may be NULL.
XL
(numeric, $n x pL$) incidence matrix for $bL$, may be NULL.
GF
(list) providing an $ID (integer, $n$) linking observations to groups (e.g., lines or sires) and a (co)variance structure ($A, numeric, $pU x pU$) between effects of the grouping factor (e.g., line or sire effects). Note: ID must be an integer taking values from 1 to $pU$; ID[i]=$q$ indicates that the ith observation in $y$ belongs to cluster $q$, whose (co)variance function is in the qth row (column) of $A$. GF may be NULL.
weights
(numeric, $n$) a vector of weights, may be NULL.
nIter, burnIn, thin
(integer) the number of iterations, burn-in and thinning.
saveAt
(string) this may include a path and a prefix that will be added to the names of the files that are saved as the program runs.
prior
(list) containing the following elements,
  • prior$varE, prior$varBR, prior$varU: (list) each providing degree of freedom ($df) and scale ($S). These are the parameters of the scaled inverse-$\chi^2$ distributions assigned to variance components, see Eq. (2) below. In the parameterization used by BLR() the prior expectation of variance parameters is $S/(df-2)$.
  • prior$lambda: (list) providing $value (initial value for $lambda$); $type (`random' or `fixed'), which specifies whether $lambda$ should be kept fixed at the value provided by $value or updated with samples from its posterior distribution; and either $shape and $rate (when a Gamma prior is desired on $lambda^2$) or $shape1, $shape2 and $max, in which case $p(lambda|max,alpha1,alpha2) = Beta(lambda/max | alpha1,alpha2)$. For a detailed description of these priors see de los Campos et al. (2009). A minimal example of assembling the prior list is sketched at the end of this Arguments section.

thin2
This value controls whether the running means are saved to disk. If thin2 is greater than nIter the running means are not saved (default, thin2=$1e10$).
minAbsBeta
The minimum absolute value of the components of $bL$, used to avoid numerical problems when sampling from $\boldsymbol \tau^2$ (default $1e-9$).
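
Below is a minimal sketch of how the prior and GF arguments described above can be assembled. It uses the wheat dataset that ships with BGLR; the hyper-parameter values are illustrative only and mirror Example 1 in the Examples section:

library(BGLR)
data(wheat)
y <- wheat.Y[,1]

## Scaled inverse-chi^2 hyper-parameters for varE and varU,
## and a Gamma prior on lambda^2 for the Bayesian LASSO component
prior <- list(varE=list(df=3,S=0.25),
              varU=list(df=3,S=0.63),
              lambda=list(type='random',value=30,shape=0.52,rate=1e-4))

## GF links each observation to a row/column of the (co)variance matrix A
GF <- list(ID=1:nrow(wheat.A), A=wheat.A)

fm <- BLR(y=y, XL=wheat.X, GF=GF, prior=prior,
          nIter=5500, burnIn=500, thin=10)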

Value

A list with posterior means, posterior standard deviations, and the parameters used to fit the model:
$yHat
the posterior mean of $1*mu + XF*bF + XR*bR + XL*bL + Z*u$.
$SD.yHat
the corresponding posterior standard deviation.
$mu
the posterior mean of the intercept.
$varE
the posterior mean of $varE$.
$bR
the posterior mean of $bR$.
$SD.bR
the corresponding posterior standard deviation.
$varBr
the posterior mean of $varBr$.
$bL
the posterior mean of $bL$.
$SD.bL
the corresponding posterior standard deviation.
$tau2
the posterior mean of $tau^2$.
$lambda
the posterior mean of $lambda$.
$u
the posterior mean of $u$.
$SD.u
the corresponding posterior standard deviation.
$varU
the posterior mean of $varU$.
$whichNa
a vector indicating which entries in $\boldsymbol y$ were missing.
$prior
a list containing the priors used during the analysis.
$weights
vector of weights.
$fit
list containing the following elements,
  • $logLikAtPostMean: log-likelihood evaluated at posterior mean.
  • $postMeanLogLik: the posterior mean of the log-likelihood.
  • $pD: estimated effective number of parameters, Spiegelhalter et al. (2002).
  • $DIC: the deviance information criterion, Spiegelhalter et al. (2002).
$nIter
the number of iterations made in the Gibbs sampler.
$burnIn
the number of iterations used as burn-in.
$thin
the thinning interval used.
$y
original data-vector.
The posterior means returned by BLR are calculated after the burn-in period, using the thinning interval specified by the user.

Save. The routine saves samples of $mu$, the variance components and $lambda$, as well as running means (rm*.dat). Running means are computed using the thinning specified by the user (see argument thin above); however, these running means are saved at the thinning specified by argument thin2 (by default, thin2=$1e10$, so that running means are computed as the sampler runs but are not saved to disk).
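
A short sketch of inspecting the returned list and the samples saved to disk; it assumes an object fm produced by a previous BLR() call run in the current working directory with the default saveAt="":

str(fm$fit)                        # logLikAtPostMean, postMeanLogLik, pD, DIC
fm$fit$DIC                         # deviance information criterion
head(cbind(observed=fm$y, predicted=fm$yHat))

## Samples of the residual variance are written to 'varE.dat' as the sampler runs
varE.samples <- scan('varE.dat')
plot(varE.samples, type='o', ylab=expression(sigma[epsilon]^2))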

Details

The program runs a Gibbs sampler for the Bayesian regression model described below.

Likelihood. The equation for the data is:

$$ \begin{array}{lr} \boldsymbol y= \boldsymbol 1 \mu + \boldsymbol X_F \boldsymbol \beta_F + \boldsymbol X_R \boldsymbol \beta_R + \boldsymbol X_L \boldsymbol \beta_L + \boldsymbol{Zu} + \boldsymbol \varepsilon & (1) \end{array} $$

where $y$, the response, is an $n x 1$ vector (NAs allowed); $mu$ is an intercept; $XF$, $XR$, $XL$ and $Z$ are incidence matrices used to accommodate different types of effects (see below); and $e$ is a vector of model residuals assumed to be distributed as $e ~ MVN(0, Diag(varE/w_i^2))$, where $varE$ is an (unknown) variance parameter and the $w_i$ are (known) weights that allow for heterogeneous residual variances.

Any of the elements on the right-hand side of the linear predictor, except $mu$ and $e$, can be omitted; by default the program fits an intercept-only model.
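
For instance, a minimal sketch of the default intercept-only model (assuming a numeric response vector y as above; only the residual-variance prior is supplied, since no other terms are included):

fm0 <- BLR(y=y, prior=list(varE=list(df=3,S=0.25)),
           nIter=2000, burnIn=500)          # fits y = 1*mu + e
fm0$mu        # posterior mean of the intercept
fm0$varE      # posterior mean of the residual variance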

Prior. The residual variance is assigned a scaled inverse-$\chi^2$ prior with degrees of freedom and scale parameter provided by the user, that is, $varE ~ Inv_Scaled_chisq(varE | dfe,Se)$. The regression coefficients ${mu,bF,bR,bL,u}$ are assigned priors that yield different types of shrinkage. The intercept and the vector of regression coefficients $bF$ are assigned flat priors (i.e., estimates are not shrunk). The vector of regression coefficients $bR$ is assigned a Gaussian prior with variance common to all effects, that is, $bRj ~ NIID(0,varBr)$. This prior is the Bayesian counterpart of Ridge Regression. The variance parameter $varBr$ is treated as unknown and is assigned a scaled inverse-$\chi^2$ prior, that is, $varBr ~ Inv_Scaled_chisq(varBr | dfBr,SBr)$, with degrees of freedom $dfBr$ and scale $SBr$ provided by the user.
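
A sketch of a ridge-type fit using the XR component; the hyper-parameter values below are hypothetical and chosen only for illustration:

library(BGLR)
data(wheat)
y <- wheat.Y[,1]
priorRR <- list(varE=list(df=3,S=0.25),
                varBR=list(df=3,S=0.01))    # common variance for all bR effects
fmRR <- BLR(y=y, XR=wheat.X, prior=priorRR,
            nIter=2000, burnIn=500, thin=10)
head(fmRR$bR)       # posterior means of the ridge-regression coefficients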

The vector of regression coefficients $bL$ is treated as in the Bayesian LASSO of Park and Casella (2008). Specifically,

$$p(\boldsymbol \beta_L, \boldsymbol \tau^2, \lambda | \sigma_{\boldsymbol \varepsilon}^2) = \left\{\prod_k N(\beta_{L,k} | 0, \sigma_{\boldsymbol \varepsilon}^2 \tau_k^2) Exp\left(\tau_k^2 | \lambda^2\right) \right\} p(\lambda),$$

where $Exp(.|.)$ is an exponential prior and $p(lambda)$ can be either: (a) a point mass at some value (i.e., fixed $lambda$); (b) $p(lambda^2) ~ Gamma(r,delta)$, the prior suggested by Park and Casella (2008); or (c) $p(lambda|max,alpha1,alpha2) = Beta(lambda/max | alpha1,alpha2)$, see de los Campos et al. (2009) for details. It can be shown that the marginal prior of each regression coefficient $bL_k$, $Integrate(N(bL_k|0,varE * tau_k^2) * Exp(tau_k^2 | lambda^2) d tau_k^2)$, is double-exponential. This prior has thicker tails and a higher peak of mass at zero than the Gaussian prior used for $bR$, inducing a different type of shrinkage.
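
As a sketch of this marginalization under the Park and Casella (2008) parameterization, in which $Exp(\tau_k^2 | \lambda^2)$ has density $(\lambda^2/2)\exp(-\lambda^2 \tau_k^2/2)$,

$$ p(\beta_{L,k} | \sigma_{\boldsymbol \varepsilon}^2, \lambda) = \int_0^\infty N\left(\beta_{L,k} | 0, \sigma_{\boldsymbol \varepsilon}^2 \tau_k^2\right) \frac{\lambda^2}{2} \exp\left(-\frac{\lambda^2 \tau_k^2}{2}\right) d\tau_k^2 = \frac{\lambda}{2\sigma_{\boldsymbol \varepsilon}} \exp\left(-\frac{\lambda |\beta_{L,k}|}{\sigma_{\boldsymbol \varepsilon}}\right), $$

that is, a double-exponential (Laplace) density with rate $\lambda/\sigma_{\boldsymbol \varepsilon}$.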

The vector $u$ is used to model the so-called `infinitesimal effects' and is assigned the prior $u ~ MVN(0, A*varU)$, where $A$ is a positive-definite matrix (usually a relationship matrix computed from a pedigree) and $varU$ is an unknown variance, whose prior is $varU ~ Inv_Scaled_chisq(varU | dfu,Su)$.

Collecting the above-mentioned assumptions, the posterior distribution of the model unknowns, $theta={mu, bF, bR, varBr, bL, tau^2, lambda, u, varU, varE}$, is,

$$\begin{array}{lclr} p(\boldsymbol \theta | \boldsymbol y) & \propto & N\left( \boldsymbol y | \boldsymbol 1 \mu + \boldsymbol X_F \boldsymbol \beta_F + \boldsymbol X_R \boldsymbol \beta_R + \boldsymbol X_L \boldsymbol \beta_L + \boldsymbol{Zu}; Diag\left\{ \frac{\sigma_\varepsilon^2}{w_i^2}\right\}\right) & \\ & & \times \left\{ \prod\limits_j N\left(\beta_{R,j} | 0, \sigma_{\boldsymbol \beta_R}^2 \right) \right\} \chi^{-2} \left(\sigma_{\boldsymbol \beta_R}^2 | df_{\boldsymbol \beta_R}, S_{\boldsymbol \beta_R}\right) & \\ & & \times \left\{ \prod\limits_k N \left( \beta_{L,k} |0,\sigma_{\boldsymbol \varepsilon}^2 \tau_k^2 \right) Exp \left(\tau_k^2 | \lambda^2\right)\right\} p(\lambda) & (2)\\ & & \times N(\boldsymbol u | \boldsymbol 0,\boldsymbol A\sigma_{\boldsymbol u}^2) \chi^{-2} (\sigma_{\boldsymbol u}^2 | df_{\boldsymbol u}, S_{\boldsymbol u}) \chi^{-2} (\sigma_{\boldsymbol \varepsilon}^2 | df_{\boldsymbol \varepsilon}, S_{\boldsymbol \varepsilon}) & \end{array} $$

References

de los Campos G., H. Naya, D. Gianola, J. Crossa, A. Legarra, E. Manfredi, K. Weigel and J. Cotes. 2009. Predicting Quantitative Traits with Regression Models for Dense Molecular Markers and Pedigree. Genetics 182: 375-385.

Park T. and G. Casella. 2008. The Bayesian LASSO. Journal of the American Statistical Association 103: 681-686.

Spiegelhalter, D.J., N.G. Best, B.P. Carlin and A. van der Linde. 2002. Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, Series B (Statistical Methodology) 64 (4): 583-639.

Examples

## Not run: 
# ########################################################################
# ##Example 1:
# ########################################################################
# 
# rm(list=ls())
# setwd(tempdir())
# library(BGLR)
# data(wheat)     #Loads the wheat dataset
# 
# y=wheat.Y[,1]
# ### Creates a testing set with 100 observations
# whichNa<-sample(1:length(y),size=100,replace=FALSE)
# yNa<-y
# yNa[whichNa]<-NA
# 
# ### Runs the Gibbs sampler
# fm<-BLR(y=yNa,XL=wheat.X,GF=list(ID=1:nrow(wheat.A),A=wheat.A),
#                            prior=list(varE=list(df=3,S=0.25),
#                            varU=list(df=3,S=0.63),
#                            lambda=list(shape=0.52,rate=1e-4,
#                            type='random',value=30)),
#                            nIter=5500,burnIn=500,thin=1)
# 
# MSE.tst<-mean((fm$yHat[whichNa]-y[whichNa])^2)
# MSE.tst
# MSE.trn<-mean((fm$yHat[-whichNa]-y[-whichNa])^2)
# MSE.trn
# COR.tst<-cor(fm$yHat[whichNa],y[whichNa])
# COR.tst
# COR.trn<-cor(fm$yHat[-whichNa],y[-whichNa])
# COR.trn
# 
# plot(fm$yHat~y,xlab="Phenotype",
#      ylab="Pred. Gen. Value" ,cex=.8)
# points(x=y[whichNa],y=fm$yHat[whichNa],col=2,cex=.8,pch=19)
# 
# x11()
# plot(scan('varE.dat'),type="o",
#         ylab=expression(paste(sigma[epsilon]^2)))
# 
# ########################################################################
# ##Example 2: Ten-fold cross-validation, environment 1
# ########################################################################
# 
# rm(list=ls())
# setwd(tempdir())
# library(BGLR)
# data(wheat)     #Loads the wheat dataset
# nIter<-1500     #For real data sets more samples are needed
# burnIn<-500     
# thin<-10
# folds<-10
# y<-wheat.Y[,1]
# A<-wheat.A
# 
# priorBL<-list(
#                varE=list(df=3,S=2.5),
#                varU=list(df=3,S=0.63),
#                lambda = list(shape=0.52,rate=1e-5,value=20,type='random')
#              )
#              
# set.seed(123)  #Set seed for the random number generator
# sets<-rep(1:10,60)[-1]
# sets<-sets[order(runif(nrow(A)))]
# COR.CV<-rep(NA,times=(folds+1))
# names(COR.CV)<-c(paste('fold=',1:folds,sep=''),'Pooled')
# w<-rep(1/nrow(A),folds) ## weights for pooled correlations and MSE
# yHatCV<-numeric()
# 
# for(fold in 1:folds)
# {
#    yNa<-y
#    whichNa<-which(sets==fold)
#    yNa[whichNa]<-NA
#    prefix<-paste('PM_BL','_fold_',fold,'_',sep='')
#    fm<-BLR(y=yNa,XL=wheat.X,GF=list(ID=(1:nrow(wheat.A)),A=wheat.A),prior=priorBL,
#                nIter=nIter,burnIn=burnIn,thin=thin)
#    yHatCV[whichNa]<-fm$yHat[fm$whichNa]
#    w[fold]<-w[fold]*length(fm$whichNa)
#    COR.CV[fold]<-cor(fm$yHat[fm$whichNa],y[whichNa])
# }
# 
# COR.CV[11]<-mean(COR.CV[1:10])
# COR.CV
# 
# ########################################################################
# ## End(Not run)
