bmonomvn: Bayesian Estimation for Multivariate Normal Data with Monotone Missingness

Description

Bayesian estimation via sampling from the posterior distribution of the of the mean and covariance matrix of multivariate normal (MVN) distributed data with a monotone missingness pattern via Gibbs Sampling. Through the use of parsimonious/shrinkage regressions (currently only lasso is supported), where standard regressions fail, this function can handle an (almost) arbitrary amount of missing data.

Usage

bmonomvn(y, pre = TRUE, p = 0.9, B = 100, T = 100, thin = 10,
         r = 1, delta = 1, rao.s2 = TRUE, verb = 1, trace = FALSE)

Arguments

data matrix were each row is interpreted as a random sample from a MVN distribution with missing values indicated by NA

pre

logical indicating whether pre-processing of the y is to be performed. This sorts the columns so that the number of NAs is non-decreasing with the column index

when performing regressions, p is the proportion of the number of columns to rows in the design matrix before an alternative regression (lasso) is performed as if least-squares regression failed. Least-squares regre

number of Burn-In MCMC sampling rounds, during which samples are discarded

total number of MCMC sampling rounds to take place after burn-in, during which samples are saved

thin

level of thinning in the MCMC, i.e., the number of MCMC rounds must be collected before a sample is saved

alpha (shape) parameter to the gamma distribution prior for the lasso parameter lambda

delta

beta (rate) parameter to the gamma distribution prior for the lasso parameter lambda

rao.s2

indicates whether to use Rao-Blackwellized samples for $\sigma^2$ should be used (default TRUE), see the details section of blasso for more information

verb

verbosity level; currently only verb = 0 and verb = 1 are supported

trace

if TRUE then samples from the regressions are saved to files in the CWD

Value

bmonomvn returns an object of class "monomvn", which is a list containing a subset of the components below.
calla copy of the function call as used
muestimated mean vector with columns corresponding to the columns of y
Sestimated covariance matrix with rows and columns corresponding to the columns of y
mu.varestimated variance of mean vector with columns corresponding to the columns of y
nawhen pre = TRUE this is a vector containing number of NA entries in each column of y
owhen pre = TRUE this is a vector containing the index of each column in the sorting of the columns of y obtained by o <- order(na)
methodmethod of regression used on each column, or "bcomplete" indicating that no regression was used
ncompRecords the mean $\lambda^2$ value found in the trace of the Bayesian Lasso regressions. This value will be zero if the corresponding column corresponds to a complete case or a ordinary least squares regression. This field is named $ncomp (rather than $lambda2) for compatibility with the monomvn output
traceif input trace = TRUE then this field contains traces of the samples of mu in the field $mu and of S in the field $S, and of all regression parameters for each of the m = length(mu) columns in the field $reg
Bfrom inputs: number of Burn-In MCMC sampling rounds, during which samples are discarded
Tfrom inputs: total number of MCMC sampling rounds to take place after burn-in, during which samples are saved
rfrom inputs: alpha (shape) parameter to the gamma distribution prior for the lasso parameter lambda
deltafrom inputs: beta (rate) parameter to the gamma distribution prior for the lasso parameter lambda

Details

If pre = TRUE then bmonomvn first re-arranges the columns of y into nondecreasing order with respect to the number of missing (NA) entries. Then (at least) the first column should be completely observed.

Samples from the posterior distribution of the over a Multivariate Normal mean vector and covariance matrix are obtained by obtaining samples from the posterior distribution of Bayesian regression models. The method by which these samples from the regression posterior(s) are used to obtain samples from the mean vector and covariance matrix is outlined in the monomvn documentation, detailing a similarly structured maximum likelihood approach. See also the references below.

Whenever the regression model is ill--posed (i.e., when there are more covariates than responses, or a big p small n problem) then Bayesian Lasso regressions are used instead. See the Park & Casella reference below. The p argument can be used to turn on Lasso regressions at other times.

One difference between the Bayesian and MLE approach is that the MLE approach treats the complete (fully observed) columns jointly, without performing regressions. In contrast, the Bayesian only treats the first complete column without regressions. The remaining complete columns are processed via regression.

References

Robert B. Gramacy and Joo Hee Lee (2007). On estimating covariances between many assets with histories of highly variable length. Preprint available on arXiv:0710.5837: http://arxiv.org/abs/0710.5837

Roderick J.A. Little and Donald B. Rubin (2002). Statistical Analysis with Missing Data, Second Edition. Wilely.

Park, T., Casella, G. (2008). The Bayesian Lasso, (unpublished) http://www.stat.ufl.edu/~casella/Papers/bayeslasso.pdf

http://www.statslab.cam.ac.uk/~bobby/monomvn.html

Examples

Run this code

## standard usage, duplicating the results in
## Little and Rubin, section 7.4.3 -- try adding 
## verb=3 argument for a step-by-step breakdown
data(cement.miss)
out <- bmonomvn(cement.miss)
out
out$mu
out$S

##
## A bigger example, comparing the various methods
##

## generate N=1000 samples from a random MVN
xmuS <- randmvn(1000, 100)

## randomly impose monotone missingness
xmiss <- rmono(xmuS$x, m=110)

## using least squares only when necessary
out.b <- bmonomvn(xmiss)
out.b
kl.norm(out.b$mu, out.b$S, xmuS$mu, xmuS$S)
out.mle <- monomvn(xmiss)
kl.norm(out.mle$mu, out.mle$S, xmuS$mu, xmuS$S)

## using least squares sparingly
out.b.s <- bmonomvn(xmiss, p=0.25)
kl.norm(out.b.s$mu, out.b.s$S, xmuS$mu, xmuS$S)
out.mle.s <- monomvn(xmiss, p=0.25)
kl.norm(out.mle.s$mu, out.mle.s$S, xmuS$mu, xmuS$S)

Run the code above in your browser using DataLab