This sets up a sampler object, based on the specification of a model. The object contains functions to
draw a set of model parameters from their prior and conditional posterior distributions, and
to generate starting values for the MCMC simulation. The functions share a common environment
containing precomputed quantities such as design matrices based on the model and the data.
The sampler object is the main input for the MCMC simulation function MCMCsim
.
create_sampler(
formula,
data = NULL,
family = "gaussian",
ny = NULL,
ry,
r.mod,
sigma.fixed = NULL,
sigma.mod = NULL,
Q0 = NULL,
formula.V = NULL,
logJacobian = NULL,
linpred = NULL,
compute.weights = FALSE,
block = compute.weights,
prior.only = FALSE,
control = NULL
)
formula to specify the response variable and additive model components. The model components form
the linear predictor part of the model. A model component on the right hand side can currently be
either a regression term specified with reg(...)
or a generic random effect term specified
with gen(...)
. See for details the help pages for these model component creation functions.
Other terms in the formula are interpreted as ordinary regression
effects (treated in the same way as reg(...)
terms, but without the option to change e.g. the prior),
An offset can be specified as offset(...)
.
data frame with n rows in which the variables specified in model components (if any) can be found.
character string describing the data distribution. The default is 'gaussian'. Other options are 'binomial' for the binomial distribution and 'negbinomial' for the negative binomial distribution. For the binomial and negative binomial distributions logistic and log link functions are used by default, respectively. In the case of binary data a probit link function is also supported for the binomial/Bernoulli distribution.
in case family="binomial"
the (vector of) numbers of trials.
It can be either a numeric vector or the name of a variable in data
.
Defaults to a vector of 1s.
in case family="negbinomial"
the dispersion parameter or vector
of dispersion parameters. Use r.mod
instead if the (scalar)
dispersion parameter should be inferred from the data. If neither
ry
nor r.mod
is specified a default chi-squared
prior with 1 degree of freedom is assumed.
prior specification for a scalar (reciprocal) dispersion parameter
of the negative binomial distribution, in case the dispersion parameter is
to be inferred. The prior can be specified by a call to a prior specification function.
Currently pr_invchisq
, pr_gig
and pr_fixed
are supported.
for Gaussian models, if TRUE
the residual standard deviation parameter 'sigma_' is fixed at 1. In that case
argument sigma.mod
is ignored. This is convenient for Fay-Herriot type models with (sampling) variances assumed to be known.
Default is FALSE
.
prior for the variance parameter of a gaussian sampling distribution.
This can be specified by a call to one of the prior specification functions
pr_invchisq
, pr_exp
, pr_gig
or pr_fixed
for
inverse chi-squared, exponential, generalized inverse gaussian or degenerate prior distribution,
respectively. The default is an improper prior pr_invchisq(df=0, scale=1)
. A half-t prior on the
standard deviation can be specified using pr_invchisq
with a chi-squared distributed scale
parameter.
n x n data-level precision matrix for a Gaussian model. It defaults to the unit matrix.
If an n-vector is provided it will be expanded to a (sparse) diagonal matrix with Q0 on its diagonal.
If a name is supplied it will be looked up in data
and subsequently expanded to a diagonal matrix.
a formula specifying the terms of a variance model in the case of a Gaussian likelihood.
Currently two types of terms are supported: a regression term for the log-variance
specified with vreg(...)
, and a term vfac(...)
for multiplicative modeled factors
at a certain level specified by a factor variable. By using unit-level inverse-chi-squared factors the marginal
sampling distribution becomes a Student-t distribution, and by using unit-level exponential factors it becomes
a Laplace or double exponential distribution.
if the data are transformed the logarithm of the Jacobian can be supplied so that it
is incorporated in all log-likelihood computations. This can be useful for comparing information criteria
for different transformations. It should be supplied as a vector of the same size as the response variable,
and is currently only supported if family="gaussian"
.
For example, when a log-transformation is used on response vector y
, the vector -log(y)
should be supplied.
a list of matrices defining (possibly out-of-sample) linear predictors to be simulated.
This allows inference on e.g. (sub)population totals or means. The list must be of the form
list(name_1=X_1, ...)
where the names refer to the model component names and predictions are
computed by summing X_i %*% p[[name_i]]
. Alternatively, linpred="fitted"
can be used
for simulations of the full in-sample linear predictor.
if TRUE
weights are computed for each element of linpred
. Note that for
a large dataset in combination with vector-valued linear predictors the weights can take up a lot of memory.
By default only means are stored in the simulation carried out using MCMCsim
.
if TRUE
all coefficients are sampled in a single block. Alternatively, a list of
character vectors indicating which coefficients should be sampled together in blocks.
whether a sampler is set up only for sampling from the prior or for sampling from both prior
and posterior distributions. Default FALSE
. If TRUE
there is no need to specify a response in
formula
. This is used by generate_data
, which samples from the prior predictive
distribution.
a list with further computational options, see details section.
A sampler object, which is the main input for the MCMC simulation
function MCMCsim
. The sampler object is an environment with
precomputed quantities and functions. The main functions are rprior
,
which returns a sample from the prior distributions, draw
,
which returns a sample from the full conditional posterior distributions,
and start
, which returns a list with starting values for the Gibbs
sampler. If prior.only
is TRUE
, functions draw
and
start
are not created.
The right hand side of the formula
argument to create_sampler
can be used to specify
additive model components. Currently two specialized model components are supported, reg(...)
and gen(...)
for regression and generic random effects components, respectively.
For gaussian models, formula.V
can be used to specify the variance structure of the model.
Currently two specialized variance model components are supported, vreg(...)
for regression
effects predicting the log-variance and vfac(...)
for modeled variance factors.
Further computational options can be set using the control
parameter, which should be
passed as a list with possible elements
whether to add the outer product of the constraint matrix for a better conditioned solve system for blocks. This is done by default when using blocked Gibbs sampling for blocks with constraints.
when FALSE
residuals or linear predictors are only computed at the start of the simulation.
This may give a modest speedup but in some cases may be less accurate due to roundoff error accumulation.
Default is TRUE
.
J.H. Albert and S. Chib (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American statistical Association 88(422), 669-679.
D. Bates, M. Maechler, B. Bolker and S.C. Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software 67(1), 1-48.
N. Polson, J.G. Scott and J. Windle (2013). Bayesian Inference for Logistic Models Using Polya-Gamma Latent Variables. Journal of the American Statistical Association 108(504), 1339-1349.
H. Rue and L. Held (2005). Gaussian Markov Random Fields. Chapman & Hall/CRC.
M. Zhou and L. Carin (2015). Negative Binomial Process Count and Mixture Modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence 37(2), 307-320.
# NOT RUN {
# first generate some data
n <- 200
x <- rnorm(n)
y <- 0.5 + 2*x + 0.3*rnorm(n)
# create a sampler for a simple linear regression model
sampler <- create_sampler(y ~ x)
sim <- MCMCsim(sampler)
(summary(sim))
y <- rbinom(n, 1, 1 / (1 + exp(-(0.5 + 2*x))))
# create a sampler for a binary logistic regression model
sampler <- create_sampler(y ~ x, family="binomial")
sim <- MCMCsim(sampler)
(summary(sim))
# }
Run the code above in your browser using DataLab