predict.mcdraws: Generate draws from the predictive distribution

Description

Generate draws from the predictive distribution

Usage

# S3 method for mcdraws
predict(
  object,
  newdata = NULL,
  X. = if (is.null(newdata)) "in-sample" else NULL,
  type = c("data", "link", "response", "data_cat"),
  var = NULL,
  ny = NULL,
  ry = NULL,
  fun. = identity,
  labels = NULL,
  ppcheck = FALSE,
  iters = NULL,
  to.file = FALSE,
  filename,
  write.single.prec = FALSE,
  show.progress = TRUE,
  verbose = TRUE,
  n.cores = 1L,
  cl = NULL,
  seed = NULL,
  export = NULL,
  ...
)

Value

An object of class dc, containing draws from the posterior (or prior) predictive distribution. If ppcheck=TRUE, posterior predictive p-values are returned as an additional attribute, "ppp" (see Examples). If to.file=TRUE, the name of the file used is returned.
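The posterior predictive p-values returned with ppcheck=TRUE are MCMC estimates of the probability that a test statistic computed on predicted data exceeds the same statistic on the observed data. The base R sketch below illustrates that estimate using simulated stand-ins (y_obs, y_rep) for the observed data and the predictive draws, not draws from an actual fitted model. It computes the proportion once for a scalar statistic and once element-wise, which mirrors the default fun.=identity. It is an illustration under these assumptions, not the package's implementation.

```r
set.seed(2)
n_obs <- 50
n_draws <- 2000
y_obs <- rnorm(n_obs, mean = 1)   # stand-in for the observed data
# stand-in for posterior predictive draws: one replicated data set per row
y_rep <- matrix(rnorm(n_draws * n_obs, mean = 1), nrow = n_draws)

# a scalar test statistic (plays the role of fun.)
T_stat <- function(y) max(y)
T_rep <- apply(y_rep, 1, T_stat)          # statistic for each predictive draw
ppp_max <- mean(T_rep > T_stat(y_obs))    # share of draws exceeding the observed value

# with an identity "statistic" the comparison is made per observation,
# giving one p-value for each data point
ppp_identity <- colMeans(sweep(y_rep, 2, y_obs, ">"))
```

A vector of per-observation values like ppp_identity is what a histogram such as hist(attr(pred, "ppp")) in the Examples would display.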

Arguments

object: an object of class mcdraws, as output by MCMCsim.
newdata: data frame with auxiliary information to be used for prediction.
X.: a list of design matrices; alternatively, X. can be 'in-sample' or 'linpred'. With 'in-sample' (the default if newdata is not supplied), the design matrices for in-sample prediction are used. With 'linpred', the 'linpred_' component of object is used.
type: the type of predictions. The default, "data", generates new data according to the predictive distribution. If type="link", only the linear predictor for the mean is generated; if type="response", the linear predictor is transformed to the response scale. For Gaussian models type="link" and type="response" are equivalent. For binomial and negative binomial models type="response" returns simulations of the latent probabilities. For multinomial models, type="link" generates the linear predictor for all categories except the last, type="response" transforms this vector to the probability scale, and type="data" generates the multinomial data. In all three cases the output is in long vector format, with the results for all categories except the last stacked. For multinomial models with single trials, a further option is type="data_cat", which generates the data as a categorical vector with integer-coded levels.
var: variance(s) used for out-of-sample prediction. By default 1.
ny: number of trials used for out-of-sample prediction in case of a binomial model. By default 1.
ry: fixed part of the (reciprocal) dispersion parameter in case of a negative binomial model.
fun.: function applied to the vector of posterior predictions to compute one or multiple summaries or test statistics. The function can have one or two arguments. The first argument is always the vector of posterior predictions. The optional second argument represents a list of model parameters, needed only when a test statistic depends on them. The function must return an integer or numeric vector.
labels: optional names for the output object. If supplied, labels must be a vector with the same length as the vector returned by fun.
ppcheck: if TRUE, fun. is also applied to the observed data, and the posterior predictive probability that the test statistic for the predicted data exceeds the test statistic for the observed data is approximated from the MCMC draws.
iters: iterations in object to use for prediction. Default NULL means that all draws from object are used.
to.file: if TRUE the predictions are streamed to file.
filename: name of the file to write predictions to in case to.file=TRUE.
write.single.prec: whether to write to file in single precision. Default is FALSE.
show.progress: whether to show a progress bar.
verbose: whether to show informative messages.
n.cores: the number of CPU cores to use. The default is 1, i.e. no parallel computation. If an existing cluster cl is provided, n.cores is set to the number of workers in that cluster.
cl: an existing cluster can be passed for parallel computation. If NULL and n.cores > 1, a new cluster is created.
seed: a random seed (integer). For parallel computation it is used to independently seed RNG streams for all workers.
export: a character vector with names of objects to export to the workers. This may be needed for parallel execution if expressions in fun. depend on global variables.
...: currently not used.
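The type argument distinguishes three levels of prediction: the linear predictor, its transformation to the response scale, and simulated data. As a rough illustration only, the base R sketch below mimics these levels for a binomial model. It assumes a logit link and uses made-up coefficient draws (beta_draws) and a hypothetical design matrix (X_new) instead of output from MCMCsim, so it is not how the package computes predictions internally. The variable ny mirrors the argument of the same name: the number of trials.

```r
set.seed(1)
n_draws <- 1000                                  # number of (pretend) MCMC draws
beta_draws <- cbind(rnorm(n_draws, -0.5, 0.1),   # pretend intercept draws
                    rnorm(n_draws, 1.2, 0.2))    # pretend slope draws
X_new <- cbind(1, c(0, 0.5, 1))                  # hypothetical design matrix, 3 new units
ny <- 10                                         # number of binomial trials per unit

# like type = "link": linear predictor, one row per draw, one column per unit
eta <- beta_draws %*% t(X_new)
# like type = "response": linear predictor mapped to the probability scale (logit link assumed)
p <- plogis(eta)
# like type = "data": binomial counts simulated given those probabilities
y_rep <- matrix(rbinom(length(p), size = ny, prob = p), nrow = n_draws)
```

Each level adds one step of uncertainty. The "link" and "response" draws reflect only parameter uncertainty, while the "data" draws also include the binomial sampling variation.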
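The fun. argument accepts either a one-argument function of the prediction vector or a two-argument function that also receives a list of model parameters. The sketch below shows one function of each kind and applies both to made-up inputs. The function names are hypothetical, and the parameter name sigma_ in the two-argument version is an assumption for illustration. The actual component names depend on the fitted model.

```r
# One argument: the vector of predictions; returns a named numeric vector
stat_mean_sd <- function(x) c(mean = mean(x), sd = sd(x))

# Two arguments: predictions plus a list of model parameters.
# The component name "sigma_" is an assumption made for this illustration.
stat_scaled_max <- function(x, p) max(abs(x)) / p$sigma_

# Applying both to a made-up prediction vector and parameter list
x <- c(0.2, 1.5, -0.7, 2.1)
stat_mean_sd(x)
stat_scaled_max(x, list(sigma_ = 0.5))
```

With a fitted object sim, a call along the lines of predict(sim, fun.=stat_mean_sd, labels=c("mean", "sd")) would then be expected to yield two summaries per draw. In that call, labels has length 2 to match the output of stat_mean_sd.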
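The seed argument is described as independently seeding RNG streams for all workers. The base R sketch below shows one common way to do that, using the L'Ecuyer-CMRG generator and parallel::nextRNGStream, the approach behind parallel::clusterSetRNGStream. Whether the package uses exactly this mechanism is an assumption. The helpers make_worker_seeds and run_worker are hypothetical, the workers are simulated serially rather than on a cluster, and the sketch switches the session's RNG kind.

```r
library(parallel)

# Hypothetical helper: one independent L'Ecuyer-CMRG stream per worker
make_worker_seeds <- function(seed, n_workers) {
  RNGkind("L'Ecuyer-CMRG")        # note: changes the session's RNG kind
  set.seed(seed)
  s <- .Random.seed
  seeds <- vector("list", n_workers)
  for (i in seq_len(n_workers)) {
    seeds[[i]] <- s
    s <- nextRNGStream(s)         # jump ahead to the next independent stream
  }
  seeds
}

# Hypothetical stand-in for the work done by one worker with its own stream
run_worker <- function(worker_seed, n) {
  assign(".Random.seed", worker_seed, envir = .GlobalEnv)
  runif(n)
}

draws_a <- lapply(make_worker_seeds(42, 3), run_worker, n = 5)
draws_b <- lapply(make_worker_seeds(42, 3), run_worker, n = 5)
identical(draws_a, draws_b)       # same seed gives the same draws per worker
```

Because each worker starts from its own stream, the workers' draws do not overlap, and reusing the same seed reproduces the whole set of results.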

Examples


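library(mcmcsae)
# simulate data for a simple Gaussian linear regression and fit it
set.seed(1)
n <- 250
dat <- data.frame(x=runif(n))
dat$y <- 1 + dat$x + rnorm(n)
sampler <- create_sampler(y ~ x, data=dat)
sim <- MCMCsim(sampler)
summary(sim)
# in-sample prediction, including posterior predictive p-values
pred <- predict(sim, ppcheck=TRUE)
# histogram of the posterior predictive p-values
hist(attr(pred, "ppp"))
# out-of-sample prediction for new values of x
pred <- predict(sim, newdata=data.frame(x=seq(0, 1, by=0.1)))
summary(pred)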
