replicate_data: Create Replicate Data

Description

Use a fitted model to create replicate datasets, typically as a way of checking a model.

Usage

replicate_data(x, condition_on = NULL, n = 19)

Value

A tibble

with the following structure:

`.replicate`	data
`"Original"`	Original data supplied to `mod_pois()`, `mod_binom()`, `mod_norm()`
`"Replicate 1"`	Simulated data.
`"Replicate 2"`	Simulated data.
...	...
`"Replicate <n>"`	Simulated data.

Arguments

x: A fitted model, typically created by calling mod_pois(), mod_binom(), or mod_norm(), and then fit().
condition_on: Parameters to condition on. Either "expected" or "fitted". See details.
n: Number of replicate datasets to create. Default is 19.

The <code>condition_on</code> argument

With Poisson and binomial models that include dispersion terms (which is the default), there are two options for constructing replicate data.

When condition_on is "fitted", the replicate data is created by (i) drawing values from the posterior distribution for rates or probabilities (the \(\gamma_i\) defined in mod_pois() and mod_binom()), and (ii) conditional on these rates or probabilities, drawing values for the outcome variable.
When condition_on is "expected", the replicate data is created by (i) drawing values from hyper-parameters governing the rates or probabilities (the \(\mu_i\) and \(\xi\) defined in mod_pois() and mod_binom()), then (ii) conditional on these hyper-parameters, drawing values for the rates or probabilities, and finally (iii) conditional on these rates or probabilities, drawing values for the outcome variable.

The default for condition_on is "expected". The "expected" option provides a more severe test for a model than the "fitted" option, since "fitted" values are weighted averages of the "expected" values and the original data.

As described in mod_norm(), normal models have a different structure from Poisson and binomial models, and the distinction between "fitted" and "expected" does not apply.

Data models for outcomes

If a data model has been provided for the outcome variable, then creation of replicate data will include a step where errors are added to outcomes. For instance, the a rr3 data model is used, then replicate_data() rounds the outcomes to base 3.

Details

Use n draws from the posterior distribution for model parameters to generate n simulated datasets. If the model is working well, these simulated datasets should look similar to the actual dataset.

Examples

Run this code

mod <- mod_pois(injuries ~ age:sex + ethnicity + year,
                data = nzl_injuries,
                exposure = 1) |>
  fit()

rep_data <- mod |>
  replicate_data()

library(dplyr)
rep_data |>
  group_by(.replicate) |>
  count(wt = injuries)

## when the overall model includes an rr3 data model,
## replicate data are rounded to base 3
mod_pois(injuries ~ age:sex + ethnicity + year,
         data = nzl_injuries,
         exposure = popn) |>
  set_datamod_outcome_rr3() |>
  fit() |>
  replicate_data()

Run the code above in your browser using DataLab