bage (version 0.9.0)

replicate_data: Create Replicate Data

Description

Use a fitted model to create replicate datasets, typically as a way of checking a model.

Usage

replicate_data(x, condition_on = NULL, n = 19)

Value

A tibble with the following structure:

.replicate        data
"Original"        Original data supplied to mod_pois(), mod_binom(), or mod_norm().
"Replicate 1"     Simulated data.
"Replicate 2"     Simulated data.
...               ...
"Replicate <n>"   Simulated data.

Arguments

x

A fitted model, typically created by calling mod_pois(), mod_binom(), or mod_norm(), and then fit().

condition_on

Parameters to condition on. Either "expected" or "fitted". See the section on the condition_on argument.

n

Number of replicate datasets to create. Default is 19.

The condition_on argument

With Poisson and binomial models that include dispersion terms (which is the default), there are two options for constructing replicate data.

  • When condition_on is "fitted", the replicate data is created by (i) drawing values from the posterior distribution for rates or probabilities (the \(\gamma_i\) defined in mod_pois() and mod_binom()), and (ii) conditional on these rates or probabilities, drawing values for the outcome variable.

  • When condition_on is "expected", the replicate data is created by (i) drawing values from hyper-parameters governing the rates or probabilities (the \(\mu_i\) and \(\xi\) defined in mod_pois() and mod_binom()), then (ii) conditional on these hyper-parameters, drawing values for the rates or probabilities, and finally (iii) conditional on these rates or probabilities, drawing values for the outcome variable.
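The two schemes can be mimicked in a few lines of base R. This is only a sketch under stated assumptions, not the package's implementation: it assumes a Poisson model in which rates follow a gamma distribution with mean \(\mu_i\) and dispersion \(\xi\), it uses made-up values for mu, xi, exposure, and y, and it lets a single conjugate gamma draw stand in for the posterior draws that a fitted model would supply.

```r
set.seed(1)

## hypothetical inputs (not taken from any fitted model)
mu <- c(0.02, 0.05, 0.10)      # expected rates, one per cell
xi <- 0.5                      # dispersion
exposure <- c(1000, 800, 500)  # exposure, one per cell
y <- c(25, 30, 70)             # observed counts

## "fitted": draw rates given the observed counts, then draw outcomes
## (conjugate update, assuming rate ~ Gamma(shape = 1/xi, rate = 1/(xi * mu)))
rate_fitted <- rgamma(3, shape = 1 / xi + y, rate = 1 / (xi * mu) + exposure)
y_rep_fitted <- rpois(3, lambda = rate_fitted * exposure)

## "expected": draw rates from mu and xi alone, ignoring the observed counts,
## then draw outcomes
rate_expected <- rgamma(3, shape = 1 / xi, rate = 1 / (xi * mu))
y_rep_expected <- rpois(3, lambda = rate_expected * exposure)

data.frame(y, y_rep_fitted, y_rep_expected)
```

Because rate_fitted is pulled toward y / exposure, y_rep_fitted tends to sit closer to y than y_rep_expected does.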

The default for condition_on is "expected". The "expected" option provides a more severe test for a model than the "fitted" option, since "fitted" values are weighted averages of the "expected" values and the original data.
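To see why, consider a sketch of the Poisson case. It assumes the gamma form \(\gamma_i \sim \text{Gamma}(\xi^{-1}, (\xi \mu_i)^{-1})\) and writes \(y_i\) for the observed count and \(w_i\) for exposure (both symbols are introduced here for illustration). Conditional on \(\mu_i\) and \(\xi\), the rate then has posterior mean

\[
E[\gamma_i \mid y_i, \mu_i, \xi]
  = \frac{\xi^{-1} + y_i}{(\xi \mu_i)^{-1} + w_i}
  = \phi_i \, \mu_i + (1 - \phi_i) \, \frac{y_i}{w_i},
\qquad
\phi_i = \frac{(\xi \mu_i)^{-1}}{(\xi \mu_i)^{-1} + w_i}.
\]

Under these assumptions the fitted rate is pulled toward the direct estimate \(y_i / w_i\), so replicate data based on it stays closer to the original data than replicate data based on \(\mu_i\) and \(\xi\) alone.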

As described in mod_norm(), normal models have a different structure from Poisson and binomial models, and the distinction between "fitted" and "expected" does not apply.

Data models for outcomes

If a data model has been provided for the outcome variable, then creation of replicate data will include a step where errors are added to outcomes. For instance, if an rr3 data model is used, then replicate_data() rounds the outcomes to base 3.
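A rough base R sketch of random rounding to base 3 follows. It assumes the common definition in which multiples of 3 are left unchanged and any other value moves to the nearer multiple of 3 with probability 2/3 and to the farther one with probability 1/3. rr3_sketch() is a hypothetical helper written for illustration, not a function from the package.

```r
## hypothetical helper: randomly round non-negative whole numbers to base 3
rr3_sketch <- function(x) {
  remainder <- x %% 3
  u <- runif(length(x))
  ## remainder 1: down by 1 (prob 2/3) or up by 2 (prob 1/3)
  ## remainder 2: up by 1 (prob 2/3) or down by 2 (prob 1/3)
  shift <- ifelse(remainder == 0, 0,
           ifelse(remainder == 1,
                  ifelse(u < 2 / 3, -1, 2),
                  ifelse(u < 2 / 3, 1, -2)))
  x + shift
}

set.seed(1)
counts <- c(0, 1, 2, 3, 4, 5, 12)
rounded <- rr3_sketch(counts)

## every rounded value is a multiple of 3
rounded %% 3
```

Values that are already multiples of 3 (here 0, 3, and 12) pass through unchanged, and no value moves by more than 2.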

Details

Use n draws from the posterior distribution for model parameters to generate n simulated datasets. If the model is working well, these simulated datasets should look similar to the actual dataset.
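One way to make "look similar" concrete is to compute a summary statistic for every dataset and see where the original falls among the replicates. The sketch below uses a small made-up data frame laid out like the return value: a .replicate column plus an outcome column, here called injuries (a name borrowed from the examples). The numbers are simulated for illustration and are not produced by replicate_data().

```r
set.seed(1)

## made-up stand-in for the output of replicate_data() with n = 19
labels <- c("Original", paste("Replicate", 1:19))
toy <- data.frame(
  .replicate = factor(rep(labels, each = 10), levels = labels),
  injuries = rpois(200, lambda = 20)
)

## one summary statistic (the total) per dataset
totals <- sapply(split(toy$injuries, toy$.replicate), sum)

## where does the original sit among all 20 datasets?
rank_original <- rank(totals)[["Original"]]
rank_original

## range of the statistic across the replicates only
range(totals[-1])
```

An original statistic that falls well outside the range of the replicate statistics suggests that the model may be missing some feature of the data.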

See Also

  • mod_pois(), mod_binom(), mod_norm() Create model.

  • fit() Fit model.

  • report_sim() Simulation study of model.

Examples

library(bage)

## fit a Poisson model to the injuries data
mod <- mod_pois(injuries ~ age:sex + ethnicity + year,
                data = nzl_injuries,
                exposure = 1) |>
  fit()

rep_data <- mod |>
  replicate_data()

## compare total injuries in the original data
## with totals in each replicate dataset
library(dplyr)
rep_data |>
  group_by(.replicate) |>
  count(wt = injuries)

## when the overall model includes an rr3 data model,
## replicate data are rounded to base 3
mod_pois(injuries ~ age:sex + ethnicity + year,
         data = nzl_injuries,
         exposure = popn) |>
  set_datamod_outcome_rr3() |>
  fit() |>
  replicate_data()
