Learn R Programming

HZIP (version 0.1.1)

rHZIP: Simulate Data from a Hierarchical Zero-Inflated Poisson (HZIP) Model

Description

rHZIP() generates panel/longitudinal data from a two–part hierarchical zero-inflated Poisson model with subject-specific random effects for both the zero-inflation and the count components. Random effects are drawn from a generalized log-gamma (GLG) distribution via rgengamma() (user must provide/attach a function with this name; see Dependencies).

Usage

rHZIP(n, m, para, x1, x2)

Value

A tibble with columns:

Ind

Subject identifier (1..n), repeated according to m.

y

Simulated response.

x*, w*

The covariates from x1 and x2 (renamed as described below).

The object has an attribute "propZeros": a 3×1 data.frame with rows "Zeros" (structural zeros, %), "Count" (extra zeros from the Poisson part, %), and "Total" (overall zero percentage).

Arguments

n

Integer. Number of subjects.

m

Integer vector of length n (or a scalar recycled to length n). Numbers of repeated measurements per subject.

para

Numeric vector of parameters in the order c(lambda1, beta1, lambda2, beta2), where:

  • lambda1: GLG scale/shape parameter for the zero part.

  • beta1: length-p1 vector of coefficients for the zero part (matches ncol(x1)).

  • lambda2: GLG scale/shape parameter for the count part.

  • beta2: length-p2 vector of coefficients for the count part (matches ncol(x2)).

Internally, the linear predictors are \(\eta^{(0)}_{ij} = x^{\top}_{1,ij}\beta_1 + b^{(0)}_i\) and \(\eta^{(1)}_{ij} = x^{\top}_{2,ij}\beta_2 + b^{(1)}_i\), with \(p_{ij} = 1 - \exp\{-\exp(\eta^{(0)}_{ij})\}\) (cloglog link for zero-inflation) and \(\mu_{ij} = \exp(\eta^{(1)}_{ij})\) (log link for counts).

x1

Numeric matrix of covariates for the zero-inflation part (dimension sum(m) by p1). Include an intercept column if desired.

x2

Numeric matrix of covariates for the count part (dimension sum(m) by p2). Include an intercept column if desired.

Column naming

Columns of x1 are renamed to x1, x2, ..., xp1. Columns of x2 are copied except the first column is dropped and the remaining are renamed w1, w2, ..., w_{p2-1}. (This mirrors the current implementation that excludes the first x2 column from the output.)

Dependencies

This function calls rgengamma() to draw GLG random effects. Ensure that such a function is available on the search path (e.g., from a package that provides a generalized log-gamma RNG) or provide your own implementation with the signature rgengamma(n, mu, sigma, lambda). It also uses dplyr and tibble.

Details

For subject \(i=1,\dots,n\) with \(m_i\) observations, the model draws two subject-level random effects \(b^{(0)}_i\) and \(b^{(1)}_i\) independently from GLG distributions parameterized by lambda1 and lambda2. Conditional on these effects, outcomes are generated as $$Y_{ij} = Z_{ij}\times C_{ij},$$ where \(Z_{ij}\sim \mathrm{Bernoulli}(p_{ij})\) controls structural zeros and \(C_{ij}\sim \mathrm{Poisson}(\mu_{ij})\) controls the count size.

The returned data frame contains subject IDs, the response y, and the supplied covariates. An attribute "propZeros" stores a small summary with the percentage of structural zeros, additional Poisson zeros, and total zeros.

See Also

hzip for model fitting.

Examples

Run this code
set.seed(123)

n <- 50
m <- rep(4, n)
N <- sum(m)

# design matrices (with intercepts)
x1 <- cbind(1, rnorm(N))
x2 <- cbind(1, rnorm(N), rbinom(N, 1, 0.5))

p1 <- ncol(x1); p2 <- ncol(x2)
lambda1 <- 0.7
beta1   <- c(-0.2, 0.6)
lambda2 <- 0.9
beta2   <- c( 0.3, 0.5, -0.4)

para <- c(lambda1, beta1, lambda2, beta2)

sim <- rHZIP(n, m, para, x1, x2)
head(sim)
attr(sim, "propZeros")

Run the code above in your browser using DataLab