simulate_nb_lm

Simulate data from a negative-binomial distribution with a linear mean function.

For Bayesian and classical inference and prediction with count-valued data,
Simultaneous Transformation and Rounding (STAR) Models provide a flexible, interpretable,
and easy-to-use approach. STAR models the observed count data using a rounded
continuous data model and incorporates a transformation for greater flexibility.
Implicitly, STAR formalizes the commonly applied yet incoherent procedure of
(i) transforming count-valued data and subsequently
(ii) modeling the transformed data using Gaussian models.
STAR is well-defined for count-valued data, which is reflected in predictive accuracy,
and is designed to account for zero-inflation, bounded or censored data, and over- or underdispersion.
Importantly, STAR is easy to combine with existing MCMC or point estimation
methods for continuous data, which allows seamless adaptation of continuous data
models (such as linear regressions, additive models, BART, random forests,
and gradient boosting machines) for count-valued data. The package also includes several
methods for modeling count time series data, namely via warped Dynamic Linear Models.
For more details and background on these methodologies, see the works of
Kowal and Canale (2020) <doi:10.1214/20-EJS1707>,
Kowal and Wu (2022) <doi:10.1111/biom.13617>,
King and Kowal (2022) <arXiv:2110.14790>, and
Kowal and Wu (2023) <arXiv:2110.12316>.

Brian King

countSTAR

Flexible Modeling of Count Data

Dan Kowal

simulate_nb_lm function

<dl><dt>n</dt>
<dd>number of observations</dd>
<dt>p</dt>
<dd>number of predictors (including the intercept)</dd>
<dt>r_nb</dt>
<dd>the dispersion parameter of the Negative Binomial distribution;
smaller values imply greater overdispersion, while larger values approximate the Poisson distribution.</dd>
<dt>b_int</dt>
<dd>intercept; default is log(1.5), which implies the expected count is 1.5
when all predictors are zero</dd>
<dt>b_sig</dt>
<dd>regression coefficients for true signals; default is log(2.0), which implies a
twofold increase in the expected count for a one-unit increase in x</dd>
<dt>sigma_true</dt>
<dd>standard deviation of the Gaussian innovation; default is zero.</dd>
<dt>ar1</dt>
<dd>the autoregressive coefficient among the columns of the X matrix; default is zero.</dd>
<dt>intercept</dt>
<dd>a Boolean indicating whether an intercept column should be included
in the returned design matrix; default is FALSE</dd>
<dt>seed</dt>
<dd>optional integer to set the seed for reproducible simulation; default is NULL,
which results in a different dataset on each run</dd></dl>
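To make the roles of these arguments concrete, here is a minimal base-R sketch of a data-generating process that matches the descriptions above. It is not the countSTAR source code: the function name `sketch_nb_lm`, the default values for `n` and `p`, the choice of which predictors are true signals, the standard normal predictors, and the names of the returned list elements are all assumptions made for illustration.

```r
# Hypothetical sketch: NOT the countSTAR implementation of simulate_nb_lm.
# Argument names mirror the documented ones; everything else is an assumption.
sketch_nb_lm <- function(n = 100, p = 5, r_nb = 1,
                         b_int = log(1.5), b_sig = log(2.0),
                         sigma_true = 0, ar1 = 0,
                         intercept = FALSE, seed = NULL) {
  stopifnot(p >= 2)
  if (!is.null(seed)) set.seed(seed)

  # p counts the intercept, so there are p - 1 non-constant predictors.
  q <- p - 1

  # Assumption: standard normal predictors whose columns have
  # AR(1)-type correlation, cor(x_j, x_k) = ar1^|j - k|.
  Sigma <- ar1^abs(outer(seq_len(q), seq_len(q), "-"))
  X0 <- matrix(rnorm(n * q), nrow = n, ncol = q) %*% chol(Sigma)

  # Assumption: the first half of the predictors are true signals (b_sig),
  # the rest are noise (coefficient zero).
  n_sig <- ceiling(q / 2)
  beta <- c(b_int, rep(b_sig, n_sig), rep(0, q - n_sig))

  # Linear predictor on the log scale, plus an optional Gaussian innovation.
  eta <- drop(cbind(1, X0) %*% beta) + sigma_true * rnorm(n)

  # Negative binomial counts with mean exp(eta) and dispersion r_nb.
  y <- rnbinom(n, size = r_nb, mu = exp(eta))

  # Include the intercept column in the design matrix only on request.
  X <- if (intercept) cbind(1, X0) else X0
  list(y = y, X = X, beta_true = beta)
}

sim <- sketch_nb_lm(n = 200, p = 5, r_nb = 1, seed = 123)
dim(sim$X)    # 200 rows, p - 1 = 4 columns because intercept = FALSE
table(sim$y)  # distribution of the simulated counts
```

With `b_int = log(1.5)`, an observation whose predictors are all zero has expected count `exp(log(1.5)) = 1.5`, and each one-unit increase in a signal predictor multiplies the expected count by `exp(log(2)) = 2`, as the argument descriptions state.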

Arguments

Simulate count data from a linear regression — simulate_nb_lm


