countSTAR: Flexible Modeling for Count Data

Overview

Count-valued data are common in many fields. Frequently, count data are observed jointly with predictors, over time intervals, or across spatial locations. Furthermore, they often exhibit a variety of complex distributional features, including zero-inflation, skewness, over- and underdispersion, and in some cases may be bounded or censored. Flexible and interpretable models for count-valued processes are therefore highly useful in practice.

countSTAR implements a variety of methods for modeling such processes, based on the idea of Simultaneous Transformation and Rounding (STAR). Estimation, inference, and prediction for STAR are available for both Bayesian and frequentist models. Most of the methods address static regression problems, but the package also supports time series analysis via the warped Dynamic Linear Model (DLM) framework.
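As a rough illustration of the warped DLM idea, the base-R sketch below simulates a latent local-level DLM (one simple structural DLM) and warps it to counts with a log transformation and floor rounding. The function `simulate_warped_local_level()` and all parameter values are hypothetical; this is not the package's `warpDLM()` function, which performs posterior inference rather than simulation.

```r
# Hypothetical sketch (not countSTAR code): counts from a warped local-level DLM.
# Latent state:   mu_t = mu_{t-1} + w_t,  w_t ~ N(0, sigma_w^2)
# Latent data:    z_t  = mu_t + v_t,      v_t ~ N(0, sigma_v^2)
# Observed count: y_t  = floor(exp(z_t)), i.e. inverse log transformation + floor rounding
simulate_warped_local_level <- function(n_time, mu0 = 1, sigma_w = 0.1, sigma_v = 0.3) {
  mu <- mu0 + cumsum(rnorm(n_time, sd = sigma_w))  # random-walk level starting near mu0
  z  <- mu + rnorm(n_time, sd = sigma_v)           # Gaussian latent series
  y  <- floor(exp(z))                              # transform back, then round down
  list(y = y, z = z, mu = mu)
}

set.seed(1)
sim <- simulate_warped_local_level(n_time = 200)
head(sim$y)
```

Zeros arise whenever the latent series drops below zero, so zero-inflation comes from the rounding step rather than from a separate mixture component.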

Broadly, STAR defines a count-valued probability model by (1) specifying a (conditionally) Gaussian model for continuous latent data and (2) connecting the latent data to the observed data via a transformation and rounding operation.
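The two steps can be sketched as a data-generating process in base R. This is a minimal illustration assuming a log transformation and floor rounding in which negative values map to zero; the helper names `g_inv_log()`, `round_floor0()` and `simulate_star_lm()` are hypothetical and are not the package's internal functions.

```r
# Hypothetical sketch of the STAR data-generating process (not countSTAR code).
# (1) Gaussian latent data:      z_i = x_i' beta + e_i,  e_i ~ N(0, sigma^2)
# (2) Transformation + rounding: y_i = h(g^{-1}(z_i)) with g = log and
#     h(t) = max(0, floor(t)); the max only matters for transformations
#     whose inverse can return negative values
g_inv_log    <- function(z) exp(z)
round_floor0 <- function(t) pmax(0, floor(t))

simulate_star_lm <- function(n, beta, sigma = 0.5) {
  X <- cbind(1, matrix(rnorm(n * (length(beta) - 1)), nrow = n))  # intercept + covariates
  z <- drop(X %*% beta) + rnorm(n, sd = sigma)                    # latent Gaussian regression
  y <- round_floor0(g_inv_log(z))                                 # observed counts
  list(y = y, z = z, X = X)
}

set.seed(42)
sim <- simulate_star_lm(n = 500, beta = c(0.5, 1, -0.5))
head(table(sim$y))
```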

Importantly, STAR models are highly flexible count-valued processes that can capture (i) discrete data, (ii) zero-inflation, (iii) over- or underdispersion, (iv) heaping, and (v) bounded or censored data. The modularity of the STAR framework allows a wide variety of latent data models, ranging from simple linear regression to machine learning methods such as random forests and gradient boosting machines.
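Two of these features, underdispersion and bounded data, follow directly from the rounding step. The sketch below assumes the identity transformation and floor rounding capped at an upper bound `y_max`; the helper `round_floor_capped()` is hypothetical and only meant to show the mechanism.

```r
# Hypothetical sketch: STAR with the identity transformation g(t) = t
# and floor rounding that maps negatives to zero and caps at y_max.
round_floor_capped <- function(t, y_max = Inf) {
  pmin(pmax(0, floor(t)), y_max)
}

set.seed(7)
z <- rnorm(10000, mean = 10, sd = 0.5)  # latent Gaussian data

# (iii) Underdispersion: a small latent SD gives a variance well below the mean
y_under <- round_floor_capped(z)
c(mean = mean(y_under), var = var(y_under))

# (v) Bounded data: every latent value at or above y_max = 10 is recorded as 10
y_bounded <- round_floor_capped(z, y_max = 10)
mean(y_bounded == 10)
```

A Poisson model forces the variance to equal the mean, whereas here the variance is governed by the latent standard deviation.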

countSTAR can be installed and loaded as follows:

# CRAN version
install.packages("countSTAR")

# Development version (requires the remotes package)
remotes::install_github("bking124/countSTAR")

library("countSTAR")

Detailed information on the different options for STAR models and how they are implemented in countSTAR can be found in the vignette, accessible on the website or by running the command vignette("countSTAR"). A basic breakdown of the available modeling functions is shown below:

Analysis Type                 Method (function)                                 Dependent Package
Static Classical Regression   Linear regression (lm_star())                     -
                              Generalized boosted modeling (gbm_star())         gbm
                              Random forests (randomForest_star())              randomForest
Static Bayesian Regression    Linear regression (blm_star())                    -
                              Additive modeling (bam_star())                    spikeSlabGAM
                              Spline regression (spline_star())                 spikeSlabGAM
                              Bayesian additive regression trees (bart_star())  dbarts
Time Series Modeling          Warped Dynamic Linear Models (warpDLM())          KFAS

In addition to these ready-to-use functions, users can also implement STAR methods with custom latent regression models using the genEM_star() and genMCMC_star() functions.
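In the simple setting sketched here (a fixed transformation and floor rounding), a custom latent regression enters the count likelihood only through its latent mean and standard deviation: P(y = j) = Φ((g(a_{j+1}) - μ)/σ) - Φ((g(a_j) - μ)/σ), with a_0 = -∞ and a_j = j for j ≥ 1. The base-R sketch below computes this log-likelihood for the log transformation. The helper `star_loglik_log()` is hypothetical and does not reproduce the interface of `genEM_star()`, `genMCMC_star()` or `logLikeRcpp()`.

```r
# Hypothetical sketch: STAR log-likelihood with g = log and floor rounding.
# y = j corresponds to the latent interval [g(a_j), g(a_{j+1})) = [log(j), log(j + 1)),
# with g(a_0) = -Inf so that y = 0 corresponds to z < 0.
star_loglik_log <- function(y, mu, sigma) {
  lower <- ifelse(y == 0, -Inf, log(y))  # g(a_j)
  upper <- log(y + 1)                    # g(a_{j+1})
  p <- pnorm((upper - mu) / sigma) - pnorm((lower - mu) / sigma)
  sum(log(p))
}

# The implied probability mass function sums to (numerically) one
pmf <- sapply(0:200, function(j) exp(star_loglik_log(j, mu = 1, sigma = 0.5)))
sum(pmf)

# Log-likelihood of a few counts, given latent means from any fitted model
star_loglik_log(y = c(0, 2, 5), mu = c(0.1, 0.9, 1.6), sigma = 0.5)
```

The difference of two `pnorm()` calls can underflow far in the tails; a production version would work on the log scale.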

Please submit any issues or feature requests to https://github.com/bking124/countSTAR/issues.

Version: 1.0.2

License: GPL (>= 2)

Maintainer: Brian King

Last Published: June 30th, 2023

Functions in countSTAR (1.0.2)

blm_star: STAR Bayesian Linear Regression
BrentMethod: Brent's method for optimization
confint.lmstar: Compute asymptotic confidence intervals for STAR linear regression
g_bc: Box-Cox transformation
g_cdf: Cumulative distribution function (CDF)-based transformation
g_bnp: Bayesian bootstrap-based transformation
expectation_sqrt: Estimate the mean for a STAR process
blm_star_bnpgibbs: Gibbs sampler for STAR linear regression with BNP transformation
bart_star_ispline: MCMC sampler for BART-STAR with a monotone spline model for the transformation
credBands: Compute Simultaneous Credible Bands
expectation_identity: Estimate the mean for a STAR process
expectation_log: Estimate the mean for a STAR process
ergMean: Compute the ergodic (running) mean
g_inv_approx: Approximate inverse transformation
init_bam_orthog: Initialize the parameters for an additive model
getEffSize: Summarize the effective sample size
g_inv_bc: Inverse Box-Cox transformation
genEM_star: Generalized EM estimation for STAR
gbm_star: Fitting STAR Gradient Boosting Machines via EM algorithm
logLikePointRcpp: Compute the pointwise log-likelihood for STAR
logit: Compute the log-odds
init_lm_hs: Initialize linear regression parameters assuming a horseshoe prior
expectation_gRcpp: Estimate the mean for a STAR process
expectation2_gRcpp: Compute E(Y^2) for a STAR process
randomForest_star: Fit Random Forest STAR with EM algorithm
plot_fitted: Plot the fitted values and the data
genMCMC_star: Generalized MCMC Algorithm for STAR
plot_coef: Plot the estimated regression coefficients and credible intervals
genMCMC_star_ispline: MCMC sampler for STAR with a monotone spline model for the transformation
plot_pmf: Plot the empirical and model-based probability mass functions
init_lm_gprior: Initialize linear regression parameters assuming a g-prior
interval_gRcpp: Estimate confidence intervals/bands for a STAR process
init_bam_thin: Initialize the parameters for an additive model
roaches: Data on the efficacy of a pest management system at reducing the number of roaches in urban apartments
logLikeRcpp: Compute the log-likelihood for STAR
init_lm_ridge: Initialize linear regression parameters assuming a ridge prior
init_params_mean: Initialize the parameters for a simple mean-only model
round_floor: Rounding function
sample_lm_gprior: Sample the linear regression parameters assuming a g-prior
sample_lm_ridge: Sample linear regression parameters assuming a ridge prior
sample_bam_orthog: Sample the parameters for an additive model
sample_lm_hs: Sample linear regression parameters assuming a horseshoe prior
sample_bam_thin: Sample the parameters for an additive model
warpDLM: Posterior Inference for warpDLM model with latent structural DLM
sampleFastGaussian: Sample a Gaussian vector using the fast sampler of Bhattacharya et al.
rtruncnormRcpp: Sample from a truncated normal distribution
splineBasis: Initialize and reparametrize a spline basis matrix
spline_star: Estimation for Bayesian STAR spline regression
pmaxRcpp: pmax() in Rcpp
predict.lmstar: Predict method for response in STAR linear model
simBaS: Compute Simultaneous Band Scores (SimBaS)
lm_star: Fitting frequentist STAR linear model via EM algorithm
truncnorm_mom: Compute the first and second moment of a truncated normal
sample_params_mean: Sample the parameters for a simple mean-only model
spline_star_exact: Monte Carlo predictive sampler for spline regression
pminRcpp: pmin() in Rcpp
invlogit: Compute the inverse log-odds
simulate_nb_friedman: Simulate count data from Friedman's nonlinear regression
pvals: Compute coefficient p-values for STAR linear regression using likelihood ratio test
simulate_nb_lm: Simulate count data from a linear regression
update_struct: Update parameters for warpDLM model with trend DLM
uni.slice: Univariate Slice Sampler from Neal (2008)
bam_star: Fit Bayesian Additive STAR Model with MCMC
computeTimeRemaining: Estimate the remaining time in the MCMC based on previous samples
bart_star: MCMC Algorithm for BART-STAR
a_j: Inverse rounding function
blm_star_exact: Monte Carlo sampler for STAR linear regression with a g-prior
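Several helpers in this index support the EM-based fitters. For example, truncnorm_mom is listed as computing the first and second moment of a truncated normal. Such moments are what an EM algorithm for STAR needs, because the latent value given an observed count is normal truncated to that count's interval on the transformed scale. The base-R sketch below uses the standard closed-form truncated-normal moments. The name `truncnorm_moments()` is hypothetical and does not reproduce the package function's signature. It returns the mean and variance, from which the second moment is var + mean^2.

```r
# Hypothetical sketch: mean and variance of X ~ N(mu, sigma^2) truncated to [a, b].
truncnorm_moments <- function(mu, sigma, a, b) {
  lo <- (a - mu) / sigma                     # standardized lower bound
  hi <- (b - mu) / sigma                     # standardized upper bound
  Z  <- pnorm(hi) - pnorm(lo)                # probability of the interval
  d  <- (dnorm(lo) - dnorm(hi)) / Z
  # x * dnorm(x) -> 0 as x -> +/-Inf; set it explicitly because -Inf * 0 is NaN in R
  lo_d <- ifelse(is.finite(lo), lo * dnorm(lo), 0)
  hi_d <- ifelse(is.finite(hi), hi * dnorm(hi), 0)
  m <- mu + sigma * d
  v <- sigma^2 * (1 + (lo_d - hi_d) / Z - d^2)
  list(mean = m, var = v)
}

# Half-normal case: mean sqrt(2/pi), variance 1 - 2/pi
truncnorm_moments(mu = 0, sigma = 1, a = 0, b = Inf)
```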