flexreg_binom: Flexible Regression Models for Binomial Data

Description

The function fits flexible regression models for binomial data using a Bayesian approach to inference based on the Hamiltonian Monte Carlo algorithm. The available regression models are the flexible beta-binomial (type = "FBB"), the beta-binomial (type = "BetaBin"), and the binomial (type = "Bin").

Usage

flexreg_binom(
  formula,
  data,
  type = "FBB",
  n = NULL,
  link.mu = "logit",
  prior.beta = "normal",
  hyperparam.beta = 100,
  hyper.theta.a = NULL,
  hyper.theta.b = NULL,
  link.theta = NULL,
  prior.psi = NULL,
  hyperparam.psi = NULL,
  n.iter = 5000,
  burnin.perc = 0.5,
  n.chain = 1,
  thin = 1,
  verbose = TRUE,
  ...
)

Value

The flexreg_binom function returns an object of class `flexreg`, i.e. a list with the following elements:

call: the function call.
formula: the original formula.
link.mu: a character specifying the link function in the mean model.
link.theta: a character specifying the link function in the overdispersion model.
model: an object of class `stanfit` containing the fitted model.
response: the response variable, assuming values in (0, 1).
design.X: the design matrix for the mean model.
design.Z: the design matrix for the overdispersion model (if defined).

Arguments

formula: an object of class `formula`: a symbolic description of the model to be fitted (of type y ~ x or y ~ x | z).
data: an optional data frame, list, or object coercible to a data frame through base::as.data.frame, containing the variables in the model. If they are not found in data, the variables in formula are taken from the environment from which flexreg_binom is called.
type: a character specifying the type of regression model. Current options are the flexible beta-binomial "FBB" (default), the beta-binomial "BetaBin", and the binomial one "Bin".
n: the total number of trials.
link.mu: a character specifying the link function for the mean model (mu). Currently, "logit" (default), "probit", "cloglog", and "loglog" are supported.
prior.beta: a character specifying the prior distribution for the beta regression coefficients of the mean model. Currently, "normal" (default) and "cauchy" are supported.
hyperparam.beta: a positive numeric (vector of length 1) specifying the standard deviation (for "normal") or the scale (for "cauchy") of the prior distribution of the beta regression coefficients. A value of 100 is suggested if the prior is "normal", and 2.5 if it is "cauchy".
hyper.theta.a: a numeric (vector of length 1) specifying the first shape parameter for the beta prior distribution of theta.
hyper.theta.b: a numeric (vector of length 1) specifying the second shape parameter for the beta prior distribution of theta.
link.theta: a character specifying the link function for the overdispersion model (theta). Currently, "identity" (default), "logit", "probit", "cloglog", and "loglog" are supported. If link.theta = "identity", the prior distribution for theta is a beta.
prior.psi: a character specifying the prior distribution for the psi regression coefficients of the overdispersion model (not supported if link.theta = "identity"). Currently, "normal" (default) and "cauchy" are supported.
hyperparam.psi: a positive numeric (vector of length 1) specifying the standard deviation (for "normal") or the scale (for "cauchy") of the prior distribution of the psi regression coefficients. A value of 100 is suggested if the prior is "normal", and 2.5 if it is "cauchy".
n.iter: a positive integer specifying the number of iterations for each chain (including warmup). The default is 5000.
burnin.perc: the proportion of iterations per chain to discard as burn-in. The default is 0.5.
n.chain: a positive integer specifying the number of Markov chains. The default is 1.
thin: a positive integer specifying the period for saving samples. The default is 1.
verbose: TRUE (default) or FALSE: flag indicating whether to print intermediate output.
...: additional arguments for rstan::sampling.
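Every option of link.mu (and the non-identity options of link.theta) maps a parameter in (0, 1) onto the real line. The base-R sketch below assumes the conventional definitions of these links, in particular "loglog" taken as g(mu) = -log(-log(mu)); the links object is a hypothetical helper written for illustration, not part of FlexReg, so check the package source for the exact forms it uses.

```r
# Conventional definitions of the supported links (an assumption for
# illustration; not taken from the FlexReg source). Each entry holds the
# link g() and its inverse ginv().
links <- list(
  logit   = list(g    = function(mu) qlogis(mu),
                 ginv = function(eta) plogis(eta)),
  probit  = list(g    = function(mu) qnorm(mu),
                 ginv = function(eta) pnorm(eta)),
  cloglog = list(g    = function(mu) log(-log(1 - mu)),
                 ginv = function(eta) 1 - exp(-exp(eta))),
  loglog  = list(g    = function(mu) -log(-log(mu)),
                 ginv = function(eta) exp(-exp(-eta)))
)

mu <- c(0.1, 0.5, 0.9)
for (name in names(links)) {
  eta  <- links[[name]]$g(mu)        # map mu in (0, 1) to the real line
  back <- links[[name]]$ginv(eta)    # map back to (0, 1)
  cat(sprintf("%-8s eta = %s  round trip ok: %s\n", name,
              paste(round(eta, 3), collapse = ", "),
              isTRUE(all.equal(back, mu))))
}
```

Note that "cloglog" and "loglog" are asymmetric: with these definitions, mu = 0.5 maps to a negative value under "cloglog" and to a positive value under "loglog", whereas "logit" and "probit" both map it to 0.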

Details

Let $Y$ be a random variable whose distribution is specified in the type argument, and let $\mu$ be the mean of $Y/n$. The flexreg_binom function links the parameter $\mu$ to a linear predictor through a function $g(\cdot)$ specified in link.mu: $$g(\mu_i) = x_i^\top \boldsymbol{\beta},$$ where $\boldsymbol{\beta}$ is the vector of regression coefficients for the mean model.

By default, link.theta = "identity", meaning that the overdispersion parameter $\theta$ is assumed to be constant. The model can be extended by linking $\theta$ to an additional (possibly overlapping) set of covariates through a proper link function $q(\cdot)$ specified in the link.theta argument: $$q(\theta_i) = z_i^\top \boldsymbol{\psi},$$ where $\boldsymbol{\psi}$ is the vector of regression coefficients for the overdispersion model.

In flexreg_binom, the regression model for the mean and, where appropriate, for the overdispersion parameter are specified in the formula argument as $y \sim x_1 + x_2 \mid z_1 + z_2$: covariates to the left of "|" enter the regression model for the mean, and covariates to the right of "|" enter the regression model for the overdispersion.
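To make the two linear predictors concrete, the following base-R sketch evaluates $\mu_i$ and $\theta_i$ for a formula of the form y ~ x1 | z1 with logit links for both parts. The data frame dat and the coefficient values beta and psi are hypothetical; the code only mirrors the equations $g(\mu_i) = x_i^\top \boldsymbol{\beta}$ and $q(\theta_i) = z_i^\top \boldsymbol{\psi}$ and does not call flexreg_binom or reproduce its internals.

```r
# Hypothetical data and coefficient values, for illustration only.
dat  <- data.frame(x1 = c(0, 1, 2), z1 = c(1, 0, 1))
beta <- c(-1, 0.5)   # intercept and slope of the mean model
psi  <- c(-2, 1)     # intercept and slope of the overdispersion model

# Design matrices for the two parts of a formula such as y ~ x1 | z1.
X <- model.matrix(~ x1, data = dat)
Z <- model.matrix(~ z1, data = dat)

# Mean model, logit link: g(mu_i) = x_i' beta, so mu_i = plogis(x_i' beta).
mu <- as.vector(plogis(X %*% beta))
# Overdispersion model, logit link: q(theta_i) = z_i' psi.
theta <- as.vector(plogis(Z %*% psi))

print(round(data.frame(dat, mu, theta), 3))
```

The matrices X and Z play roles analogous to the design.X and design.Z elements of the returned object; both inverse links keep $\mu_i$ and $\theta_i$ inside (0, 1) whatever the coefficient values.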

If the second part is omitted, i.e., $y \sim x_1 + x_2$, the overdispersion parameter is assumed to be constant across observations.
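With a constant $\theta$, its role as an overdispersion parameter can be seen by comparing variances. The sketch below assumes that, for the beta-binomial, $\theta$ is the intra-class correlation $\theta = 1/(\phi + 1)$, where $\phi$ is the precision of the underlying beta distribution; under that assumption $\mathrm{Var}(Y) = n\mu(1-\mu)[1 + (n-1)\theta]$, which tends to the binomial variance as $\theta \to 0$. The helper dbetabinom and the values of n, mu, and theta are hypothetical, and the parametrization is an assumption for illustration; see Ascari and Migliorati (2021) for the exact definitions behind type = "BetaBin" and type = "FBB".

```r
# Hypothetical helper: beta-binomial pmf written out in base R.
dbetabinom <- function(y, n, a, b) choose(n, y) * beta(y + a, n - y + b) / beta(a, b)

n <- 20; mu <- 0.3; theta <- 0.1
phi <- 1 / theta - 1                 # precision implied by theta = 1 / (phi + 1)
a <- mu * phi; b <- (1 - mu) * phi   # Beta(a, b) mixing distribution with mean mu

y <- 0:n
p <- dbetabinom(y, n, a, b)
m <- sum(y * p)                      # mean of Y computed from the pmf
v_bb  <- sum((y - m)^2 * p)          # beta-binomial variance from the pmf
v_bin <- n * mu * (1 - mu)           # binomial variance with the same mean
cat("mean:", m, " BB variance:", v_bb, " binomial variance:", v_bin, "\n")
cat("inflation:", v_bb / v_bin, " vs 1 + (n - 1) * theta =", 1 + (n - 1) * theta, "\n")
```

Both models share the mean $n\mu$; only the spread differs, which is why a single constant $\theta$ suffices when the overdispersion does not depend on covariates.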

References

Ascari, R., and Migliorati, S. (2021). A new regression model for overdispersed binomial data accounting for outliers and an excess of zeros. Statistics in Medicine, 40(17), 3895--3914. doi:10.1002/sim.9005

Examples


if (FALSE) {
data(Bacteria)
fbb <- flexreg_binom(y ~ females, n = n, data = Bacteria, type = "FBB")
}
