calc_risk_diff: Calculate Risk Differences with Robust Model Fitting and Boundary Detection

Description

Calculates risk differences (or prevalence differences for cross-sectional data) using generalized linear models with identity, log, or logit links. Version 0.2.1 includes enhanced boundary detection, robust confidence intervals, and improved data quality validation to prevent extreme confidence intervals in stratified analyses.

The function addresses common convergence issues with identity link binomial GLMs by implementing a fallback strategy across multiple link functions, similar to approaches described in Donoghoe & Marschner (2018) for relative risk regression.
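The fallback idea can be sketched in a few lines of base R. This is an illustration only, not the package's internal code: the helper `fit_with_fallback()`, the simulated data, and the choice to treat any warning as a failed fit are all assumptions made for the example.

```r
# Illustrative sketch of a link-function fallback; NOT calc_risk_diff() internals.
set.seed(42)
n <- 500
exposed <- rbinom(n, 1, 0.4)
# Simulated risks: 0.10 if unexposed, 0.25 if exposed (true RD = 0.15)
y <- rbinom(n, 1, ifelse(exposed == 1, 0.25, 0.10))
dat <- data.frame(y = y, exposed = exposed)

# Hypothetical helper: try each link in order, keep the first converged fit
fit_with_fallback <- function(data, links = c("identity", "log", "logit")) {
  for (lk in links) {
    fit <- tryCatch(
      glm(y ~ exposed, family = binomial(link = lk), data = data),
      error = function(e) NULL,    # e.g. no valid starting values
      warning = function(w) NULL   # assumption: treat warnings as failure
    )
    if (!is.null(fit) && fit$converged) {
      return(list(fit = fit, link = lk))
    }
  }
  stop("No link function produced a converged model")
}

res <- fit_with_fallback(dat)

# Risk difference on the probability scale, whichever link was used:
# predicted risk if exposed minus predicted risk if unexposed
p1 <- predict(res$fit, newdata = data.frame(exposed = 1), type = "response")
p0 <- predict(res$fit, newdata = data.frame(exposed = 0), type = "response")
rd_hat <- unname(p1 - p0)
```

With a single binary exposure and no covariates, the predicted-risk contrast is the same under every link. Once adjustment variables are added, a log or logit fit gives a risk difference that varies with covariate values, so some form of averaging over the covariate distribution would be needed; how the package handles that step is not described here.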

Usage

calc_risk_diff(
  data,
  outcome,
  exposure,
  nnt = FALSE,
  adjust_vars = NULL,
  strata = NULL,
  link = "auto",
  alpha = 0.05,
  boundary_method = "auto",
  verbose = FALSE
)

Value

A tibble of class "riskdiff_result" containing the following columns:

exposure_var: Character. Name of exposure variable analyzed
rd: Numeric. Risk difference estimate, or Number Needed to Treat if nnt = TRUE (see Details)
ci_lower: Numeric. Lower bound of confidence interval (RD scale or NNT scale)
ci_upper: Numeric. Upper bound of confidence interval (RD scale or NNT scale)
p_value: Numeric. P-value for test of null hypothesis (risk difference = 0)
model_type: Character. Link function successfully used ("identity", "log", "logit", or error type)
n_obs: Integer. Number of observations used in analysis
on_boundary: Logical. TRUE if MLE is on parameter space boundary
boundary_type: Character. Type of boundary: "none", "upper_bound", "lower_bound", "separation", "both_bounds"
boundary_warning: Character. Warning message for boundary cases (if any)
ci_method: Character. Method used for confidence intervals ("wald", "profile", "bootstrap")
...: Additional columns for stratification variables if specified

The returned object has attributes including the original function call and alpha level used. Risk differences are on the probability scale where 0.05 represents a 5 percentage point difference.

Arguments

data: A data frame containing all necessary variables
outcome: Character string naming the binary outcome variable (must be 0/1 or logical)
exposure: Character string naming the exposure variable of interest
nnt: Logical indicating whether to return Number Needed to Treat instead of risk difference (default: FALSE)
adjust_vars: Character vector of variables to adjust for (default: NULL)
strata: Character vector of stratification variables (default: NULL)
link: Character string specifying link function: "auto", "identity", "log", or "logit" (default: "auto")
alpha: Significance level for confidence intervals (default: 0.05)
boundary_method: Method for handling boundary cases: "auto", "profile", "bootstrap", "wald" (default: "auto")
verbose: Logical indicating whether to print diagnostic messages (default: FALSE)

Details

New in Version 0.2.2: NNT Calculation capability

When nnt = TRUE, the function returns the Number Needed to Treat (NNT) instead of the risk difference. NNT is the number of individuals who need to be treated to prevent one additional adverse outcome. It is calculated as 1/|RD|, and confidence intervals are transformed using the delta method. NNT is undefined when RD = 0; the function reports Inf whenever |RD| < 0.001. For harmful exposures (RD > 0), the value is interpreted as the Number Needed to Harm (NNH).
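As a rough illustration of the arithmetic, the sketch below converts a risk difference and its standard error to the NNT scale. The helper `rd_to_nnt()` is hypothetical and not part of the package. The symmetric Wald-type interval on the NNT scale is one reading of "delta method" and may differ from what calc_risk_diff() does internally.

```r
# Hypothetical helper, NOT part of the package API.
rd_to_nnt <- function(rd, se, alpha = 0.05, threshold = 0.001) {
  # Mirror the documented rule: report Inf when |RD| is below the threshold
  if (abs(rd) < threshold) {
    return(c(nnt = Inf, se_nnt = NA_real_,
             ci_lower = NA_real_, ci_upper = NA_real_))
  }
  nnt <- 1 / abs(rd)
  # Delta method: for g(x) = 1/|x|, |g'(x)| = 1/x^2, so SE(NNT) ~ SE(RD) / RD^2
  se_nnt <- se / rd^2
  z <- qnorm(1 - alpha / 2)
  c(nnt = nnt, se_nnt = se_nnt,
    ci_lower = nnt - z * se_nnt,
    ci_upper = nnt + z * se_nnt)
}

# RD of 10 percentage points with SE 0.02: NNT is 1 / 0.10 = 10
rd_to_nnt(rd = 0.10, se = 0.02)
```

A delta-method interval of this kind is only meaningful when the risk-difference interval stays well away from zero; near RD = 0 the NNT scale is discontinuous and the approximation breaks down.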

References

Donoghoe MW, Marschner IC (2018). "logbin: An R Package for Relative Risk Regression Using the Log-Binomial Model." Journal of Statistical Software, 86(9), 1-22. doi:10.18637/jss.v086.i09

Marschner IC, Gillett AC (2012). "Relative Risk Regression: Reliable and Flexible Methods for Log-Binomial Models." Biostatistics, 13(1), 179-192.

Venzon DJ, Moolgavkar SH (1988). "A Method for Computing Profile-Likelihood-Based Confidence Intervals." Journal of the Royal Statistical Society: Series C (Applied Statistics), 37(1), 87-94.

Rothman KJ, Greenland S, Lash TL (2008). Modern Epidemiology, 3rd edition. Lippincott Williams & Wilkins.

Examples

# Simple risk difference
data(cachar_sample)
rd_simple <- calc_risk_diff(
  data = cachar_sample,
  outcome = "abnormal_screen",
  exposure = "areca_nut"
)
print(rd_simple)

# Age-adjusted risk difference
rd_adjusted <- calc_risk_diff(
  data = cachar_sample,
  outcome = "abnormal_screen",
  exposure = "areca_nut",
  adjust_vars = "age"
)
print(rd_adjusted)

# Stratified analysis with enhanced error checking and boundary detection
rd_stratified <- calc_risk_diff(
  data = cachar_sample,
  outcome = "abnormal_screen",
  exposure = "areca_nut",
  strata = "residence",
  verbose = TRUE  # See diagnostic messages and boundary detection
)
print(rd_stratified)

# Check for boundary cases
if (any(rd_stratified$on_boundary)) {
  cat("Boundary cases detected!\n")
  boundary_rows <- which(rd_stratified$on_boundary)
  for (i in boundary_rows) {
    cat("Row", i, ":", rd_stratified$boundary_type[i], "\n")
  }
}

# Force profile likelihood CIs for enhanced robustness
rd_profile <- calc_risk_diff(
  data = cachar_sample,
  outcome = "abnormal_screen",
  exposure = "areca_nut",
  boundary_method = "profile"
)

# Calculate Number Needed to Treat instead of risk difference
data(cachar_sample)
nnt_result <- calc_risk_diff(
  data = cachar_sample,
  outcome = "abnormal_screen",
  exposure = "smoking",
  nnt = TRUE
)
print(nnt_result)

# NNT with adjustment variables
nnt_adjusted <- calc_risk_diff(
  data = cachar_sample,
  outcome = "abnormal_screen",
  exposure = "smoking",
  adjust_vars = "age",
  nnt = TRUE
)