
riskdiff (version 0.3.0)

calc_risk_diff: Calculate Risk Differences with Robust Model Fitting and Boundary Detection

Description

Calculates risk differences (or prevalence differences for cross-sectional data) using generalized linear models with identity, log, or logit links. Version 0.2.1 includes enhanced boundary detection, robust confidence intervals, and improved data quality validation to prevent extreme confidence intervals in stratified analyses.

The function addresses common convergence issues with identity link binomial GLMs by implementing a fallback strategy across multiple link functions, similar to approaches described in Donoghoe & Marschner (2018) for relative risk regression.
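The fallback idea can be sketched in base R. This is a minimal illustration, not the package's internal implementation: the helper name fit_with_fallback and the simulated data are assumptions made for this example. It tries each link in turn with stats::glm() and keeps the first fit that converges.

```r
# Hypothetical helper (not part of riskdiff): try links in order and
# return the first binomial GLM that converges.
fit_with_fallback <- function(formula, data,
                              links = c("identity", "log", "logit")) {
  for (lk in links) {
    fit <- tryCatch(
      suppressWarnings(
        glm(formula, family = binomial(link = lk), data = data)
      ),
      error = function(e) NULL  # e.g. no valid starting values
    )
    if (!is.null(fit) && isTRUE(fit$converged)) {
      return(list(fit = fit, link = lk))
    }
  }
  stop("No link function produced a converged model")
}

# Simulated data (illustrative only): 30% risk if exposed, 15% if not
set.seed(42)
n <- 500
exposed <- rbinom(n, 1, 0.5)
outcome <- rbinom(n, 1, ifelse(exposed == 1, 0.30, 0.15))
sim <- data.frame(outcome, exposed)

res <- fit_with_fallback(outcome ~ exposed, sim)
res$link                    # link that was actually used
coef(res$fit)[["exposed"]]  # with the identity link, this is the risk difference
```

Only with the identity link is the exposure coefficient itself a risk difference. With the log or logit link the coefficient is a log risk ratio or log odds ratio, so a risk difference would have to be derived from predicted probabilities; how calc_risk_diff() does this is not shown here.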

Usage

calc_risk_diff(
  data,
  outcome,
  exposure,
  nnt = FALSE,
  adjust_vars = NULL,
  strata = NULL,
  link = "auto",
  alpha = 0.05,
  boundary_method = "auto",
  verbose = FALSE
)

Value

A tibble of class "riskdiff_result" containing the following columns:

exposure_var

Character. Name of exposure variable analyzed

rd

Numeric. Risk difference estimate, or Number Needed to Treat if nnt = TRUE (see Details)

ci_lower

Numeric. Lower bound of confidence interval (RD scale or NNT scale)

ci_upper

Numeric. Upper bound of confidence interval (RD scale or NNT scale)

p_value

Numeric. P-value for test of null hypothesis (risk difference = 0)

model_type

Character. Link function successfully used ("identity", "log", "logit", or error type)

n_obs

Integer. Number of observations used in analysis

on_boundary

Logical. TRUE if MLE is on parameter space boundary

boundary_type

Character. Type of boundary: "none", "upper_bound", "lower_bound", "separation", "both_bounds"

boundary_warning

Character. Warning message for boundary cases (if any)

ci_method

Character. Method used for confidence intervals ("wald", "profile", "bootstrap")

...

Additional columns for stratification variables if specified

The returned object has attributes including the original function call and alpha level used. Risk differences are on the probability scale where 0.05 represents a 5 percentage point difference.
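To give a feel for what the on_boundary and boundary_type columns describe, the sketch below labels a set of fitted probabilities by whether any of them sit at 0 or 1. It is an assumption-laden illustration only: the helper classify_boundary and its tolerance are hypothetical, the package's actual detection rules are not documented here, and the "separation" type is not covered.

```r
# Hypothetical helper (not part of riskdiff): label a fit by whether its
# fitted probabilities touch 0 and/or 1, within a tolerance.
classify_boundary <- function(fitted_probs, tol = 1e-6) {
  at_lower <- any(fitted_probs <= tol)
  at_upper <- any(fitted_probs >= 1 - tol)
  if (at_lower && at_upper) {
    "both_bounds"
  } else if (at_upper) {
    "upper_bound"
  } else if (at_lower) {
    "lower_bound"
  } else {
    "none"
  }
}

classify_boundary(c(0.20, 0.35))  # all probabilities strictly inside (0, 1)
classify_boundary(c(0.00, 0.35))  # one fitted probability at the lower bound
classify_boundary(c(0.00, 1.00))  # fitted probabilities at both bounds
```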

Arguments

data

A data frame containing all necessary variables

outcome

Character string naming the binary outcome variable (must be 0/1 or logical)

exposure

Character string naming the exposure variable of interest

nnt

Logical indicating whether to return Number Needed to Treat instead of risk difference (default: FALSE)

adjust_vars

Character vector of variables to adjust for (default: NULL)

strata

Character vector of stratification variables (default: NULL)

link

Character string specifying link function: "auto", "identity", "log", or "logit" (default: "auto")

alpha

Significance level for confidence intervals (default: 0.05)

boundary_method

Method for handling boundary cases: "auto", "profile", "bootstrap", "wald" (default: "auto")

verbose

Logical indicating whether to print diagnostic messages (default: FALSE)
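To make the role of alpha concrete, here is a minimal base-R sketch of a Wald interval for a risk difference from an identity-link binomial GLM. The simulated data are made up for illustration, and the code shows only the textbook Wald formula, which is not necessarily how the package computes its intervals.

```r
# Simulated data (illustrative only)
set.seed(1)
n <- 400
exposed <- rbinom(n, 1, 0.5)
outcome <- rbinom(n, 1, ifelse(exposed == 1, 0.35, 0.20))

# Identity link: the exposure coefficient is the risk difference
fit <- glm(outcome ~ exposed, family = binomial(link = "identity"))

alpha <- 0.05
est <- coef(fit)[["exposed"]]
se  <- sqrt(vcov(fit)["exposed", "exposed"])
z   <- qnorm(1 - alpha / 2)  # critical value for a (1 - alpha) interval

wald_ci <- c(lower = est - z * se, upper = est + z * se)
wald_ci

# The same interval from stats::confint.default(), which uses the Wald formula
confint.default(fit, "exposed", level = 1 - alpha)
```

A smaller alpha gives a larger critical value and therefore a wider interval, for example alpha = 0.01 for a 99% interval.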

Details

New in Version 0.2.2: NNT Calculation capability

When nnt = TRUE, the function returns the Number Needed to Treat (NNT) instead of the risk difference. NNT is the number of individuals who need to be treated to prevent one additional adverse outcome. It is calculated as 1/|RD|, and confidence intervals are transformed using the delta method. NNT is undefined when RD = 0 and is reported as Inf when |RD| < 0.001. For harmful exposures (RD > 0), the value is interpreted as the Number Needed to Harm (NNH).
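As a rough numerical illustration of this transformation, the sketch below converts a risk difference and its standard error to the NNT scale. It is not the package's code: the function name rd_to_nnt, the input values, and the specific delta-method form (the standard error of 1/|RD| approximated by SE(RD) / RD^2) are assumptions for this example.

```r
# Hypothetical helper (not part of riskdiff): RD -> NNT with a
# delta-method standard error, SE(NNT) ~= SE(RD) / RD^2.
rd_to_nnt <- function(rd, se_rd, alpha = 0.05, tol = 0.001) {
  if (abs(rd) < tol) {
    # Effect too close to zero: NNT is treated as infinite
    return(c(nnt = Inf, ci_lower = NA_real_, ci_upper = NA_real_))
  }
  nnt    <- 1 / abs(rd)
  se_nnt <- se_rd / rd^2
  z      <- qnorm(1 - alpha / 2)
  c(nnt = nnt, ci_lower = nnt - z * se_nnt, ci_upper = nnt + z * se_nnt)
}

rd_to_nnt(rd = 0.05, se_rd = 0.01)    # NNT = 1 / 0.05 = 20
rd_to_nnt(rd = 0.0005, se_rd = 0.01)  # below the 0.001 threshold: Inf
```

A delta-method interval of this kind can behave poorly when the interval for the risk difference includes 0; the sketch does not handle that case.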

References

Donoghoe MW, Marschner IC (2018). "logbin: An R Package for Relative Risk Regression Using the Log-Binomial Model." Journal of Statistical Software, 86(9), 1-22. doi:10.18637/jss.v086.i09

Marschner IC, Gillett AC (2012). "Relative Risk Regression: Reliable and Flexible Methods for Log-Binomial Models." Biostatistics, 13(1), 179-192.

Venzon DJ, Moolgavkar SH (1988). "A Method for Computing Profile-Likelihood-Based Confidence Intervals." Journal of the Royal Statistical Society: Series C (Applied Statistics), 37(1), 87-94.

Rothman KJ, Greenland S, Lash TL (2008). Modern Epidemiology, 3rd edition. Lippincott Williams & Wilkins.

Examples

# Simple risk difference
data(cachar_sample)
rd_simple <- calc_risk_diff(
  data = cachar_sample,
  outcome = "abnormal_screen",
  exposure = "areca_nut"
)
print(rd_simple)

# Age-adjusted risk difference
rd_adjusted <- calc_risk_diff(
  data = cachar_sample,
  outcome = "abnormal_screen",
  exposure = "areca_nut",
  adjust_vars = "age"
)
print(rd_adjusted)

# Stratified analysis with enhanced error checking and boundary detection
rd_stratified <- calc_risk_diff(
  data = cachar_sample,
  outcome = "abnormal_screen",
  exposure = "areca_nut",
  strata = "residence",
  verbose = TRUE  # See diagnostic messages and boundary detection
)
print(rd_stratified)

# Check for boundary cases
if (any(rd_stratified$on_boundary)) {
  cat("Boundary cases detected!\n")
  boundary_rows <- which(rd_stratified$on_boundary)
  for (i in boundary_rows) {
    cat("Row", i, ":", rd_stratified$boundary_type[i], "\n")
  }
}

# Force profile likelihood CIs for enhanced robustness
rd_profile <- calc_risk_diff(
  data = cachar_sample,
  outcome = "abnormal_screen",
  exposure = "areca_nut",
  boundary_method = "profile"
)

# Calculate Number Needed to Treat instead of risk difference
data(cachar_sample)
nnt_result <- calc_risk_diff(
  data = cachar_sample,
  outcome = "abnormal_screen",
  exposure = "smoking",
  nnt = TRUE
)
print(nnt_result)

# NNT with adjustment variables
nnt_adjusted <- calc_risk_diff(
  data = cachar_sample,
  outcome = "abnormal_screen",
  exposure = "smoking",
  adjust_vars = "age",
  nnt = TRUE
)
