wildfire_do_analysis: Full analysis pipeline for the impact of wildfire-related PM2.5 on a health outcome.

Description

Runs the full pipeline for analysing the impact of wildfire-related PM2.5 on a health outcome, using a time-stratified case-crossover approach with a conditional quasi-Poisson regression model. The function estimates the relative risk of the health outcome associated with wildfire-related PM2.5, together with attributable numbers, rates and fractions. Model validation statistics are also provided.

Usage

wildfire_do_analysis(
  health_path,
  join_wildfire_data = FALSE,
  ncdf_path = NULL,
  shp_path = NULL,
  date_col,
  region_col,
  shape_region_col = NULL,
  mean_temperature_col,
  health_outcome_col,
  population_col = NULL,
  rh_col = NULL,
  wind_speed_col = NULL,
  pm_2_5_col = NULL,
  wildfire_lag = 3,
  temperature_lag = 1,
  spline_temperature_lag = 0,
  spline_temperature_degrees_freedom = 6,
  predictors_vif = NULL,
  calc_relative_risk_by_region = FALSE,
  scale_factor_wildfire_pm = 10,
  save_fig = FALSE,
  save_csv = FALSE,
  output_folder_path = NULL,
  create_run_subdir = FALSE,
  print_vif = FALSE,
  print_model_summaries = FALSE
)

Value

rr_results A dataframe with relative risk estimates and confidence intervals for each region.
rr_pm A dataframe of relative risk estimates for wildfire-specific PM2.5 exposure across regions as the PM2.5 value changes.
af_an_results A dataframe containing attributable fractions, attributable numbers and deaths per 100k population for each region.
annual_af_an_results A dataframe containing annual attributable numbers and fractions for each region.
calculate_qaic A dataframe of QAIC and dispersion metrics for each model combination and geography.
check_wildfire_vif A dataframe containing variance inflation factors for each independent variable, by region.
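
The attributable outputs listed above can be related to a relative risk with a short worked calculation. The following is a minimal sketch only, assuming the common log-linear form AF = 1 - exp(-beta * x); this page does not state the exact calculation used by the pipeline, and every value and variable name below is hypothetical.

```r
# Hypothetical inputs (not produced by wildfire_do_analysis)
rr_per_10 <- 1.02              # assumed RR per 10-unit increase in wildfire PM2.5
beta <- log(rr_per_10) / 10    # log relative risk per 1-unit increase
pm <- c(0, 5, 20)              # daily wildfire PM2.5 values
deaths <- c(4, 5, 6)           # daily health outcome counts
population <- 400000

af <- 1 - exp(-beta * pm)      # daily attributable fraction (assumed formula)
an <- af * deaths              # daily attributable number

total_an <- sum(an)                            # attributable number over the period
total_af <- total_an / sum(deaths)             # attributable fraction over the period
rate_per_100k <- total_an / population * 1e5   # attributable cases per 100k population
```

Days with zero wildfire PM2.5 contribute nothing to the attributable number under this form, so the period totals are driven by smoke-affected days.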

Arguments

health_path: Path to a CSV file containing a daily time series of data for a particular health outcome, which may be disaggregated by region. If this does not include a column with wildfire-related PM2.5, use join_wildfire_data = TRUE to join these data.
join_wildfire_data: Boolean. If TRUE, a daily time series of wildfire-related PM2.5 concentration is joined to the health data. If FALSE, the data set is loaded without any additional joins. Defaults to FALSE.
ncdf_path: Path to a NetCDF file containing a daily time series of gridded wildfire-related PM2.5 concentration data.
shp_path: Path to a shapefile (.shp) of the geographical boundaries for which to extract mean values of wildfire-related PM2.5.
date_col: Character. Name of the column in the dataframe that contains the date.
region_col: Character. Name of the column in the dataframe that contains the region names.
shape_region_col: Character. Name of the column in the shapefile dataframe that contains the region names.
mean_temperature_col: Character. Name of the column in the dataframe that contains the mean temperature.
health_outcome_col: Character. Name of the column in the dataframe that contains the health outcome count (e.g. number of deaths, hospital admissions).
population_col: Character. Name of the column in the dataframe that contains the population data. Defaults to NULL. This is only required when requesting region-level AF/AN outputs and no population column is already present in the input data.
rh_col: Character. Name of the column containing relative humidity values. Defaults to NULL.
wind_speed_col: Character. Name of the column containing wind speed. Defaults to NULL.
pm_2_5_col: Character. Name of the column containing PM2.5 values in micrograms. This is only required when the wildfire data are not joined (join_wildfire_data = FALSE). Defaults to NULL.
wildfire_lag: Integer. The number of days for which to calculate the lags for wildfire PM2.5. Default is 3.
temperature_lag: Integer. The number of days for which to calculate the lags for temperature. Default is 1.
spline_temperature_lag: Integer. The number of days of lag in the temperature variable from which to generate splines. Default is 0 (unlagged temperature variable).
spline_temperature_degrees_freedom: Integer. Degrees of freedom for the spline(s). Default is 6.
predictors_vif: Character vector with each of the predictors to include in the model. Must contain at least 2 variables. Defaults to NULL.
calc_relative_risk_by_region: Boolean. Whether to calculate relative risk by region. Defaults to FALSE.
scale_factor_wildfire_pm: Numeric. The value by which to divide the wildfire PM2.5 concentration variables, for alternative interpretation of outputs. It corresponds to the increase in wildfire PM2.5 that the model estimates and relative risks refer to (e.g. scale_factor_wildfire_pm = 10 gives estimates and relative risks for a 10 unit increase in wildfire PM2.5). Setting this parameter to 0 or 1 leaves the variable unscaled.
save_fig: Boolean. Whether to save the plot as an output.
save_csv: Boolean. Whether to save the results as a CSV.
output_folder_path: Path. Path to folder where plots and/or CSV should be saved.
create_run_subdir: Boolean. If TRUE, create a timestamped subdirectory under output_folder_path for this run's outputs. Defaults to FALSE.
print_vif: Boolean. Whether to print the VIF (variance inflation factor) for each predictor. Defaults to FALSE.
print_model_summaries: Boolean. Whether to print the model summaries to the console. Defaults to FALSE.
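
To illustrate what scale_factor_wildfire_pm does to the interpretation of the estimates, the sketch below fits a plain quasi-Poisson GLM to simulated data twice: once on the raw exposure and once on the exposure divided by 10. It is a simplified stand-in, not the pipeline's model (no strata, lags or temperature terms), and the data and variable names are invented for the example.

```r
set.seed(42)
n <- 500
pm <- runif(n, 0, 25)                               # simulated wildfire PM2.5
deaths <- rpois(n, lambda = exp(1.4 + 0.002 * pm))  # simulated daily counts

# Exposure in original units: coefficient is the log-RR per 1-unit increase
fit_unit <- glm(deaths ~ pm, family = quasipoisson())

# Exposure divided by 10: coefficient is the log-RR per 10-unit increase
pm_scaled <- pm / 10
fit_10 <- glm(deaths ~ pm_scaled, family = quasipoisson())

beta_unit <- unname(coef(fit_unit)["pm"])
beta_10 <- unname(coef(fit_10)["pm_scaled"])

rr_unit <- exp(beta_unit)  # relative risk per 1-unit increase
rr_10 <- exp(beta_10)      # relative risk per 10-unit increase
```

The two fits describe the same model: beta_10 equals 10 * beta_unit and rr_10 equals rr_unit^10, so scaling changes only the increment the reported relative risk refers to.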

Details

As a minimum, this analysis pipeline requires a daily time series of mean wildfire PM2.5, mean temperature and a health outcome (e.g. all-cause mortality, respiratory or cardiovascular mortality, hospital admissions), with population values. The data are then analysed using a time-stratified case-crossover design fitted with conditional Poisson regression, with optional meta-analysis. Meta-analysis is recommended if the input data are disaggregated by area.
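
The time-stratified design can be sketched in base R as follows. This is an illustration under stated assumptions, not the pipeline's code: strata are taken to be year, month and day of week (as in the Tobias et al. tutorial cited below), and the strata are entered as indicator terms in an ordinary quasi-Poisson GLM, which gives the same exposure coefficient as a conditional Poisson fit. The simulated data and column names are hypothetical.

```r
set.seed(1)
n <- 366
d <- data.frame(
  date = seq.Date(as.Date("2020-01-01"), by = "day", length.out = n),
  deaths = rpois(n, lambda = 4),
  pm = runif(n, 0, 25),
  tmean = runif(n, 10, 35)
)

# Time-stratified strata: days sharing year, month and day of week
d$stratum <- factor(paste(
  format(d$date, "%Y"),
  format(d$date, "%m"),
  as.POSIXlt(d$date)$wday,
  sep = "-"
))

# Quasi-Poisson model with stratum indicators and a natural spline for temperature
fit <- glm(
  deaths ~ pm + splines::ns(tmean, df = 4) + stratum,
  family = quasipoisson(),
  data = d
)

rr_per_unit <- exp(unname(coef(fit)["pm"]))  # RR per 1-unit increase in pm
dispersion <- summary(fit)$dispersion        # estimated overdispersion
```

Each day is compared only with other days in the same stratum, which controls for long-term trend, seasonality and day-of-week effects by design.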

The model parameters have default values based on existing studies, and keeping them is recommended. They can, however, be adjusted for sensitivity analysis.
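
As an illustration of what lag parameters such as wildfire_lag and temperature_lag refer to, the sketch below builds lagged copies of an exposure within each region using base R. Whether the pipeline uses individual lags, moving averages or another construction is not stated here, so this is an assumption; lag_by_group and the toy data are hypothetical.

```r
# Hypothetical helper: shift x by k days within each group (rows sorted by date)
lag_by_group <- function(x, group, k) {
  ave(x, group, FUN = function(v) c(rep(NA, k), v)[seq_along(v)])
}

d <- data.frame(
  region = rep(c("A", "B"), each = 5),
  date = rep(seq.Date(as.Date("2020-01-01"), by = "day", length.out = 5), 2),
  pm = c(1, 2, 3, 4, 5, 10, 20, 30, 40, 50)
)
d <- d[order(d$region, d$date), ]  # lags are only meaningful on date-ordered rows

# One column per lag day, e.g. for a 3-day lag window
for (k in 1:3) {
  d[[paste0("pm_lag", k)]] <- lag_by_group(d$pm, d$region, k)
}
```

The first k rows of each region are NA because no earlier observation exists, and values never leak from one region into the next.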

Model validation testing is provided as a standard output from the pipeline so that users can assess the quality of the model. Users can also include additional independent variables, such as relative humidity or wind speed, directly in the model.
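
For readers unfamiliar with variance inflation factors, the sketch below computes them from first principles: each predictor is regressed on the others and VIF = 1 / (1 - R^2). It uses simulated predictors with invented names and is not the pipeline's check_wildfire_vif implementation.

```r
set.seed(7)
n <- 200
tmean <- runif(n, 10, 35)
rh <- 80 - 1.2 * tmean + rnorm(n, sd = 5)  # deliberately correlated with tmean
wind <- runif(n, 0, 10)                    # independent of the others
X <- data.frame(tmean, rh, wind)

# VIF for each predictor: regress it on all remaining predictors
vif <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})
```

A VIF close to 1 (here, wind) indicates little collinearity, while larger values (here, tmean and rh) flag predictors whose effects are hard to separate.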

Further details on the input data requirements, methodology, quality information and guidance on interpreting outputs can be found in the accompanying publication (doi:10.5281/zenodo.14052184).

References

Brown A, Soutter E, Ingole V. Standards for Official Statistics on Climate-Health Interactions (SOSCHI): Wildfires: introduction. Zenodo; 2024. Available from: https://zenodo.org/records/14052184
Hänninen R, Sofiev M, Uppstu A, Kouznetsov R. Daily surface concentration of fire related PM2.5 for 2003-2023, modelled by SILAM CTM when using the MODIS satellite data for the fire radiative power. Finnish Meteorological Institute; 2024. Available from: doi:10.57707/fmi-b2share.d1cac971b3224d438d5304e945e9f16c
GADM. Database for Global Administrative Areas. Available from: https://gadm.org/download_country.html
Tobias A, Kim Y, Madaniyazi L. Time-stratified case-crossover studies for aggregated data in environmental epidemiology: a tutorial. Int J Epidemiol. 2024;53(2). Available from: doi:10.1093/ije/dyae020
Wu Y, Li S, Guo Y. Space-Time-Stratified Case-Crossover Design in Environmental Epidemiology Study. Heal Data Sci. 2021. Available from: doi:10.34133/2021/9870798

Examples

# \donttest{
example_data <- data.frame(
  date = seq.Date(as.Date("2020-01-01"), by = "day", length.out = 180),
  region = "Example Region",
  death = stats::rpois(180, lambda = 4),
  population = 400000,
  tmean = stats::runif(180, 10, 35),
  mean_PM = stats::runif(180, 0, 25)
)
example_path <- tempfile(fileext = ".csv")
utils::write.csv(example_data, example_path, row.names = FALSE)

wildfire_do_analysis(
  health_path = example_path,
  join_wildfire_data = FALSE,
  ncdf_path = NULL,
  shp_path = NULL,
  date_col = "date",
  region_col = "region",
  shape_region_col = NULL,
  mean_temperature_col = "tmean",
  health_outcome_col = "death",
  population_col = "population",
  rh_col = NULL,
  wind_speed_col = NULL,
  pm_2_5_col = "mean_PM",
  wildfire_lag = 3,
  temperature_lag = 1,
  spline_temperature_lag = 0,
  spline_temperature_degrees_freedom = 4,
  predictors_vif = NULL,
  calc_relative_risk_by_region = FALSE,
  scale_factor_wildfire_pm = 10,
  save_fig = FALSE,
  save_csv = FALSE,
  output_folder_path = tempdir(),
  create_run_subdir = FALSE,
  print_vif = FALSE,
  print_model_summaries = FALSE
)
# }
