climatehealth (version 1.0.0)

wildfire_do_analysis: Full analysis pipeline for the impact of wildfire-related PM2.5 on a health outcome

Description

Runs the full pipeline for analysing the impact of wildfire-related PM2.5 on a health outcome, using a time-stratified case-crossover approach with a conditional quasi-Poisson regression model. The function estimates the relative risk of the health outcome (for example, mortality) associated with wildfire-related PM2.5, as well as attributable numbers, rates and fractions. Model validation statistics are also provided.

Usage

wildfire_do_analysis(
  health_path,
  join_wildfire_data = FALSE,
  ncdf_path = NULL,
  shp_path = NULL,
  date_col,
  region_col,
  shape_region_col = NULL,
  mean_temperature_col,
  health_outcome_col,
  population_col = NULL,
  rh_col = NULL,
  wind_speed_col = NULL,
  pm_2_5_col = NULL,
  wildfire_lag = 3,
  temperature_lag = 1,
  spline_temperature_lag = 0,
  spline_temperature_degrees_freedom = 6,
  predictors_vif = NULL,
  calc_relative_risk_by_region = FALSE,
  scale_factor_wildfire_pm = 10,
  save_fig = FALSE,
  save_csv = FALSE,
  output_folder_path = NULL,
  create_run_subdir = FALSE,
  print_vif = FALSE,
  print_model_summaries = FALSE
)

Value

  • rr_results A dataframe with relative risk estimates and confidence intervals for each region.

  • rr_pm A dataframe of relative risk estimates for wildfire-specific PM2.5 exposure across regions as the PM2.5 value changes.

  • af_an_results A dataframe containing attributable fractions, attributable numbers and deaths per 100k population for each region.

  • annual_af_an_results A dataframe containing annual attributable numbers and fractions for each region.

  • calculate_qaic A dataframe of QAIC and dispersion metrics for each model combination and geography.

  • check_wildfire_vif A dataframe containing variance inflation factors for each independent variable by region.
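As a rough illustration of how the attributable quantities relate to a relative risk, the sketch below applies the standard epidemiological formula AF = (RR - 1) / RR to invented numbers. It is an assumption about the general approach, not the package's exact computation, and all variable names are hypothetical.

```r
# Hypothetical inputs: none of these values come from the package.
rr         <- 1.02     # relative risk at the observed exposure level
deaths     <- 5000     # observed deaths in the region
population <- 400000   # population of the region

af        <- (rr - 1) / rr           # attributable fraction
an        <- af * deaths             # attributable number
rate_100k <- an / population * 1e5   # attributable deaths per 100k population

round(c(af = af, an = an, rate_100k = rate_100k), 3)
```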

Arguments

health_path

Path to a CSV file containing a daily time series of data for a particular health outcome, which may be disaggregated by region. If this does not include a column with wildfire-related PM2.5, use join_wildfire_data = TRUE to join these data.

join_wildfire_data

Boolean. If TRUE, a daily time series of wildfire-related PM2.5 concentration is joined to the health data. If FALSE, the data set is loaded without any additional joins. Defaults to FALSE.

ncdf_path

Path to a NetCDF file containing a daily time series of gridded wildfire-related PM2.5 concentration data.

shp_path

Path to a shapefile (.shp) of the geographical boundaries for which to extract mean values of wildfire-related PM2.5.

date_col

Character. Name of the column in the dataframe that contains the date.

region_col

Character. Name of the column in the dataframe that contains the region names.

shape_region_col

Character. Name of the column in the shapefile dataframe that contains the region names.

mean_temperature_col

Character. Name of the column in the dataframe that contains the mean temperature.

health_outcome_col

Character. Name of the column in the dataframe that contains the health outcome counts (e.g. number of deaths or hospital admissions).

population_col

Character. Name of the column in the dataframe that contains the population data. Defaults to NULL. This is only required when requesting region-level AF/AN outputs and no pop column is already present in the input data.

rh_col

Character. Name of the column containing relative humidity values. Defaults to NULL.

wind_speed_col

Character. Name of the column containing wind speed. Defaults to NULL.

pm_2_5_col

Character. Name of the column containing PM2.5 values in micrograms. Only required when wildfire data are not joined (join_wildfire_data = FALSE). Defaults to NULL.

wildfire_lag

Integer. The number of days for which to calculate the lags for wildfire PM2.5. Default is 3.
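To make the meaning of a lag concrete, here is a minimal base R sketch that builds lag-0 to lag-3 copies of a daily series. The helper make_lags is hypothetical and is not part of the package; with several regions, lags would need to be computed within each region separately.

```r
# Build lag-0..lag-n copies of a daily series (single region, base R only).
# make_lags is a hypothetical helper, not a climatehealth function.
make_lags <- function(x, n_lags) {
  lags <- lapply(0:n_lags, function(k) {
    # Shift the series down by k days, padding the start with NA.
    if (k == 0) x else c(rep(NA, k), head(x, -k))
  })
  names(lags) <- paste0("lag", 0:n_lags)
  as.data.frame(lags)
}

pm <- c(5, 12, 30, 8, 2)               # invented daily PM2.5 values
pm_lags <- make_lags(pm, n_lags = 3)
pm_lags
```

The first k rows of the lag-k column are NA because no earlier observation exists for those days.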

temperature_lag

Integer. The number of days for which to calculate the lags for temperature. Default is 1.

spline_temperature_lag

Integer. The number of days of lag in the temperature variable from which to generate splines. Default is 0 (unlagged temperature variable).

spline_temperature_degrees_freedom

Integer. Degrees of freedom for the spline(s). Defaults to 6.
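For readers unfamiliar with spline degrees of freedom, the sketch below builds a natural cubic spline basis with 6 degrees of freedom using splines::ns. Whether the package uses a natural spline or another basis is an assumption here, and the temperature values are simulated.

```r
library(splines)  # ships with base R

set.seed(1)
tmean <- runif(180, 10, 35)   # invented daily mean temperatures

# Natural cubic spline basis: one column per degree of freedom.
temp_basis <- ns(tmean, df = 6)
dim(temp_basis)
```

Each column of the basis enters the regression as a separate term, so a higher value allows a more flexible temperature-outcome curve at the cost of more parameters.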

predictors_vif

Character vector with each of the predictors to include in the model. Must contain at least 2 variables. Defaults to NULL.
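The variance inflation factor for a predictor is 1 / (1 - R^2), where R^2 comes from regressing that predictor on all the others. The sketch below computes it by hand on simulated data; vif_manual and the column names are hypothetical and do not reflect how the package calculates VIF internally.

```r
# VIF for predictor j is 1 / (1 - R^2_j), where R^2_j comes from
# regressing predictor j on all the other predictors.
vif_manual <- function(data) {
  sapply(names(data), function(v) {
    others <- setdiff(names(data), v)
    fit <- lm(reformulate(others, response = v), data = data)
    1 / (1 - summary(fit)$r.squared)
  })
}

set.seed(42)
predictors <- data.frame(
  tmean = runif(200, 10, 35),
  rh    = runif(200, 20, 90)
)
# Make pm strongly correlated with tmean so its VIF is inflated.
predictors$pm <- 0.5 * predictors$tmean + rnorm(200)

vif_values <- vif_manual(predictors)
vif_values
```

At least two predictors are needed because each one must be regressed on at least one other.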

calc_relative_risk_by_region

Boolean. Whether to calculate relative risk by region. Defaults to FALSE.

scale_factor_wildfire_pm

Numeric. The value to divide the wildfire PM2.5 concentration variables by for alternative interpretation of outputs. Corresponds to the unit increase in wildfire PM2.5 to give the model estimates and relative risks (e.g. scale_factor = 10 corresponds to estimates and relative risks representing impacts of a 10 unit increase in wildfire PM2.5). Setting this parameter to 0 or 1 leaves the variable unscaled.
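The effect of scaling can be checked on simulated data: dividing the exposure by 10 before fitting gives a relative risk equal to the per-unit relative risk raised to the power 10. This is a minimal sketch with a plain quasi-Poisson GLM and invented variable names, not the package's model.

```r
set.seed(1)
pm        <- runif(365, 0, 25)                        # invented daily PM2.5
deaths    <- rpois(365, lambda = exp(1.4 + 0.002 * pm))
pm_scaled <- pm / 10                                  # scale factor of 10

fit_raw    <- glm(deaths ~ pm,        family = quasipoisson())
fit_scaled <- glm(deaths ~ pm_scaled, family = quasipoisson())

rr_per_1  <- exp(coef(fit_raw)[["pm"]])               # RR per 1-unit increase
rr_per_10 <- exp(coef(fit_scaled)[["pm_scaled"]])     # RR per 10-unit increase

c(rr_per_1 = rr_per_1, rr_per_10 = rr_per_10, check = rr_per_1^10)
```

Scaling changes only the interpretation of the coefficient; the fitted values of the two models are the same.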

save_fig

Boolean. Whether to save the plot as an output.

save_csv

Boolean. Whether to save the results as a CSV file.

output_folder_path

Path. Path to folder where plots and/or CSV should be saved.

create_run_subdir

Boolean. If TRUE, create a timestamped subdirectory under output_folder_path for this run's outputs. Defaults to FALSE.
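A timestamped run folder of this kind could be created as sketched below. The run_YYYYMMDD_HHMMSS naming scheme is purely hypothetical; the folder name the package actually generates may differ.

```r
output_folder_path <- tempdir()

# Hypothetical naming scheme: the package's actual folder name may differ.
run_dir <- file.path(
  output_folder_path,
  format(Sys.time(), "run_%Y%m%d_%H%M%S")
)
dir.create(run_dir, recursive = TRUE, showWarnings = FALSE)

dir.exists(run_dir)
```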

print_vif

Boolean. Whether to print the VIF (variance inflation factor) for each predictor. Defaults to FALSE.

print_model_summaries

Boolean. Whether to print the model summaries to the console. Defaults to FALSE.

Details

This pipeline requires, as a minimum, a daily time series of mean wildfire PM2.5, mean temperature and a health outcome (for example all-cause, respiratory or cardiovascular mortality, or hospital admissions), together with population values. The data are analysed using a time-stratified case-crossover design with conditional quasi-Poisson regression and optional meta-analysis. Meta-analysis is recommended if the input data are disaggregated by area.
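The time-stratified design can be sketched in base R as follows. Strata are defined here as year, month and day-of-week combinations, and the model uses stratum indicators in a quasi-Poisson GLM, which yields the same exposure point estimate as a conditional Poisson model on the same strata. The data are simulated and the package's internal model specification may differ.

```r
set.seed(123)
dat <- data.frame(
  date = seq.Date(as.Date("2020-01-01"), by = "day", length.out = 366)
)
dat$pm     <- runif(nrow(dat), 0, 25)     # invented wildfire PM2.5
dat$tmean  <- runif(nrow(dat), 10, 35)    # invented mean temperature
dat$deaths <- rpois(nrow(dat), lambda = exp(1.4 + 0.002 * dat$pm))

# Each stratum is a year-month-weekday combination, so a day is compared
# only with the same weekday in the same month of the same year.
dat$stratum <- interaction(
  format(dat$date, "%Y"), format(dat$date, "%m"), format(dat$date, "%w"),
  drop = TRUE
)

# Stratum indicators stand in for the conditional model in this sketch.
fit <- glm(deaths ~ pm + tmean + stratum, family = quasipoisson(), data = dat)

# Relative risk per 10-unit increase in PM2.5.
rr_10 <- exp(10 * coef(fit)[["pm"]])
rr_10
```

Because every stratum gets its own intercept, long-term trend and seasonality are controlled by design rather than by explicit time terms.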

The model parameters have default values based on existing studies, and keeping them is recommended. They can, however, be adjusted for sensitivity analysis.

Model validation testing is provided as a standard output of the pipeline so that users can assess the quality of the model. Users can also add extra independent variables, such as relative humidity or wind speed, directly to the model.

Further details on the input data requirements, methodology, quality information and guidance on interpreting outputs can be found in the accompanying publication, doi:10.5281/zenodo.14052184.

References

  1. Brown A, Soutter E, Ingole V. Standards for Official Statistics on Climate-Health Interactions (SOSCHI): Wildfires: introduction. Zenodo; 2024. Available from: https://zenodo.org/records/14052184

  2. Hänninen R, Sofiev M, Uppstu A, Kouznetsov R. Daily surface concentration of fire related PM2.5 for 2003-2023, modelled by SILAM CTM when using the MODIS satellite data for the fire radiative power. Finnish Meteorological Institute; 2024. Available from: doi:10.57707/fmi-b2share.d1cac971b3224d438d5304e945e9f16c

  3. GADM. Database for Global Administrative Areas. Available from: https://gadm.org/download_country.html

  4. Tobias A, Kim Y, Madaniyazi L. Time-stratified case-crossover studies for aggregated data in environmental epidemiology: a tutorial. Int J Epidemiol. 2024;53(2). Available from: doi:10.1093/ije/dyae020

  5. Wu Y, Li S, Guo Y. Space-Time-Stratified Case-Crossover Design in Environmental Epidemiology Study. Health Data Sci. 2021. Available from: doi:10.34133/2021/9870798

Examples

# \donttest{
example_data <- data.frame(
  date = seq.Date(as.Date("2020-01-01"), by = "day", length.out = 180),
  region = "Example Region",
  death = stats::rpois(180, lambda = 4),
  population = 400000,
  tmean = stats::runif(180, 10, 35),
  mean_PM = stats::runif(180, 0, 25)
)
example_path <- tempfile(fileext = ".csv")
utils::write.csv(example_data, example_path, row.names = FALSE)

wildfire_do_analysis(
  health_path = example_path,
  join_wildfire_data = FALSE,
  ncdf_path = NULL,
  shp_path = NULL,
  date_col = "date",
  region_col = "region",
  shape_region_col = NULL,
  mean_temperature_col = "tmean",
  health_outcome_col = "death",
  population_col = "population",
  rh_col = NULL,
  wind_speed_col = NULL,
  pm_2_5_col = "mean_PM",  # must match the column name in example_data exactly
  wildfire_lag = 3,
  temperature_lag = 1,
  spline_temperature_lag = 0,
  spline_temperature_degrees_freedom = 4,
  predictors_vif = NULL,
  calc_relative_risk_by_region = FALSE,
  scale_factor_wildfire_pm = 10,
  save_fig = FALSE,
  save_csv = FALSE,
  output_folder_path = tempdir(),
  create_run_subdir = FALSE,
  print_vif = FALSE,
  print_model_summaries = FALSE
)
# }