temp_mortality_do_analysis: Full analysis for the 'mortality attributable to high and low temperatures' indicator

Description

Runs the full methodology to analyse the impact of high and low temperatures on mortality using a quasi-Poisson time series approach with a distributed lag non-linear model. The function estimates the relative risk of the temperature-mortality association, together with the numbers, rates and fractions of deaths attributable to high and low temperatures, as defined by specified temperature thresholds. Model validation statistics are also provided.

Usage

temp_mortality_do_analysis(
  data_path,
  date_col,
  region_col,
  temperature_col,
  dependent_col,
  population_col,
  country = "National",
  independent_cols = NULL,
  control_cols = NULL,
  var_fun = "bs",
  var_degree = 2,
  var_per = c(10, 75, 90),
  lagn = 21,
  lagnk = 3,
  dfseas = 8,
  meta_analysis = FALSE,
  attr_thr_high = 97.5,
  attr_thr_low = 2.5,
  save_fig = FALSE,
  save_csv = FALSE,
  output_folder_path = NULL,
  seed = NULL
)

Value

qaic_results Dataframe. QAIC and dispersion metrics for each model combination and geography.
qaic_summary Dataframe. Mean QAIC and dispersion metrics for each model combination.
vif_results Dataframe. Variance inflation factors for each independent variable by geography.
vif_summary Dataframe. Mean variance inflation factors for each independent variable.
adf_results Dataframe. ADF test results for each geography.
power_list List. Power information by area.
rr_results Dataframe. Cumulative relative risk and confidence intervals from the analysis.
res_attr_tot Dataframe. Total attributable fractions, numbers and rates for each area over the whole time series.
attr_yr_list List. Dataframes containing yearly estimates of attributable fractions, numbers and rates by area.
attr_mth_list List. Dataframes containing total attributable fractions, numbers and rates by calendar month and area.

Arguments

data_path: Path to a csv file containing a daily time series of data for a particular health outcome and climate variables, which may be disaggregated by geography.
date_col: Character. Name of the column in the dataframe containing the date.
region_col: Character. Name of the column in the dataframe that contains the geography name(s).
temperature_col: Character. Name of the column in the dataframe that contains the temperature values.
dependent_col: Character. Name of the column in the dataframe containing the dependent health outcome variable e.g. deaths.
population_col: Character. Name of the column in the dataframe that contains the population estimate per geography.
country: Character. Name of country for national-level estimates. Defaults to 'National'.
independent_cols: List. Additional independent variables to test in model validation as confounders. Defaults to NULL.
control_cols: List. Confounders to include in the final model adjustment. Defaults to NULL.
var_fun: Character. Exposure function for argvar (see dlnm::crossbasis). Defaults to 'bs'.
var_degree: Integer. Degree of the piecewise polynomial for argvar (see dlnm::crossbasis). Defaults to 2 (quadratic).
var_per: Vector. Internal knot positions for argvar (see dlnm::crossbasis). Defaults to c(10, 75, 90).
lagn: Integer. Number of days in the lag period (see dlnm::crossbasis). Defaults to 21.
lagnk: Integer. Number of knots in the lag function (see dlnm::logknots). Defaults to 3.
dfseas: Integer. Degrees of freedom for seasonality. Defaults to 8.
meta_analysis: Boolean. Whether to perform a meta-analysis. Defaults to FALSE.
attr_thr_high: Numeric. Percentile at which to define the high temperature threshold for calculating attributable risk. Defaults to 97.5.
attr_thr_low: Numeric. Percentile at which to define the low temperature threshold for calculating attributable risk. Defaults to 2.5.
save_fig: Boolean. Whether to save the plot as an output. Defaults to FALSE.
save_csv: Boolean. Whether to save the results as a CSV. Defaults to FALSE.
output_folder_path: Path to folder where plots and/or CSV should be saved. Defaults to NULL.
seed: Optional integer random seed used when sampling residuals for model validation plots. Defaults to NULL.

Details

This analysis requires a daily time series of temperature and death counts with population values as a minimum. This is then processed using a quasi-Poisson time series regression analysis with a distributed lag non-linear model and optional meta-analysis. Meta-analysis is recommended if the input data is disaggregated by area.
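Before running the analysis, it can help to confirm that the input file meets these minimum requirements. The following is a minimal sketch in base R and is not part of the package: the helper name check_daily_input and the column names in the usage lines are hypothetical.

```r
# Hypothetical helper (not part of the package): checks that a data frame
# has the required columns and counts missing days within each region.
check_daily_input <- function(df, date_col, region_col, required_cols) {
  missing_cols <- setdiff(c(date_col, region_col, required_cols), names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  dates <- as.Date(df[[date_col]])
  # For each region, compare the observed dates with a complete daily sequence.
  vapply(split(dates, df[[region_col]]), function(d) {
    expected <- seq(min(d), max(d), by = "day")
    sum(!(expected %in% d))
  }, integer(1))
}

# Usage with a small illustrative series that has one day removed.
daily <- data.frame(
  date = seq(as.Date("2020-01-01"), by = "day", length.out = 10),
  region = "A",
  tmean = 1:10,
  deaths = 1:10,
  pop = 100
)
gaps <- check_daily_input(daily[-5, ], "date", "region",
                          c("tmean", "deaths", "pop"))
```

The result is a named vector giving the number of missing days per region, so a value of zero for every region indicates an unbroken daily series.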

The model parameters have default values based on existing studies, and it is recommended to keep them. They can, however, be adjusted where this is appropriate for the user's context.
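To make the roles of some of these parameters concrete, the sketch below fits a simplified quasi-Poisson model in base R using the default values of var_degree, var_per and dfseas. It is an illustration only and not the package's implementation: it omits the distributed lag component (lagn, lagnk) that the full method builds with dlnm::crossbasis, the simulated data and object names are assumptions, and multiplying the seasonal degrees of freedom by the number of years is a common convention in this literature rather than something confirmed for this function.

```r
library(splines)

set.seed(1)
n <- 730  # two years of simulated daily data (illustrative only)
sim <- data.frame(
  time = seq_len(n),
  tmean = 12 + 10 * sin(2 * pi * seq_len(n) / 365) + rnorm(n, sd = 3)
)
sim$deaths <- rpois(n, lambda = 8)

var_degree <- 2
var_per <- c(10, 75, 90)
dfseas <- 8
n_years <- 2

# Internal knots placed at the var_per percentiles of temperature.
temp_knots <- quantile(sim$tmean, var_per / 100, na.rm = TRUE)

# Quadratic B-spline for temperature plus a natural spline of time for
# seasonality and long-term trend (assumption: dfseas applies per year).
fit <- glm(
  deaths ~ bs(tmean, knots = temp_knots, degree = var_degree) +
    ns(time, df = dfseas * n_years),
  family = quasipoisson(),
  data = sim
)

# Dispersion parameter estimated by the quasi-Poisson family.
dispersion <- summary(fit)$dispersion
```

In this simplified form, changing var_per moves the temperature knots, var_degree changes the polynomial degree between them, and dfseas controls how flexibly the model absorbs seasonal variation.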

Model validation testing is provided as a standard output from the pipeline so that a user can assess the quality of the model. Additional independent variables can be specified as independent_cols, in which case they are assessed within different model combinations in the outputs of this testing. They can then be added to the final model via control_cols. Note that variables should be included because they are contextually relevant, not simply on the basis of model optimisation.
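As background for the variance inflation factors reported in the validation outputs, the sketch below computes a VIF from its textbook definition, 1 / (1 - R^2), where R^2 comes from regressing one independent variable on the others. The helper name simple_vif and the simulated variables are hypothetical, and the package's own calculation may differ.

```r
# Hypothetical helper: textbook VIF for each column of a numeric data frame.
simple_vif <- function(x) {
  vapply(names(x), function(v) {
    others <- setdiff(names(x), v)
    # R-squared from regressing variable v on all other variables.
    r2 <- summary(lm(reformulate(others, response = v), data = x))$r.squared
    1 / (1 - r2)
  }, numeric(1))
}

set.seed(1)
humidity <- rnorm(200)
pm25 <- rnorm(200)
ozone <- pm25 + rnorm(200, sd = 0.1)  # collinear with pm25 by construction
vifs <- simple_vif(data.frame(humidity, pm25, ozone))
```

Here pm25 and ozone receive large values because one was built from the other, while humidity stays close to 1.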

For attributable deaths, the default high temperature threshold is the 97.5th percentile of the temperature distribution over the full time period for each geography. The low temperature threshold is similarly defined at the 2.5th percentile. Following review of the relative risk association between temperature and mortality, these thresholds can be adjusted using attr_thr_high and attr_thr_low.
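The percentile-based thresholds described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's internal code: the simulated temperatures, the region names and the thresholds object are hypothetical.

```r
set.seed(1)
temps <- data.frame(
  region = rep(c("North", "South"), each = 365),
  tmean = c(rnorm(365, mean = 8, sd = 5), rnorm(365, mean = 16, sd = 5))
)

attr_thr_high <- 97.5
attr_thr_low <- 2.5

# One row per region with the low and high temperature thresholds,
# each taken from that region's own temperature distribution.
thresholds <- do.call(rbind, lapply(split(temps, temps$region), function(d) {
  data.frame(
    region = d$region[1],
    thr_low = unname(quantile(d$tmean, attr_thr_low / 100, na.rm = TRUE)),
    thr_high = unname(quantile(d$tmean, attr_thr_high / 100, na.rm = TRUE))
  )
}))
```

Because each threshold is a percentile of the local distribution, a warmer region receives higher thresholds in degrees than a cooler one even though the percentiles are the same.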

Further details on the input data requirements, methodology, quality information and guidance on interpreting outputs can be found in the accompanying published methodology (doi:10.5281/zenodo.14865904).

References

Watkins E, Hunt C, Lewis B, Ingole V, Glickman M. Standards for Official Statistics on Climate-Health Interactions (SOSCHI): Mortality attributed to high and low temperatures: methodology. Zenodo; 2026. Available from: doi:10.5281/zenodo.14865904
Gasparrini A, Guo Y, Hashizume M, Lavigne E, Zanobetti A, Schwartz J, et al. Mortality risk attributable to high and low ambient temperature: a multicountry observational study. Lancet. 2015 Jul;386(9991):369-75. Available from: https://linkinghub.elsevier.com/retrieve/pii/S0140673614621140
Gasparrini A, Armstrong B. Reducing and meta-analysing estimates from distributed lag non-linear models. BMC Medical Research Methodology. 2013 Jan 9;13:1. Available from: doi:10.1186/1471-2288-13-1
Gasparrini A, Armstrong B, Kenward MG. Multivariate meta-analysis for non-linear and other multi-parameter associations. Statistics in Medicine. 2012 Dec 20;31(29):3821-39. Available from: doi:10.1002/sim.5471

Examples

# \donttest{
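# Simulate one year of daily data for a single region.
example_data <- data.frame(
  date = seq.Date(as.Date("2020-01-01"), by = "day", length.out = 365),
  region = "Example Region",
  tmean = stats::runif(365, -2, 32),      # daily mean temperature
  deaths = stats::rpois(365, lambda = 8), # daily death counts
  pop = 500000                            # constant population estimate
)

# Write the data to a temporary CSV file, since the function reads from a path.
example_path <- tempfile(fileext = ".csv")
utils::write.csv(example_data, example_path, row.names = FALSE)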

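# Run the analysis with a shorter lag period, fewer lag knots and fewer
# seasonal degrees of freedom than the defaults.
temp_mortality_do_analysis(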
  data_path = example_path,
  date_col = "date",
  temperature_col = "tmean",
  dependent_col = "deaths",
  population_col = "pop",
  region_col = "region",
  country = "Example Region",
  meta_analysis = FALSE,
  independent_cols = NULL,
  control_cols = NULL,
  var_fun = "bs",
  var_degree = 2,
  var_per = c(10, 75, 90),
  lagn = 7,
  lagnk = 2,
  dfseas = 4,
  attr_thr_high = 97.5,
  attr_thr_low = 2.5,
  save_fig = FALSE,
  save_csv = FALSE,
  output_folder_path = tempdir()
)
# }