climatehealth (version 1.0.0)

air_pollution_do_analysis: Comprehensive Air Pollution Analysis Pipeline

Description

Master function that runs the complete air pollution analysis: data loading, preprocessing (including lag construction), modeling, plotting, attribution calculations against reference standards, power analysis, and descriptive statistics.
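To make the lag preprocessing step concrete, here is a minimal base-R sketch of how lagged exposure columns can be built for lags 0 through max_lag. It is an illustration only: the helper lag_vec and the pm25_lag<k> column names are hypothetical, and the package may construct its lag variables differently.

```r
# Hypothetical helper (not part of climatehealth): shift a series back by k days,
# padding the start with NA so the length is unchanged.
lag_vec <- function(x, k) {
  if (k == 0L) return(x)
  c(rep(NA, k), head(x, -k))
}

pm25 <- c(10, 12, 15, 11, 9, 14)  # toy daily PM2.5 series
max_lag <- 2L

# One column per lag; the naming scheme is an assumption for illustration.
lagged <- as.data.frame(lapply(0:max_lag, function(k) lag_vec(pm25, k)))
names(lagged) <- paste0("pm25_lag", 0:max_lag)
lagged
```

The first k rows of each lagged column are NA, so a larger max_lag leaves fewer complete rows at the start of each series.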

Usage

air_pollution_do_analysis(
  data_path,
  date_col = "date",
  region_col = "region",
  pm25_col = "pm25",
  deaths_col = "deaths",
  population_col = "population",
  humidity_col = "humidity",
  precipitation_col = "precipitation",
  tmax_col = "tmax",
  wind_speed_col = "wind_speed",
  categorical_others = NULL,
  continuous_others = NULL,
  Categorical_Others = NULL,
  Continuous_Others = NULL,
  max_lag = 14L,
  df_seasonal = 6,
  family = "quasipoisson",
  reference_standards = list(list(value = 15, name = "WHO")),
  output_dir = "air_pollution_results",
  save_outputs = TRUE,
  run_descriptive = TRUE,
  run_power = TRUE,
  moving_average_window = 3L,
  include_national = TRUE,
  years_filter = NULL,
  regions_filter = NULL,
  attr_thr = 95,
  plot_corr_matrix = TRUE,
  correlation_method = "pearson",
  plot_dist = TRUE,
  plot_na_counts = TRUE,
  plot_scatter = TRUE,
  plot_box = TRUE,
  plot_seasonal = TRUE,
  plot_regional = TRUE,
  plot_total = TRUE,
  detect_outliers = TRUE,
  calculate_rate = FALSE
)

Value

List containing:

data

Processed data with lag variables

meta_analysis

Meta-analysis results with AF/AN calculations

lag_analysis

Lag-specific analysis results

distributed_lag_analysis

Distributed lag model results (if requested)

plots

List of generated plots (forest, lags, distributed lags)

power_list

A list containing power information by area

exposure_response_plots

Exposure-response plots for each reference standard (if requested)

reference_specific_af_an

AF/AN calculations specific to each reference standard (if requested)

descriptive_stats

Summary statistics of key variables
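The AF/AN entries above refer to attributable fractions and attributable numbers. As a rough sketch of what such quantities mean, the following uses the common log-linear formulation AF = (RR - 1) / RR, counting only exposure above a reference value. The coefficient, exposures, and death counts are made-up inputs, and this is not necessarily how the package computes its results.

```r
# Illustrative inputs; none of these values come from the package.
beta <- 0.0008        # assumed log relative risk per unit increase in PM2.5
reference <- 15       # reference standard value, matching the default "WHO" entry
pm25 <- c(10, 20, 30) # toy daily exposures
deaths <- c(5, 6, 4)  # toy daily death counts

# Only exposure above the reference contributes to the attributable burden.
excess <- pmax(pm25 - reference, 0)
rr <- exp(beta * excess)  # relative risk versus the reference level
af <- (rr - 1) / rr       # attributable fraction per day
an <- af * deaths         # attributable number per day

data.frame(pm25, deaths, af, an)
```

Days at or below the reference get AF = 0, which is why the choice of reference standard changes the attributable totals.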

Arguments

data_path

Character. Path to CSV data file

date_col

Character. Name of date column

region_col

Character. Name of region column

pm25_col

Character. Name of PM2.5 column

deaths_col

Character. Name of deaths column

population_col

Character. Name of the population column.

humidity_col

Character. Name of humidity column

precipitation_col

Character. Name of precipitation column

tmax_col

Character. Name of maximum temperature column

wind_speed_col

Character. Name of wind speed column

categorical_others

Optional character vector. Names of additional categorical variables.

continuous_others

Optional character vector. Names of additional continuous variables (e.g., "tmean")

Categorical_Others

Deprecated alias for categorical_others.

Continuous_Others

Deprecated alias for continuous_others.

max_lag

Integer. Maximum lag days. Defaults to 14.

df_seasonal

Integer. Degrees of freedom for seasonal spline. Default 6.

family

Character. Probability distribution for the outcome variable. Options include "quasipoisson" (the default).

reference_standards

List of reference standards. Each element is a list with value (the PM2.5 value) and name (the name of the standard, e.g. "WHO").

output_dir

Directory to save outputs

save_outputs

Logical. Whether to save outputs

run_descriptive

Logical. Whether to run descriptive statistics

run_power

Logical. Whether to run power analysis

moving_average_window

Integer. Window for moving average in descriptive stats

include_national

Logical. Whether to include national results in plots. Default TRUE.

years_filter

Optional numeric vector of years to include (e.g., c(2020, 2021, 2022)). Filtering to at least 3 consecutive years is recommended so that the time series is long enough to analyze.

regions_filter

Optional character vector of regions to include

attr_thr

Numeric (0-100). Percentile threshold used in power analysis to evaluate attribution detectability. Default 95.

plot_corr_matrix

Logical. Plot correlation matrix. Default TRUE.

correlation_method

Character. Correlation method for the correlation matrix (e.g., "pearson", "spearman"). Default "pearson".

plot_dist

Logical. Plot distributions (hist/density) for key variables. Default TRUE.

plot_na_counts

Logical. Plot missingness/NA counts. Default TRUE.

plot_scatter

Logical. Plot scatter plots for key pairs. Default TRUE.

plot_box

Logical. Plot boxplots by region/season where applicable. Default TRUE.

plot_seasonal

Logical. Plot seasonal summaries. Default TRUE.

plot_regional

Logical. Plot regional summaries. Default TRUE.

plot_total

Logical. Plot overall totals where relevant. Default TRUE.

detect_outliers

Logical. Flag potential outliers in descriptive workflow. Default TRUE.

calculate_rate

Logical. Whether to calculate rate variables during descriptive stats (e.g., deaths per population). Default FALSE.
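Two of the arguments above take structured values that are easy to get wrong: reference_standards (a list of lists with value and name) and years_filter (ideally at least 3 consecutive years). The sketch below builds both in base R and checks them before they are passed on. The "National" standard and its value of 25 are hypothetical, and the checks are not part of the package.

```r
# Two reference standards; only the "WHO" entry mirrors the documented default.
standards <- list(
  list(value = 15, name = "WHO"),
  list(value = 25, name = "National")  # hypothetical standard for illustration
)

# Each element should carry a numeric value and a character name.
ok_structure <- all(vapply(
  standards,
  function(s) is.numeric(s$value) && is.character(s$name),
  logical(1)
))

# Check that the filter covers at least 3 consecutive years.
years <- c(2020, 2021, 2022)
yrs <- sort(unique(years))
ok_years <- length(yrs) >= 3 && all(diff(yrs) == 1)

ok_structure
ok_years
```

If both checks pass, standards and years can be supplied as reference_standards and years_filter respectively.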

Examples

example_data <- data.frame(
  date = seq.Date(as.Date("2020-01-01"), by = "day", length.out = 180),
  province = "Example Province",
  pm25 = stats::runif(180, 8, 35),
  deaths = stats::rpois(180, lambda = 5),
  population = 500000,
  humidity = stats::runif(180, 40, 90),
  precipitation = stats::runif(180, 0, 20),
  tmax = stats::runif(180, 18, 35),
  wind_speed = stats::runif(180, 1, 8)
)
example_path <- tempfile(fileext = ".csv")
utils::write.csv(example_data, example_path, row.names = FALSE)

results <- air_pollution_do_analysis(
  data_path = example_path,
  date_col = "date",
  region_col = "province",
  pm25_col = "pm25",
  deaths_col = "deaths",
  population_col = "population",
  humidity_col = "humidity",
  precipitation_col = "precipitation",
  tmax_col = "tmax",
  wind_speed_col = "wind_speed",
  continuous_others = NULL,
  max_lag = 7L,
  df_seasonal = 4,
  family = "quasipoisson",
  reference_standards = list(list(value = 15, name = "WHO")),
  years_filter = NULL,
  regions_filter = NULL,
  include_national = FALSE,
  output_dir = tempdir(),
  save_outputs = FALSE,
  run_descriptive = FALSE,
  run_power = FALSE,
  moving_average_window = 3L,
  attr_thr = 95,
  plot_corr_matrix = FALSE,
  correlation_method = "pearson",
  plot_dist = FALSE,
  plot_na_counts = FALSE,
  plot_scatter = FALSE,
  plot_box = FALSE,
  plot_seasonal = FALSE,
  plot_regional = FALSE,
  plot_total = FALSE,
  detect_outliers = FALSE,
  calculate_rate = FALSE
)
