
climatehealth (version 1.0.0)

run_descriptive_stats: Run generic descriptive statistics and EDA outputs for indicator datasets.

Description

Run generic descriptive statistics and EDA outputs for indicator datasets.

Usage

run_descriptive_stats(
  data,
  output_path,
  aggregation_column = NULL,
  population_col = NULL,
  plot_corr_matrix = FALSE,
  correlation_method = "pearson",
  plot_dist = FALSE,
  plot_ma = FALSE,
  ma_days = 100,
  ma_sides = 1,
  timeseries_col = NULL,
  dependent_col,
  independent_cols,
  units = NULL,
  plot_na_counts = FALSE,
  plot_scatter = FALSE,
  plot_box = FALSE,
  plot_seasonal = FALSE,
  plot_regional = FALSE,
  plot_total = FALSE,
  detect_outliers = FALSE,
  calculate_rate = FALSE,
  run_id = NULL,
  create_base_dir = FALSE
)

Value

A list with base_output_path, run_id, run_output_path, and region_output_paths.

Arguments

data

Dataframe or named list of dataframes. If a dataframe is provided and aggregation_column is passed, data are split by that column.
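As a rough illustration of what splitting by aggregation_column amounts to, base R's split() turns one dataframe into a named list of dataframes, one per distinct value. This is only a sketch of the idea, not the package's internal code; the toy data and column names are made up for the example.

```r
# Toy data; the column names are illustrative only.
df <- data.frame(
  region  = rep(c("A", "B"), each = 3),
  outcome = c(4, 7, 5, 10, 12, 9)
)

# One dataframe per distinct value of the splitting column.
by_region <- split(df, df$region)

names(by_region)        # names of the list elements
nrow(by_region[["A"]])  # rows belonging to region "A"
```

A named list shaped like by_region is presumably also the kind of input the data argument accepts directly, in which case no further splitting would be needed.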

output_path

Character. Base output directory.

aggregation_column

Character. Column used to aggregate/split data by region.

population_col

Character. The column containing population data.

plot_corr_matrix

Logical. Whether to plot correlation matrix.

correlation_method

Character. Correlation method. One of 'pearson', 'spearman', 'kendall'.
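The three accepted values match the method argument of stats::cor(). Assuming the function passes the value through to cor() (the source does not confirm this), the following sketch shows how the methods can differ on a monotonic but non-linear relationship. The vectors x and y are invented for the example.

```r
x <- c(1, 2, 3, 4, 5)
y <- x^2  # increases with x, but not linearly

# Pearson measures linear association; Spearman and Kendall are rank-based.
pearson  <- cor(x, y, method = "pearson")
spearman <- cor(x, y, method = "spearman")
kendall  <- cor(x, y, method = "kendall")

c(pearson = pearson, spearman = spearman, kendall = kendall)
```

Because the ranks of x and y agree exactly here, the rank-based coefficients equal 1 while the Pearson coefficient is slightly below 1.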

plot_dist

Logical. Whether to plot distribution histograms.

plot_ma

Logical. Whether to plot moving averages over a timeseries.

ma_days

Integer. Number of days to use for moving average.

ma_sides

Integer. Sides to use for moving average (1 or 2).
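The 1-or-2 choice resembles the sides argument of stats::filter(), where 1 gives a trailing window (current and past values) and 2 gives a window centred on the current value. Whether run_descriptive_stats uses stats::filter() internally is an assumption; the sketch below only illustrates the two window types on a made-up series with a 3-point window.

```r
x <- 1:10
k <- 3                 # window length, standing in for ma_days
w <- rep(1 / k, k)     # equal weights

# sides = 1: average of the current value and the k - 1 values before it.
trailing <- as.numeric(stats::filter(x, w, sides = 1))

# sides = 2: average of a window centred on the current value.
centered <- as.numeric(stats::filter(x, w, sides = 2))

trailing
centered
```

With a trailing window the first k - 1 positions are NA; with a centred window the NA values fall at both ends of the series instead.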

timeseries_col

Character. Timeseries column used for moving averages and time-based plots.

dependent_col

Character. Dependent variable column.

independent_cols

Character vector. Independent variable columns.

units

Named character vector. Units for variables.
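A named character vector can be built with c(). The sketch below assumes the names are the variable (column) names and the values are the unit labels; the source does not spell out this mapping, and the variables and labels shown are invented.

```r
# Assumed convention: names are column names, values are unit labels.
units <- c(temp = "degrees C", outcome = "cases")

names(units)     # the variables that have a unit
units[["temp"]]  # look up the unit for one variable
```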

plot_na_counts

Logical. Whether to plot NA counts.

plot_scatter

Logical. Whether to plot scatter plots.

plot_box

Logical. Whether to plot box plots.

plot_seasonal

Logical. Whether to plot seasonal trends.

plot_regional

Logical. Whether to plot regional trends.

plot_total

Logical. Whether to plot total health outcomes by year.

detect_outliers

Logical. Whether to output an outlier table.

calculate_rate

Logical. Whether to plot annual rates per 100k.
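A rate per 100k is conventionally the count divided by the population, multiplied by 100,000. The sketch below shows that arithmetic on invented annual totals; how the package aggregates counts within a year, or which population value it uses from population_col, is not stated in the source.

```r
# Invented annual totals for two years.
annual_count <- c(120, 135)
population   <- c(250000, 260000)

# Events per 100,000 people.
rate_per_100k <- annual_count / population * 1e5

round(rate_per_100k, 1)
```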

run_id

Character. Optional run id. If NULL, a timestamped id is generated.

create_base_dir

Logical. Whether to create output_path if missing.

Examples

# \donttest{
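# Fix the random seed so the simulated data are reproducible.
set.seed(1)

# Thirty days of simulated data for two regions.
df <- data.frame(
  date = as.Date("2024-01-01") + 0:29,
  region = rep(c("A", "B"), each = 15),
  outcome = sample(1:20, 30, replace = TRUE),
  temp = rnorm(30, 25, 3)
)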

run_descriptive_stats(
  data = df,
  output_path = tempdir(),
  aggregation_column = "region",
  dependent_col = "outcome",
  independent_cols = c("temp"),
  timeseries_col = "date",
  run_id = NULL
)
# }
