starling (version 0.6.5)

preening: Prettification of infectious diseases datasets

Description

Prettifies your dataset in preparation for data exploration and presenting tables. Adds variable labels and creates a series of age and time categories for analysis. Just pass the dataframe and let it clean your variables and create exploratory variables. It is best used as late in the workflow as possible, but it can be used at any time.

Classic workflow would be:

  1. clean_the_nest to clean and prep data for linkage. Pay close attention to your linkage variables (letternames, date of birth, medicare number, gender and/or postcode), and ensure all dates are formatted as dates.

  2. murmuration to link cases to vaccination data (named here "c2v").

  3. murmuration to link c2v to hospitalization data (named here "c2v2h"). Note that you can skip linking the vaccination dataset.

  4. preening to prettify the dataframe, prepping it for exploration, analysis and presentation. Great to use with gtsummary::tbl_summary().

Usage

preening(
  df,
  create_age_categories = TRUE,
  create_temporal_vars = TRUE,
  calculate_age = TRUE,
  age_reference_date = NULL
)

Value

The output is a dataframe with variable labels (useful for making pretty tables and graphics), plus several age categories and time categories (month-year, quarter-year, etc.).
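
As a minimal illustration of what a variable label is, the sketch below attaches a "label" attribute to columns using base R only. This is the attribute that gtsummary::tbl_summary() uses by default for row headers. Whether preening() stores its labels in exactly this way is an assumption, and the data frame, column names and label text are hypothetical.

```r
# Hypothetical toy data; column names are illustrative only
cases <- data.frame(
  age = c(3, 27, 71),
  event_date = as.Date(c("2024-01-15", "2024-02-20", "2024-03-05"))
)

# Attach a descriptive label to each column via the "label" attribute
attr(cases$age, "label") <- "Age (years)"
attr(cases$event_date, "label") <- "Event date"

# Collect the labels back into a named character vector
labels <- vapply(cases, function(x) attr(x, "label"), character(1))
print(labels)
```

Labelling leaves the column classes untouched, so a labelled date column is still of class Date.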

Arguments

df

The dataset as a dataframe, which can be a case notifications dataset (infections), hospital admissions or vaccination dataset.

create_age_categories

Logical. If TRUE (default), creates 21 standardized age category variables. Requires an 'age' variable in the dataset, or one that can be calculated (see calculate_age).

create_temporal_vars

Logical. If TRUE (default), creates temporal variables (ISO weeks, quarters, months) for date columns.

calculate_age

Logical. If TRUE (default), attempts to calculate age from dob if the age variable is missing.

age_reference_date

Character. Name of the column to use as the reference date for age calculation if age is missing. If NULL (default), uses the first available of: event_date, onset_date, admission_date, first_vax_date, last_vax_date, vax_date_*.
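
The reference-date fallback and age calculation described for calculate_age and age_reference_date can be sketched in base R as follows. This is not the package's internal code: pick_reference_column() and age_in_years() are hypothetical helpers, and treating age as completed years at the reference date is an assumption.

```r
# Hypothetical helper: first available reference-date column, in the documented order
pick_reference_column <- function(df) {
  fixed <- c("event_date", "onset_date", "admission_date",
             "first_vax_date", "last_vax_date")
  wildcard <- grep("^vax_date_", names(df), value = TRUE)  # vax_date_* columns
  found <- intersect(c(fixed, wildcard), names(df))
  if (length(found) == 0) NA_character_ else found[1]
}

# Hypothetical helper: age in completed years at the reference date
age_in_years <- function(dob, ref) {
  stopifnot(inherits(dob, "Date"), inherits(ref, "Date"))
  d <- as.POSIXlt(dob)
  r <- as.POSIXlt(ref)
  # TRUE when the birthday has already occurred in the reference year
  had_birthday <- (r$mon > d$mon) | (r$mon == d$mon & r$mday >= d$mday)
  (r$year - d$year) - ifelse(had_birthday, 0L, 1L)
}

# Toy data: same date of birth, reference dates either side of the birthday
df <- data.frame(
  dob = as.Date(c("2000-06-15", "2000-06-15")),
  onset_date = as.Date(c("2024-06-14", "2024-06-15"))
)
ref_col <- pick_reference_column(df)
df$age <- age_in_years(df$dob, df[[ref_col]])
print(df)
```

The two rows differ by one day around the birthday, which shows why subtracting calendar years alone is not enough.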

Details

This function enhances infectious disease datasets by:

  • Adding descriptive variable labels for cleaner tables and graphics

  • Creating comprehensive temporal variables (ISO weeks, quarters, months) from date fields

  • Generating 21 standardized age category variables for flexible analysis

  • Calculating age from date of birth if not already present

  • Adding useful derived variables for epidemiological analysis
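
The temporal variables mentioned in the list above can be approximated with base R date formatting. The column names below (iso_week, quarter, month_year) are illustrative and are not necessarily the names preening() creates.

```r
event_date <- as.Date(c("2023-01-01", "2024-01-01", "2024-05-10"))

temporal <- data.frame(
  event_date = event_date,
  iso_week   = format(event_date, "%G-W%V"),  # ISO 8601 week-based year and week
  quarter    = paste(format(event_date, "%Y"), quarters(event_date)),
  month_year = format(event_date, "%Y-%m"),
  stringsAsFactors = FALSE
)
print(temporal)
```

Note that 2023-01-01 falls in ISO week 52 of 2022, so the ISO week-based year (%G) can differ from the calendar year (%Y) around New Year.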

IMPORTANT - Date Format Requirements:

All date columns MUST be in R's Date format before using this function. The function expects dates to already be properly formatted and will error with a clear message if they are not.

Common date conversions:

  • From character: data$dob <- as.Date(data$dob, format = "%Y-%m-%d")

  • From character (alternative): data$dob <- lubridate::ymd(data$dob)

  • From Excel dates: data$dob <- as.Date(data$dob, origin = "1899-12-30")

  • Always check: class(data$dob) should return "Date"

If you receive an error like "column must be in Date format", convert your date columns first, then run preening().
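
The conversions above can be combined into a quick pre-flight check before calling preening(). The sketch below uses base R only; the raw data frame and its column names are made up for illustration.

```r
# Hypothetical raw import with dates in two non-Date forms
raw <- data.frame(
  dob        = c("1990-03-12", "2015-11-02"),  # ISO-formatted character
  event_date = c(45000, 45123),                # Excel serial numbers
  stringsAsFactors = FALSE
)

raw$dob        <- as.Date(raw$dob, format = "%Y-%m-%d")
raw$event_date <- as.Date(raw$event_date, origin = "1899-12-30")

# Confirm every intended date column now has class Date
date_cols <- c("dob", "event_date")
is_date <- vapply(raw[date_cols], inherits, logical(1), what = "Date")
print(is_date)
stopifnot(all(is_date))
```

Checking with inherits() rather than comparing class() to a string also works for columns that carry more than one class.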

Age Categorization: If create_age_categories = TRUE and an 'age' variable exists (or can be calculated), the function creates 21 standardized age category variables with nomenclature age[x]cat where x indicates the number of categories:

age2cat

2 categories: Pediatric vs Adult (<18, 18+)

age3cat

3 categories: Child, Adult, Older Adult (<18, 18-64, 65+)

age4cat

4 categories: Infant/Child, Young Adult, Adult, Older Adult (<5, 5-17, 18-64, 65+)

age5cat

5 categories: Standard public health categories (0-4, 5-17, 18-64, 65-74, 75+)

age6cat

6 categories: Granular infant categories (<1, 1-4, 5-17, 18-64, 65-74, 75+)

age7cat

7 categories: Fine pediatric cuts (<1, 1, 2-4, 5-11, 12-17, 18-64, 65+)

age8cat

8 categories: Infant subcategories (<3mo, 3-5mo, 6-11mo, 1-4, 5-17, 18-64, 65-74, 75+)

age9cat

9 categories: Monthly infant categories (<1mo, 1mo, 2-5mo, 6-11mo, 1-4, 5-17, 18-64, 65-74, 75+)

age10cat

10 categories: Decade bands (0-4, 5-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80+)

age11cat

11 categories: Fine older adult categories (0-4, 5-17, 18-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, 90-99, 100+)

age12cat

12 categories: Detailed pediatric + adult decades (<1, 1-4, 5-9, 10-14, 15-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80+)

age13cat

13 categories: Very fine infant + standard adult (<1mo, 1mo, 2mo, 3-5mo, 6-11mo, 1, 2-4, 5-11, 12-17, 18-39, 40-64, 65-79, 80+)

age14cat

14 categories: ABS-like with fine elderly (0-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-84, 85-89, 90+)

age15cat

15 categories: Vaccine schedule aligned (<2mo, 2-3mo, 4-5mo, 6-11mo, 1, 2-3, 4, 5-11, 12-17, 18-49, 50-64, 65-74, 75-84, 85-94, 95+)

age16cat

16 categories: Granular pediatric + 10-year adult bands (<1, 1, 2, 3, 4, 5-9, 10-14, 15-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, 90+)

age17cat

17 categories: WHO/UNICEF standard with extensions (<1mo, 1-5mo, 6-11mo, 1, 2-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, 90+)

age18cat

18 categories: Standard 5-year bands (census/ABS style) (0-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-64, 65-69, 70-74, 75-79, 80-84, 85+)

age19cat

19 categories: Extended 5-year bands with fine elderly (0-4, 5-9, ..., 80-84, 85-89, 90+)

age20cat

20 categories: Monthly up to 12 months + standard thereafter (<1mo, 1mo, 2mo, 3mo, 4mo, 5mo, 6mo, 7mo, 8mo, 9mo, 10mo, 11mo, 1-4, 5-17, 18-39, 40-64, 65-74, 75-84, 85-94, 95+)

age21cat

21 categories: Comprehensive life course categories (<1mo, 1-2mo, 3-5mo, 6-11mo, 1, 2-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-64, 65-74, 75-84, 85+)
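
Year-based bands such as age5cat map directly onto base R's cut(). The sketch below reproduces the age5cat cut points listed above. The label strings and the handling of boundary values are assumptions, not the package's actual output. The month-based infant bands would additionally need age in months, which is not shown here.

```r
# Ages chosen to sit on both sides of each cut point
age <- c(0, 4, 5, 17, 18, 64, 65, 74, 75, 102)

age5cat <- cut(
  age,
  breaks = c(0, 5, 18, 65, 75, Inf),
  labels = c("0-4", "5-17", "18-64", "65-74", "75+"),
  right  = FALSE  # intervals are [lower, upper), so age 5 falls in "5-17"
)
print(table(age5cat))
```

Setting right = FALSE keeps each lower bound inside its own band, which matches how bands such as "5-17" are normally read.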