Prettifies your dataset in preparation for data exploration and presenting tables. Adds variable labels and creates a series of age and time categories for analysis. Just pass the dataframe and let it clean your variables and create exploratory variables. Use it as late in the workflow as possible, although it can be used at any time.
Classic workflow would be:
clean_the_nest to clean and prep data for linkage. Pay close attention to your linkage variables (letternames, date of birth, medicare number, gender and/or postcode), and ensure all dates are formatted as dates.
murmuration to link cases to vaccination data (named here "c2v").
murmuration to link c2v to hospitalization data (named here "c2v2h"). Of note, you can skip linking the vaccination dataset.
preening to prettify the dataframe, prepping it for exploration, analysis and presentation. Great to use with gtsummary::tbl_summary().
preening(
df,
create_age_categories = TRUE,
create_temporal_vars = TRUE,
calculate_age = TRUE,
age_reference_date = NULL
)
The output is a dataframe with variable labels (useful for making pretty tables and graphics), plus several age categories and time categories (month-year, quarter-year, etc.).
df: The dataset as a dataframe. This can be a case notifications (infections), hospital admissions or vaccination dataset.
create_age_categories: Logical. If TRUE (default), creates 21 standardized age category variables. Requires an 'age' variable in the dataset.
create_temporal_vars: Logical. If TRUE (default), creates temporal variables (ISO weeks, quarters, months) for date columns.
calculate_age: Logical. If TRUE (default), attempts to calculate age from dob if the age variable is missing.
age_reference_date: Character. Column name to use as the reference date for age calculation if age is missing. If NULL (default), uses the first available from: event_date, onset_date, admission_date, first_vax_date, last_vax_date, vax_date_*.
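To make the age fallback described above concrete, here is a minimal base R sketch of picking the first available reference date column and computing age in completed years from dob. The helper age_in_years() and the cases data are hypothetical and written only for illustration; this is not the package's internal code, and the vax_date_* pattern is left out for brevity.

```r
# Hypothetical helper (not part of the package): age in completed years
age_in_years <- function(dob, ref_date) {
  dob <- as.POSIXlt(dob)
  ref <- as.POSIXlt(ref_date)
  years <- ref$year - dob$year
  # Subtract one year if the birthday has not yet occurred by the reference date
  before_birthday <- (ref$mon < dob$mon) |
    (ref$mon == dob$mon & ref$mday < dob$mday)
  years - as.integer(before_birthday)
}

# Illustrative data; both columns are already of class Date
cases <- data.frame(
  dob        = as.Date(c("2000-06-15", "2020-01-31")),
  onset_date = as.Date(c("2024-06-14", "2024-01-31"))
)

# Assumed lookup order, mirroring the documented default
candidates <- c("event_date", "onset_date", "admission_date",
                "first_vax_date", "last_vax_date")
ref_col <- intersect(candidates, names(cases))[1]

cases$age <- age_in_years(cases$dob, cases[[ref_col]])
```

In this sketch the first case is one day short of a birthday, so the completed age is 23 rather than 24.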
This function enhances infectious disease datasets by:
Adding descriptive variable labels for cleaner tables and graphics
Creating comprehensive temporal variables (ISO weeks, quarters, months) from date fields
Generating 21 standardized age category variables for flexible analysis
Calculating age from date of birth if not already present
Adding useful derived variables for epidemiological analysis
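As a rough illustration of the kind of temporal variables listed above, the following base R sketch derives an ISO week, a quarter and a month label from a Date column. The column names (iso_week, quarter, month_year) are assumptions made for this example and may not match the names preening() actually creates.

```r
# Illustrative dates, already of class Date
onset_date <- as.Date(c("2024-01-01", "2024-12-30", "2023-07-15"))

temporal <- data.frame(
  onset_date = onset_date,
  # ISO 8601 week-based year and week number
  iso_week   = format(onset_date, "%G-W%V"),
  # Calendar year plus quarter (quarters() returns "Q1" to "Q4")
  quarter    = paste0(format(onset_date, "%Y"), "-", quarters(onset_date)),
  # Calendar year and month
  month_year = format(onset_date, "%Y-%m")
)

temporal
```

Note that ISO weeks can belong to a different year than the calendar date: 30 December 2024 falls in ISO week 1 of 2025, which is why the sketch uses %G rather than %Y for the week label.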
IMPORTANT - Date Format Requirements:
All date columns MUST be in R's Date format before using this function. The function expects dates to already be properly formatted and will error with a clear message if they are not.
Common date conversions:
From character: data$dob <- as.Date(data$dob, format = "%Y-%m-%d")
From character (alternative): data$dob <- lubridate::ymd(data$dob)
From Excel dates: data$dob <- as.Date(data$dob, origin = "1899-12-30")
Always check: class(data$dob) should return "Date"
If you receive an error like "column must be in Date format", convert your date columns first, then run preening().
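The conversions above can be combined into a quick pre-flight check before calling preening(). This is a minimal base R sketch on made-up data; the column onset_excel and the vector date_cols are hypothetical names used only for illustration.

```r
# Illustrative raw data: one character date column, one Excel serial date column
data <- data.frame(
  dob         = c("1990-05-17", "2015-11-02"),
  onset_excel = c(44927, 45000),
  stringsAsFactors = FALSE
)

# Convert character dates and Excel serial numbers to R's Date class
data$dob        <- as.Date(data$dob, format = "%Y-%m-%d")
data$onset_date <- as.Date(data$onset_excel, origin = "1899-12-30")

# Check every date column in one pass
date_cols <- c("dob", "onset_date")
is_date <- vapply(data[date_cols], inherits, logical(1), what = "Date")

# Stop early, naming any column that still needs conversion
if (!all(is_date)) {
  stop("Not in Date format: ", paste(date_cols[!is_date], collapse = ", "))
}
```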
Age Categorization: If create_age_categories = TRUE and an 'age' variable exists (or can be calculated), the function creates 21 standardized age category variables with nomenclature age[x]cat where x indicates the number of categories:
2 categories: Pediatric vs Adult (<18, 18+)
3 categories: Child, Adult, Older Adult (<18, 18-64, 65+)
4 categories: Infant/Child, Young Adult, Adult, Older Adult (<5, 5-17, 18-64, 65+)
5 categories: Standard public health categories (0-4, 5-17, 18-64, 65-74, 75+)
6 categories: Granular infant categories (<1, 1-4, 5-17, 18-64, 65-74, 75+)
7 categories: Fine pediatric cuts (<1, 1, 2-4, 5-11, 12-17, 18-64, 65+)
8 categories: Infant subcategories (<3mo, 3-5mo, 6-11mo, 1-4, 5-17, 18-64, 65-74, 75+)
9 categories: Monthly infant categories (<1mo, 1mo, 2-5mo, 6-11mo, 1-4, 5-17, 18-64, 65-74, 75+)
10 categories: Decade bands (0-4, 5-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80+)
11 categories: Fine older adult categories (0-4, 5-17, 18-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, 90-99, 100+)
12 categories: Detailed pediatric + adult decades (<1, 1-4, 5-9, 10-14, 15-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80+)
13 categories: Very fine infant + standard adult (<1mo, 1mo, 2mo, 3-5mo, 6-11mo, 1, 2-4, 5-11, 12-17, 18-39, 40-64, 65-79, 80+)
14 categories: ABS-like with fine elderly (0-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-84, 85-89, 90+)
15 categories: Vaccine schedule aligned (<2mo, 2-3mo, 4-5mo, 6-11mo, 1, 2-3, 4, 5-11, 12-17, 18-49, 50-64, 65-74, 75-84, 85-94, 95+)
16 categories: Granular pediatric + 10-year adult bands (<1, 1, 2, 3, 4, 5-9, 10-14, 15-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, 90+)
17 categories: WHO/UNICEF standard with extensions (<1mo, 1-5mo, 6-11mo, 1, 2-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, 90+)
18 categories: Standard 5-year bands (census/ABS style) (0-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-64, 65-69, 70-74, 75-79, 80-84, 85+)
19 categories: Extended 5-year bands with fine elderly (0-4, 5-9, ..., 80-84, 85-89, 90+)
20 categories: Monthly up to 12 months + standard thereafter (<1mo, 1mo, 2mo, 3mo, 4mo, 5mo, 6mo, 7mo, 8mo, 9mo, 10mo, 11mo, 1-4, 5-17, 18-39, 40-64, 65-74, 75-84, 85-94, 95+)
21 categories: Comprehensive life course categories (<1mo, 1-2mo, 3-5mo, 6-11mo, 1, 2-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-64, 65-74, 75-84, 85+)
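For readers who want to see how one of these groupings can be built by hand, here is a minimal sketch of the 5-category scheme using base R's cut(). It is an assumption about how such a variable could be derived, not the package's own implementation, and the name age5cat_sketch is hypothetical.

```r
# Illustrative ages in completed years, chosen to sit on the band edges
age <- c(0, 4, 5, 17, 18, 64, 65, 74, 75, 90)

# right = FALSE makes each band include its lower bound and exclude its upper bound,
# so 5 falls in "5-17" and 18 falls in "18-64"
age5cat_sketch <- cut(
  age,
  breaks = c(0, 5, 18, 65, 75, Inf),
  labels = c("0-4", "5-17", "18-64", "65-74", "75+"),
  right  = FALSE
)

table(age5cat_sketch)
```

The result is a factor with levels in age order, which keeps rows sorted sensibly in summary tables.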