This function applies a series of cleaning and normalization steps to strings representing dates. It is intended for use before parsing dates into a YMD (year–month–day) format. The function standardizes month names, trims whitespace, removes invalid characters, and handles strings that contain a letter "T" (common in timestamp formats).
clean_date(df_column)
A character vector of cleaned date strings, each at most 12 characters long, trimmed of whitespace, and with any timestamp-like "T" component removed when appropriate.
df_column: A character vector or data frame column containing raw date-like strings to be cleaned.
Lukasz Andrzejewski
The processing includes:
Converting full month names to abbreviated forms (via get_abbreviated_month_name()).
Limiting the string to the first 12 characters (via get_up_to_12_char()).
Removing non-date characters (via remove_no_date_characters()).
Trimming whitespace at the start and end of the string.
Handling timestamps or strings containing the letter "T":
If "T" appears exactly once and the string does not contain "August" or "October", keep only the substring before "T".
If "T" appears multiple times, remove the unnecessary trailing part using remove_unnecessary_part_of_date().
If the first token of the string (separated by a space) is longer than four characters, return only that first token.
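The truncation, trimming, single-"T" and first-token rules above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the helper names truncate_and_trim(), strip_time_part(), first_token_if_long() and sketch_clean() are hypothetical, matching of "T", "August" and "October" is assumed to be case-sensitive, and the month-name abbreviation, invalid-character removal and multiple-"T" steps are left out because their exact behavior is not described here.

```r
# Hypothetical stand-ins for illustration; NOT the package's own helpers.

# Keep at most the first 12 characters, then trim surrounding whitespace.
truncate_and_trim <- function(x) {
  trimws(substr(x, 1, 12))
}

# If a string contains exactly one capital "T" and neither "August" nor
# "October", keep only the text before that "T" (assumed case-sensitive).
strip_time_part <- function(x) {
  n_t <- lengths(regmatches(x, gregexpr("T", x, fixed = TRUE)))
  has_month <- grepl("August|October", x)
  single_t <- n_t == 1 & !has_month
  x[single_t] <- sub("T.*$", "", x[single_t])
  x
}

# If the first space-separated token is longer than four characters,
# return only that token; otherwise leave the string unchanged.
first_token_if_long <- function(x) {
  first <- sub(" .*$", "", x)
  ifelse(nchar(first) > 4, first, x)
}

# Chain the sketched steps in the order listed above.
sketch_clean <- function(x) {
  x <- truncate_and_trim(x)
  x <- strip_time_part(x)
  first_token_if_long(x)
}

sketch_clean(c("2024-01-10T15:30:00", " 2024 Aug 12 ", "20250101"))
# "2024-01-10" "2024 Aug 12" "20250101"
```

In this sketch the timestamp loses its time component, while "2024 Aug 12" is kept whole because its first token, "2024", is exactly four characters long and so does not exceed the four-character threshold.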
clean_date(c("2024-01-10T15:30:00", "2024 AUGUST 12", "20250101"))