This function cleans a character vector or data frame column containing date-like strings by removing all characters that are not needed for parsing or recognizing dates. It preserves:
Digits (0–9)
Letters that appear in any full English month name, in upper or lower case (e.g., "January" → "J, A, N, U, R, Y")
Selected extra allowed characters: space (" "), dash ("-"), slash ("/"), and "k"/"K"
All other characters (symbols, punctuation, letters not in month names) are removed.
remove_no_date_characters(df_column)
A character vector of the same length as df_column, with unwanted characters removed. Only digits, letters from month names, and selected extra characters are kept.
df_column: A character vector (or data frame column) containing date-like strings. Factors are coerced to character. NA values are preserved.
Lukasz Andrzejewski
The function works as follows:
Converts input to character vector.
Generates the set of letters present in all English month names (case-insensitive).
Constructs a regex pattern to match all characters that are NOT digits, allowed letters, or allowed extra symbols.
Uses stringr::str_replace_all() to remove unwanted characters.
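The four steps above can be sketched in a few lines of R. This is an illustrative re-implementation under stated assumptions, not the package source: the name remove_no_date_characters_sketch is hypothetical, the month names are taken from base R's month.name constant, and base gsub() stands in for stringr::str_replace_all() so that the sketch has no dependencies.

```r
# Hypothetical sketch for illustration; not the package's actual source.
remove_no_date_characters_sketch <- function(df_column) {
  # Step 1: coerce to character (factors become their labels; NA stays NA).
  x <- as.character(df_column)

  # Step 2: unique letters found in the English month names, in both cases,
  # plus the extra allowed letters "k" and "K".
  month_letters <- unique(unlist(strsplit(tolower(month.name), "")))
  allowed_letters <- c(month_letters, toupper(month_letters), "k", "K")

  # Step 3: negated character class. The dash is placed last so that it is
  # read as a literal dash rather than as a range operator.
  pattern <- paste0("[^0-9", paste(allowed_letters, collapse = ""), " /-]")

  # Step 4: remove every character matched by the negated class.
  # Assumption: base gsub() is used here in place of stringr::str_replace_all().
  gsub(pattern, "", x)
}

cleaned <- remove_no_date_characters_sketch(
  c("15 March 2021!", "Jan. 5th, 2020", NA, "week 12-2020")
)
print(cleaned)
```

Under these assumptions the call returns "15 March 2021", "Jan 5th 2020", NA, and "eek 12-2020". The last result shows a consequence of the rule: the month names together cover every letter except k, q, w, x, and z, and since k is allowed separately, the only letters actually removed are q, w, x, and z.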