datetoiso (version 1.2.1)

remove_no_date_characters: Remove unnecessary characters from date-like strings

Description

This function cleans a character vector or data frame column containing date-like strings by removing all characters that are not needed for parsing or recognizing dates. It preserves:

  • Digits (0–9)

  • Letters that appear in any full English month name, in either case (e.g., "January" contributes J, A, N, U, R, Y)

  • Selected extra allowed characters: space (" "), dash ("-"), slash ("/"), and "k"/"K"

All other characters (symbols, punctuation, letters not in month names) are removed.
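
To make the letter rule concrete, the short sketch below derives the set of letters found in English month names and lists the letters that fall outside it. It assumes the names match base R's month.name constant; the variable names are illustrative and are not part of the package.

```r
# Unique lowercase letters that occur in any English month name.
# month.name is a base R constant: "January", ..., "December".
month_letters <- sort(unique(unlist(strsplit(tolower(month.name), ""))))

# Letters of the alphabet that never occur in a month name.
not_in_months <- setdiff(letters, month_letters)

print(month_letters)
print(not_in_months)  # "k" "q" "w" "x" "z"
```

Because "k"/"K" is on the list of extra allowed characters, under this assumption the only letters actually stripped are q, w, x, and z (in either case).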

Usage

remove_no_date_characters(df_column)

Value

A character vector of the same length as df_column, with unwanted characters removed. Only digits, letters from month names, and selected extra characters are kept.

Arguments

df_column

A character vector (or data frame column) containing date-like strings. Factors will be coerced to character. NA values are preserved.

Author

Lukasz Andrzejewski

Details

The function works as follows:

  1. Converts the input to a character vector.

  2. Generates the set of letters that occur in any English month name (case-insensitive).

  3. Constructs a regex pattern to match all characters that are NOT digits, allowed letters, or allowed extra symbols.

  4. Uses stringr::str_replace_all() to remove unwanted characters.
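
The four steps above can be sketched roughly as follows. This is an illustrative re-implementation, not the package's source: the function name sketch_remove_no_date_characters is hypothetical, the month names are assumed to match base R's month.name, and base gsub() stands in for stringr::str_replace_all() so that the example runs without extra packages.

```r
# Hypothetical sketch of the documented steps; not the datetoiso source code.
sketch_remove_no_date_characters <- function(df_column) {
  # Step 1: coerce to character (factors become their labels).
  x <- as.character(df_column)

  # Step 2: unique letters that occur in any English month name, both cases.
  lower <- unique(unlist(strsplit(tolower(month.name), "")))
  allowed_letters <- paste(c(lower, toupper(lower)), collapse = "")

  # Step 3: negated character class of everything that is kept.
  # The dash is placed last so that it is read literally, not as a range.
  pattern <- paste0("[^0-9", allowed_letters, "kK /-]")

  # Step 4: delete every character that matches; NA inputs stay NA.
  # (gsub() is a base R stand-in for stringr::str_replace_all().)
  gsub(pattern, "", x)
}

sketch_remove_no_date_characters(c("15.03.2021", "March 5, 2021!", NA))
# "15032021" "March 5 2021" NA
```

Note that letters are kept wherever they appear, not only inside month names: with this sketch, "2021-03-15T10:00Z" becomes "2021-03-15T1000", because T occurs in month names while Z and the colon do not.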