
datetoiso (version 1.2.1)

clean_date: Prepare and normalize date-like strings before YMD conversion

Description

This function applies a series of cleaning and normalization steps to strings representing dates. It is intended for use before parsing dates into a YMD (year–month–day) format. The function standardizes month names, trims whitespace, removes invalid characters, and handles strings that contain the letter "T", which separates the date from the time in timestamps such as "2024-01-10T15:30:00".

Usage

clean_date(df_column)

Value

A character vector of cleaned date strings. Each string is at most 12 characters long, is trimmed of leading and trailing whitespace, and has any timestamp-like "T" component removed when appropriate.

Arguments

df_column

A character vector or data frame column containing raw date-like strings to be cleaned.

Author

Lukasz Andrzejewski

Details

The processing includes:

  • Converting full month names to abbreviated forms (via get_abbreviated_month_name()).

  • Limiting the string to the first 12 characters (via get_up_to_12_char()).

  • Removing non-date characters (via remove_no_date_characters()).

  • Trimming whitespace at the start and end of the string.

  • Handling timestamps or strings containing the letter "T":

    • If "T" appears exactly once and the string does not contain "August" or "October", keep only the substring before "T".

    • If "T" appears multiple times, remove the unnecessary trailing part using remove_unnecessary_part_of_date().

  • If the first space-separated token of the string is longer than four characters, return only that token.
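The single-"T" rule and the first-token rule can be illustrated with a minimal base R sketch. This is not the package's implementation: sketch_t_and_token_rules() is a hypothetical helper written for illustration only. It assumes that only an uppercase "T" counts, that the month check ignores case, and it skips the month-name abbreviation, the non-date character removal, and the multiple-"T" case handled by remove_unnecessary_part_of_date().

```r
# Hypothetical helper, NOT part of datetoiso: a rough illustration of the
# single-"T" and first-token rules using base R only.
sketch_t_and_token_rules <- function(x) {
  # Cap each string at 12 characters, then trim surrounding whitespace.
  x <- trimws(substr(x, 1, 12))
  vapply(x, function(s) {
    if (is.na(s)) return(NA_character_)
    # Count uppercase "T" occurrences (assumption: matching is case-sensitive).
    n_t <- lengths(regmatches(s, gregexpr("T", s, fixed = TRUE)))
    # Assumption: the "August"/"October" check ignores case.
    has_month <- grepl("August|October", s, ignore.case = TRUE)
    if (n_t == 1 && !has_month) {
      s <- sub("T.*$", "", s)  # keep only the part before the single "T"
    }
    # First space-separated token; NA when the string is empty.
    first <- strsplit(s, " ", fixed = TRUE)[[1]][1]
    if (!is.na(first) && nchar(first) > 4) first else s
  }, character(1), USE.NAMES = FALSE)
}

sketch_t_and_token_rules(c("2024-01-10T15:30:00", "20250101 extra", "2024 10 05"))
# "2024-01-10" "20250101" "2024 10 05"
```

In this sketch, the third string is returned unchanged because its first token, "2024", is exactly four characters long, so the first-token rule does not apply. Results from the real clean_date() may differ, since it also applies the other steps listed above.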

Examples

clean_date(c("2024-01-10T15:30:00", "2024 AUGUST 12", "20250101"))
