datetoiso (version 1.2.1)

impute_date_ymd: Impute Missing Components in Partial Date Strings

Description

This function imputes missing **month** and/or **day** components in partial date strings where the **year** is known. It assumes input dates are provided in the *ymd* format (year-month-day) and does not process datetime values or strings containing time components or non-date characters.

Usage

impute_date_ymd(
  data_frame,
  column_name,
  separator = "-",
  year = "UNKN",
  month = "UNK",
  day = "UN",
  min_max = "min",
  suffix = "_DT"
)

Value

A data frame identical to the input, with an additional column holding the imputed dates. The new column is named by appending the value of the suffix argument (by default "_DT") to the source variable name.

Arguments

data_frame

data frame

column_name

name of column that keeps dates to be imputed

separator

by default "-"; the year-month-day separator, for example "2024-10-21" uses the "-" separator

year

by default "UNKN" - the format of unknown year

month

by default "UNK" - the format of unknown month

day

by default "UN" - the format of unknown day

min_max

by default "min"; controls the imputation direction: "min" imputes the earliest possible date, "max" imputes the latest possible date

suffix

by default "_DT" - new imputed date is named as source variable with suffix
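
As a rough illustration of how the arguments above fit together, a call might look like the sketch below. The data frame, its column name, and its values are invented for this example, and passing the column name as a character string is an assumption rather than something this page confirms. The call is guarded so the snippet still runs when datetoiso is not installed.

```r
# Hypothetical input data; the column name and values are invented for illustration.
df <- data.frame(
  start_date = c("2024-10-21", "2024-10-UN", "2024-UNK-UN", "UNKN-UNK-UN"),
  stringsAsFactors = FALSE
)

# Run the call only if datetoiso is installed.
# Assumption: column_name is given as a character string.
if (requireNamespace("datetoiso", quietly = TRUE)) {
  result <- datetoiso::impute_date_ymd(
    data_frame = df,
    column_name = "start_date",
    min_max = "min"
  )
  print(result)
}
```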

Author

Lukasz Andrzejewski

Details

If the **year** is missing or explicitly marked as unknown (e.g., `"UNKN"`), the function returns `NA`. With the default `min_max = "min"`, a missing **month** is imputed as **January (01)** and a missing **day** as the **first day of the month (01)**. With `min_max = "max"`, the latest possible date is imputed instead.

Any datetime strings (e.g., `"2025-01-NAT11:10:00"`) must be preprocessed to remove the time component before applying this function (e.g., convert to `"2025-01-NA"`).
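
One possible way to do this preprocessing in base R is sketched below. The input values are hypothetical, and the pattern assumes that the time part always begins with a `T` followed by a digit (ISO 8601 style), so that a `T` inside a custom unknown-value marker is left alone.

```r
# Hypothetical raw values; "T" followed by a digit is assumed to start the time part.
raw <- c("2025-01-NAT11:10:00", "2024-10-21T08:30", "2023-UNK-UN")

# Drop everything from the first "T<digit>" to the end of the string.
date_only <- sub("T[0-9].*$", "", raw)

print(date_only)
# "2025-01-NA" "2024-10-21" "2023-UNK-UN"
```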

In addition to imputing the date, the function creates an accompanying **flag variable** named as: `"<source_variable>_<suffix>F"`. This flag variable indicates the type of imputation performed:

  • `NA` — No imputation was performed (the original date was complete, or the year was missing).

  • `"D"` — The **day** component was imputed.

  • `"M"` — The **month** component was imputed.

  • `"D, M"` — Both **month** and **day** components were imputed.
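
The imputation and flag rules described above can be sketched in base R for a single string. This is not the package's implementation: `impute_one()` and `last_day()` are hypothetical helpers written for this illustration, and treating a literal `"NA"` component as missing is an assumption based on the `"2025-01-NA"` example.

```r
# Illustrative sketch only; impute_one() and last_day() are hypothetical helpers,
# not functions from datetoiso.

# Last calendar day of a given year and month, as a two-digit string.
last_day <- function(y, m) {
  first <- as.Date(sprintf("%s-%s-01", y, m))
  next_month <- seq(first, by = "month", length.out = 2)[2]
  format(next_month - 1, "%d")
}

impute_one <- function(x, separator = "-", year = "UNKN", month = "UNK",
                       day = "UN", min_max = "min") {
  parts <- strsplit(x, separator, fixed = TRUE)[[1]]
  length(parts) <- 3  # pad absent components with NA
  # Assumption: a component is missing if absent, equal to its marker, or "NA".
  is_missing <- function(value, marker) is.na(value) || value %in% c(marker, "NA")

  # Unknown year: nothing can be imputed, so both date and flag are NA.
  if (is_missing(parts[1], year)) {
    return(list(date = NA_character_, flag = NA_character_))
  }
  m_imputed <- is_missing(parts[2], month)
  d_imputed <- is_missing(parts[3], day)
  if (m_imputed) parts[2] <- if (min_max == "min") "01" else "12"
  if (d_imputed) {
    parts[3] <- if (min_max == "min") "01" else last_day(parts[1], parts[2])
  }
  # Build the flag: "D", "M", "D, M", or NA when nothing was imputed.
  flag <- c("D", "M")[c(d_imputed, m_imputed)]
  list(
    date = paste(parts, collapse = "-"),
    flag = if (length(flag) == 0) NA_character_ else paste(flag, collapse = ", ")
  )
}
```

Under these assumptions, `impute_one("2024-UNK-UN")` yields the date `"2024-01-01"` with flag `"D, M"`, and `impute_one("2024-02-UN", min_max = "max")` yields `"2024-02-29"` with flag `"D"`.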