datetoiso (version 1.2.1)

impute_date: Impute Missing Components in Partial Date Strings

Description

This function imputes missing **month** and/or **day** components in partial date strings where the **year** is known. It assumes input dates are provided in either the *dmy* format (day-month-year) **or** the *ymd* format (year-month-day) and does not process datetime values or strings containing time components or non-date characters.

Usage

impute_date(
  data_frame,
  column_name,
  date_format = "ymd",
  separator = "-",
  year = "UNKN",
  month = "UNK",
  day = "UN",
  min_max = "min",
  suffix = "_DT"
)

Value

A data frame identical to the input, with an additional column holding the imputed dates. The new column is named after the source column with `suffix` appended (by default "_DT").

Arguments

data_frame

data frame

column_name

name of column that keeps dates to be imputed

date_format

by default "ymd". Order of the date components: "ymd" (year, then month, then day) or "dmy" (day, then month, then year)

separator

by default "-". The character that separates the date components; for example, "2024-10-21" uses the "-" separator

year

by default "UNKN" - the format of unknown year

month

by default "UNK" - the format of unknown month

day

by default "UN" - the format of unknown day

min_max

by default "min". Controls the imputation direction: "min" imputes the earliest possible date, "max" imputes the latest possible date

suffix

by default "_DT" - the new imputed date column is named as the source variable followed by this suffix

Author

Lukasz Andrzejewski

Details

If the **year** is missing or explicitly marked as unknown (e.g., `"UNKN"`), the function returns `NA`. With the default `min_max = "min"`, a missing **month** is imputed as **January (01)** and a missing **day** is imputed as the **first day of the month (01)**.

Any datetime strings (e.g., `"NA-01-2025T11:10:00"`) must be preprocessed to remove the time component before applying this function (e.g., convert to `"NA-01-2025"`).
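The preprocessing step above can be done in base R before calling the function. The following is a minimal sketch: the vector `raw` is made-up input, and the pattern assumes the time component always starts with `T` followed by `hh:mm`, which may not hold for every data source.

```r
# Hypothetical input: partial dates in dmy order, two of them with a time component
raw <- c("NA-01-2025T11:10:00", "UN-UNK-2024", "21-10-2024T08:00:00")

# Remove a trailing time component of the form "Thh:mm..." (assumed layout).
# Requiring digits after "T" avoids cutting tokens that merely contain a "T".
date_only <- sub("T[0-9]{2}:[0-9]{2}.*$", "", raw)

date_only
```

Strings without a time component pass through unchanged, so the same call can be applied to a whole column.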

In addition to imputing the date, the function creates an accompanying **flag variable** named as: `"<source_variable>_<suffix>F"`. This flag variable indicates the type of imputation performed:

  • `NA` — No imputation was performed (the original date was complete).

  • `"D"` — The **day** component was imputed.

  • `"M"` — The **month** component was imputed.

  • `"D, M"` — Both **month** and **day** components were imputed.
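To make the rules and flag values above concrete, here is a base R sketch for a single string in "ymd" order. It is not the package's implementation: `impute_sketch` is a hypothetical helper, the reading of `min_max = "max"` as December and the last day of the month is an assumption, and so is returning an `NA` flag when the year is unknown.

```r
# Hypothetical helper (not part of datetoiso): impute one partial "ymd" string
impute_sketch <- function(x, separator = "-", year = "UNKN", month = "UNK",
                          day = "UN", min_max = "min") {
  parts <- strsplit(x, separator, fixed = TRUE)[[1]]
  y <- parts[1]; m <- parts[2]; d <- parts[3]

  # Unknown year: no date can be imputed (NA flag here is an assumption)
  if (is.na(y) || y == year) {
    return(list(date = NA_character_, flag = NA_character_))
  }

  month_missing <- is.na(m) || m == month
  day_missing <- is.na(d) || d == day

  # Assumed meaning of "max": last month of the year, last day of the month
  if (month_missing) m <- if (min_max == "min") "01" else "12"
  if (day_missing) {
    if (min_max == "min") {
      d <- "01"
    } else {
      first <- as.Date(sprintf("%s-%s-01", y, m))
      next_month <- seq(first, by = "month", length.out = 2)[2]
      d <- format(next_month - 1, "%d")  # day before the next month starts
    }
  }

  # Flag values follow the list above: NA, "D", "M" or "D, M"
  flags <- c(if (day_missing) "D", if (month_missing) "M")
  flag <- if (length(flags) > 0) paste(flags, collapse = ", ") else NA_character_

  list(date = paste(y, m, d, sep = "-"), flag = flag)
}

impute_sketch("2025-11-UN")                     # day imputed
impute_sketch("2024-UNK-UN", min_max = "max")   # month and day imputed
```

Under these assumptions the first call yields the date "2025-11-01" with flag "D", and the second yields "2024-12-31" with flag "D, M". The package itself works on a data frame column and may differ in details such as the output type.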

Examples

impute_date(
  data_frame = data.frame(K = c("2025 11 UN", "2025 UNK 23")),
  column_name = "K",
  separator = " "
)
