This function imputes missing **month** and/or **day** components in partial date strings where the **year** is known. It assumes input dates are provided in either the *dmy* format (day-month-year) **or** the *ymd* format (year-month-day) and does not process datetime values or strings containing time components or non-date characters.
impute_date(
  data_frame,
  column_name,
  date_format = "ymd",
  separator = "-",
  year = "UNKN",
  month = "UNK",
  day = "UN",
  min_max = "min",
  suffix = "_DT"
)
A data frame identical to the input, with an additional column holding the imputed values. The imputed column name is constructed by appending `suffix` (by default `"_DT"`) to the source column name.
`data_frame`: A data frame.
`column_name`: Name of the column that holds the dates to be imputed.
`date_format`: Order of the date components, `"ymd"` by default. Use `"ymd"` for year-month-day or `"dmy"` for day-month-year.
`separator`: Separator between the date components, `"-"` by default. For example, `"2024-10-21"` uses the `"-"` separator.
`year`: Placeholder that marks an unknown year, `"UNKN"` by default.
`month`: Placeholder that marks an unknown month, `"UNK"` by default.
`day`: Placeholder that marks an unknown day, `"UN"` by default.
`min_max`: Imputation direction, `"min"` by default. `"min"` imputes the earliest possible date; `"max"` imputes the latest possible date.
`suffix`: Suffix appended to the source column name to form the name of the new imputed date column, `"_DT"` by default.
Lukasz Andrzejewski
If the **year** is missing or explicitly marked as unknown (e.g., `"UNKN"`), the function returns `NA`. With the default `min_max = "min"`, a missing **month** is imputed as **January (01)** and a missing **day** as the **first day of the month (01)**. With `min_max = "max"`, the latest possible date is imputed instead.
Any datetime strings (e.g., `"NA-01-2025T11:10:00"`) must be preprocessed to remove the time component before applying this function (e.g., convert to `"NA-01-2025"`).
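The imputation rules can be illustrated with a minimal base R sketch. This is not the package's implementation: `sketch_impute()` is a hypothetical helper that handles a single string in *ymd* order with `"-"` as separator. It also assumes that `"max"` means December for an unknown month and the last calendar day for an unknown day, which is one reading of "latest possible date".

```r
# Hypothetical helper (not part of the package): imputes one "ymd" string.
sketch_impute <- function(x, min_max = "min",
                          year = "UNKN", month = "UNK", day = "UN") {
  parts <- strsplit(x, "-", fixed = TRUE)[[1]]
  # Unknown or malformed year: nothing can be imputed.
  if (length(parts) != 3 || parts[1] == year) return(NA_character_)
  latest <- identical(min_max, "max")
  # Unknown month: January for "min", December for "max" (assumption).
  if (parts[2] == month) parts[2] <- if (latest) "12" else "01"
  if (parts[3] == day) {
    first <- as.Date(paste(parts[1], parts[2], "01", sep = "-"))
    if (latest) {
      # Last day of the month = first day of the next month minus one day.
      first <- seq(first, by = "month", length.out = 2)[2] - 1
    }
    parts[3] <- format(first, "%d")
  }
  paste(parts, collapse = "-")
}

sketch_impute("2025-11-UN")                   # "2025-11-01"
sketch_impute("2025-UNK-23")                  # "2025-01-23"
sketch_impute("2024-02-UN", min_max = "max")  # "2024-02-29"
sketch_impute("UNKN-05-12")                   # NA
```

The sketch keeps a known day when only the month is unknown; whether `impute_date()` behaves identically in every edge case should be checked against the function itself.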
In addition to imputing the date, the function creates an accompanying **flag variable** named as: `"<source_variable>_<suffix>F"`. This flag variable indicates the type of imputation performed:
`NA` — No imputation was performed (the original date was complete).
`"D"` — The **day** component was imputed.
`"M"` — The **month** component was imputed.
`"D, M"` — Both **month** and **day** components were imputed.
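As a rough illustration of how the flag values listed above relate to the input, the following base R sketch derives the flag for a single *ymd* string with `"-"` as separator. `sketch_flag()` is a hypothetical helper, not the package's code, and it does not cover the unknown-year case, for which the flag value is not described here.

```r
# Hypothetical helper (not part of the package): derives the imputation flag.
sketch_flag <- function(x, month = "UNK", day = "UN") {
  parts <- strsplit(x, "-", fixed = TRUE)[[1]]
  # TRUE for each component that carries its "unknown" placeholder.
  imputed <- c(D = parts[3] == day, M = parts[2] == month)
  # Complete date: no imputation, so the flag is NA.
  if (!any(imputed)) return(NA_character_)
  paste(names(imputed)[imputed], collapse = ", ")
}

sketch_flag("2025-11-21")   # NA
sketch_flag("2025-11-UN")   # "D"
sketch_flag("2025-UNK-23")  # "M"
sketch_flag("2025-UNK-UN")  # "D, M"
```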
impute_date(data_frame = data.frame(K = c('2025 11 UN', '2025 UNK 23')),
column_name = "K", separator = " ")