na_locf: Missing Value Imputation by Last Observation Carried Forward

Description

Replaces each missing value with the most recent non-missing value preceding it (Last Observation Carried Forward, LOCF). Optionally, the imputation can run from the end of the series instead (Next Observation Carried Backward, NOCB).

Usage

na_locf(x, option = "locf", na_remaining = "rev", maxgap = Inf)

Value

Vector (vector) or Time Series (ts) object, depending on the type of the input given for parameter x

Arguments

x

Numeric Vector (vector) or Time Series (ts) object in which missing values shall be replaced

option

Algorithm to be used. Accepts the following input:

"locf" - for Last Observation Carried Forward (default choice)
"nocb" - for Next Observation Carried Backward

na_remaining

Method to be used for NAs that remain after imputation (for example, leading NAs with LOCF). Accepts the following input:

"rev" - to perform nocb / locf from the reverse direction (default choice)
"keep" - to return the series with NAs
"rm" - to remove remaining NAs
"mean" - to replace remaining NAs by overall mean

maxgap

Maximum number of successive NAs on which imputation is still performed. The default (Inf) replaces all NAs without restriction. When this option is set, runs of consecutive NAs that are longer than 'maxgap' are left as NA. This is mostly useful if you want to treat long runs of NAs separately afterwards.
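The effect of maxgap can be illustrated with a small base-R sketch. The helpers locf_sketch and maxgap_locf_sketch are hypothetical names introduced only for this illustration. They are not the package's implementation, and they assume a plain numeric vector as input.

```r
# Hypothetical helper: carry the last observed value forward (base R only).
locf_sketch <- function(x) {
  present <- !is.na(x)
  idx <- cumsum(present)          # 0 for leading NAs, else index of last observation
  c(NA, x[present])[idx + 1]      # position 1 holds NA for leading gaps
}

# Hypothetical helper: LOCF, but NA runs longer than maxgap are left untouched.
maxgap_locf_sketch <- function(x, maxgap = Inf) {
  filled <- locf_sketch(x)
  runs <- rle(is.na(x))           # lengths of consecutive NA / non-NA runs
  too_long <- rep(runs$values & runs$lengths > maxgap, runs$lengths)
  filled[too_long] <- NA          # restore NAs inside over-long gaps
  filled
}

x <- c(1, NA, 2, NA, NA, NA, 3)
maxgap_locf_sketch(x, maxgap = 2)   # 1 1 2 NA NA NA 3
```

The single NA (run length 1) is filled, while the run of three NAs exceeds maxgap = 2 and stays NA as a whole. The corresponding package call would be na_locf(x, maxgap = 2).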

Author

Steffen Moritz

Details

General Functionality

Replaces each missing value with the most recent present value prior to it (Last Observation Carried Forward - LOCF). This can also be done in reverse direction, starting from the end of the series (then called Next Observation Carried Backward - NOCB).

Handling for NAs at the beginning of the series

If one or more successive observations at the very start of the time series are NA, there is not yet a 'last value' that can be carried forward, so no LOCF imputation can be performed for these NAs. As soon as the first non-NA value appears, LOCF works as expected. The same applies to NOCB from the opposite direction: NAs at the end of the series have no 'next value'.
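The following base-R sketch shows why leading NAs survive a pure LOCF pass and trailing NAs survive a pure NOCB pass. The functions locf_sketch and nocb_sketch are hypothetical helpers written for this illustration under the assumption of a plain numeric vector. They are not the code used by na_locf.

```r
# Hypothetical helper: carry the last observed value forward (base R only).
locf_sketch <- function(x) {
  present <- !is.na(x)
  idx <- cumsum(present)          # 0 for leading NAs, else index of last observation
  c(NA, x[present])[idx + 1]      # position 1 holds NA for leading gaps
}

# Hypothetical helper: NOCB is LOCF applied to the reversed series.
nocb_sketch <- function(x) rev(locf_sketch(rev(x)))

x <- c(NA, 3, 4, NA, 6, NA)
locf_sketch(x)   # NA 3 4 4 6 6  (leading NA has nothing to carry forward)
nocb_sketch(x)   # 3 3 4 6 6 NA  (trailing NA has nothing to carry backward)
```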

Although this problem occurs rarely and usually affects only a few values at the beginning, it is something to consider. The na_remaining parameter defines what should happen to the values at the start that would remain NA after pure LOCF.

The default setting is na_remaining = "rev", which performs nocb / locf from the other direction to fill these NAs. An NA at the beginning is therefore filled with the next non-NA value appearing in the series.

With na_remaining = "keep", NAs at the beginning (which cannot be imputed with pure LOCF) are left as remaining NAs.

With na_remaining = "rm", NAs at the beginning of the series are removed completely, so the returned time series is shorter than the input.

Also available is na_remaining = "mean", which replaces these remaining NAs with the overall mean of the time series. Beware that the mean is usually not a good imputation choice, even if it only affects the values at the beginning.
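The four na_remaining strategies can be compared side by side with a base-R sketch. The helper names locf_sketch and fill_remaining_sketch are hypothetical and exist only for this illustration. The sketch assumes a plain numeric vector, and for "mean" it assumes that the overall mean is taken over the observed values of the input. It is not the package's implementation.

```r
# Hypothetical helper: carry the last observed value forward (base R only).
locf_sketch <- function(x) {
  present <- !is.na(x)
  idx <- cumsum(present)
  c(NA, x[present])[idx + 1]
}

# Hypothetical helper: LOCF followed by one of the four na_remaining strategies.
fill_remaining_sketch <- function(x, na_remaining = c("rev", "keep", "rm", "mean")) {
  na_remaining <- match.arg(na_remaining)
  filled <- locf_sketch(x)
  switch(na_remaining,
    rev  = rev(locf_sketch(rev(filled))),   # fill the rest from the other direction
    keep = filled,                          # leave the leading NAs in place
    rm   = filled[!is.na(filled)],          # drop them, shortening the series
    mean = replace(filled, is.na(filled), mean(x, na.rm = TRUE))  # assumption: mean of observed input values
  )
}

x <- c(NA, NA, 3, 4, NA, 6)
fill_remaining_sketch(x, "rev")    # 3 3 3 4 4 6
fill_remaining_sketch(x, "keep")   # NA NA 3 4 4 6
fill_remaining_sketch(x, "rm")     # 3 4 4 6
fill_remaining_sketch(x, "mean")   # leading NAs replaced by mean(c(3, 4, 6))
```

Note that "rm" changes the length of the result, which matters when the imputed series has to stay aligned with other series of the same length.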

Examples

# Prerequisite: Load the package and create a time series with missing values
library(imputeTS)
x <- ts(c(NA, 3, 4, 5, 6, NA, 7, 8))

# Example 1: Perform LOCF
na_locf(x)

# Example 2: Perform NOCB
na_locf(x, option = "nocb")

# Example 3: Perform LOCF and remove remaining NAs
na_locf(x, na_remaining = "rm")

# Example 4: Same as Example 1, written with the pipe operator %>% (magrittr)
x %>% na_locf()
