Replaces each missing value with the most recent present value prior to it (Last Observation Carried Forward - LOCF). Optionally, this can also be done starting from the back of the series (Next Observation Carried Backward - NOCB).
na_locf(x, option = "locf", na_remaining = "rev", maxgap = Inf)
Vector (vector) or Time Series (ts) object (dependent on given input at parameter x)
x: Numeric Vector (vector) or Time Series (ts) object in which missing values shall be replaced
option: Algorithm to be used. Accepts the following input:
"locf" - for Last Observation Carried Forward (default choice)
"nocb" - for Next Observation Carried Backward
na_remaining: Method to be used for remaining NAs. Accepts the following input:
"rev" - to perform nocb / locf from the reverse direction (default choice)
"keep" - to return the series with NAs
"rm" - to remove remaining NAs
"mean" - to replace remaining NAs by overall mean
maxgap: Maximum number of successive NAs to still perform imputation on. Default setting is to replace all NAs without restrictions. With this option set, runs of consecutive NAs that are longer than 'maxgap' will be left NA. This option mostly makes sense if you want to treat long runs of NA separately afterwards.
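The effect of maxgap can be illustrated with a small base-R sketch. The helper locf_maxgap_sketch is hypothetical and not part of the package; it only mimics the documented behaviour (runs of NAs longer than maxgap are left as NA) and makes no claim about how na_locf is implemented internally.

```r
# Hypothetical helper, NOT part of imputeTS: LOCF that leaves long NA runs untouched.
locf_maxgap_sketch <- function(x, maxgap = Inf) {
  observed <- !is.na(x)
  # Index of the most recent observed element at or before each position
  # (0 while nothing has been observed yet).
  last_seen <- cummax(seq_along(x) * observed)
  out <- x
  fillable <- !observed & last_seen > 0
  out[fillable] <- x[last_seen[fillable]]

  # Length of the run (NA or non-NA) that each element of the input belongs to.
  runs <- rle(is.na(x))
  run_length <- rep(runs$lengths, runs$lengths)
  # Restore NA wherever the original NA run was longer than maxgap.
  too_long <- is.na(x) & run_length > maxgap
  out[too_long] <- NA
  out
}

y <- c(1, NA, 3, NA, NA, NA, 7)
gap_result  <- locf_maxgap_sketch(y, maxgap = 2)  # run of three NAs is left as NA
full_result <- locf_maxgap_sketch(y)              # no restriction: every NA is filled
```

Here the single NA is filled, while the run of three NAs exceeds maxgap = 2 and stays NA, so it can be handled separately afterwards.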
Steffen Moritz
Replaces each missing value with the most recent present value prior to it (Last Observation Carried Forward - LOCF). This can also be done in reverse direction, starting from the end of the series (then called Next Observation Carried Backward - NOCB).
If one or more successive observations at the very start of the time series are NA, there is no 'last value' yet that can be carried forward, so no LOCF imputation can be performed for these NAs. As soon as the first non-NA value appears, LOCF proceeds as expected. The same applies to NOCB, but from the opposite direction.
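A minimal base-R sketch of a pure LOCF pass makes the leading-NA case concrete. The helpers locf_sketch and nocb_sketch are hypothetical names introduced for illustration only; they are not the package's implementation.

```r
# Hypothetical helper, NOT part of imputeTS: pure LOCF without any fallback.
locf_sketch <- function(x) {
  observed <- !is.na(x)
  # Index of the most recent observed element at or before each position
  # (0 while nothing has been observed yet).
  last_seen <- cummax(seq_along(x) * observed)
  out <- x
  # Only NAs that have an earlier observation can be filled.
  fillable <- !observed & last_seen > 0
  out[fillable] <- x[last_seen[fillable]]
  out
}

# NOCB is the same pass applied to the reversed series.
nocb_sketch <- function(x) rev(locf_sketch(rev(x)))

x <- c(NA, 3, 4, 5, 6, NA, 7, 8)
locf_result <- locf_sketch(x)  # NA 3 4 5 6 6 7 8 (leading NA cannot be filled)
nocb_result <- nocb_sketch(x)  # 3 3 4 5 6 7 7 8 (a trailing NA would stay NA instead)
```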
While this problem may occur only rarely and will affect only a small number of values at the beginning, it is something to consider. The na_remaining parameter defines what should happen with these values at the start, which would remain NA after pure LOCF.
The default setting is na_remaining = "rev", which performs NOCB / LOCF from the other direction to fill these NAs. So an NA at the beginning will be filled with the next non-NA value appearing in the series.
With na_remaining = "keep", NAs at the beginning (that cannot be imputed with pure LOCF) are simply left as NAs.
With na_remaining = "rm", NAs at the beginning of the series are removed completely. Thus, the time series is shortened.
Also available is na_remaining = "mean", which uses the overall mean of the time series to replace these remaining NAs. But beware: the mean is usually not a good imputation choice, even if it only affects the values at the beginning.
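The handling of leftover leading NAs can be sketched in base R as follows. The helper remaining_sketch is hypothetical and covers only the "rev", "keep" and "rm" choices for the LOCF direction; the "mean" choice is deliberately left out because its exact computation is not spelled out here. This is an illustration of the described behaviour, not the package's code.

```r
# Hypothetical helper, NOT part of imputeTS: LOCF plus a fallback for leading NAs.
remaining_sketch <- function(x, na_remaining = c("rev", "keep", "rm")) {
  na_remaining <- match.arg(na_remaining)
  observed <- !is.na(x)
  last_seen <- cummax(seq_along(x) * observed)
  out <- x
  fillable <- !observed & last_seen > 0
  out[fillable] <- x[last_seen[fillable]]

  # After a pure LOCF pass, only NAs at the very start can be left over.
  still_na <- is.na(out)
  switch(na_remaining,
    keep = out,             # leave the leading NAs in place
    rm   = out[!still_na],  # drop them, which shortens the series
    rev  = {
      # Fill from the other direction: use the first observed value.
      out[still_na] <- x[which(observed)[1]]
      out
    }
  )
}

z <- c(NA, NA, 3, NA, 5)
rev_result  <- remaining_sketch(z, "rev")   # 3 3 3 3 5
keep_result <- remaining_sketch(z, "keep")  # NA NA 3 3 5
rm_result   <- remaining_sketch(z, "rm")    # 3 3 5
```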
na_interpolation, na_kalman, na_ma, na_mean, na_random, na_replace, na_seadec, na_seasplit
# Prerequisite: Load the package and create a time series with missing values
library(imputeTS)
x <- ts(c(NA, 3, 4, 5, 6, NA, 7, 8))
# Example 1: Perform LOCF
na_locf(x)
# Example 2: Perform NOCB
na_locf(x, option = "nocb")
# Example 3: Perform LOCF and remove remaining NAs
na_locf(x, na_remaining = "rm")
# Example 4: Same as example 1, just written with the pipe operator
# (%>% comes from magrittr and must be available in the session)
x %>% na_locf()