get_diff: First Difference with Strict Time Indexing

Description

Computes the first difference of a time-indexed series using strict time-based lagging. The function respects gaps in the time index and returns NA when the previous time period does not exist, mirroring Stata’s D. operator.

Usage

get_diff(vec, tvec)

Value

A vector of the same length as vec, containing the first differences aligned by the time index. Elements are NA when the previous time period does not exist.

Arguments

vec: A numeric (or atomic) vector of observations.
tvec: A vector of time indices corresponding one-to-one with vec. Each value must uniquely identify a time period within the series.

Time Indexing Logic

This section explains how get_diff() computes first differences and why strict time indexing matters in the presence of gaps.

Relation to Stata’s D. operator

The function replicates the behaviour of Stata’s first-difference operator D.x. When time periods are missing, Stata returns missing values rather than differencing across gaps. Because get_diff() relies on get_lag, it follows the same rule.

Why not use diff()?

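The base R function diff() computes differences between adjacent vector positions, which implicitly assumes a complete and regularly spaced time index. When time periods are missing, diff() differences across the gap and so reports a multi-period change as if it were a one-period change. It also returns a vector one element shorter than its input, so the result is not aligned with the original time index. get_diff() avoids both problems: it differences only when the previous time period exists and returns a vector of the same length as vec.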
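To make the point concrete, the following sketch shows what diff() does with a gapped series and one way its output could be masked by hand. It uses base R only and assumes the time index is sorted in ascending order; the variable names are illustrative and are not part of the package.

```r
# Illustrative data: period 3 is missing from the time index.
t_gap <- c(1, 2, 4, 5)
x_gap <- c(10, 20, 40, 50)

# Positional differences: one element shorter than the input.
d_pos <- diff(x_gap)

# Time elapsed between adjacent observations; a value other than 1 marks a gap.
dt <- diff(t_gap)

# Keep a difference only when exactly one period has elapsed, then pad with
# a leading NA so the result lines up with t_gap.
# Assumption: t_gap is sorted in ascending order.
masked <- c(NA, ifelse(dt == 1, d_pos, NA))
masked
# [1] NA 10 NA 10
```

This manual masking gives the same values as get_diff(x_gap, t_gap) in Example 2, but only when the index is sorted, which is why a lookup by time value is the more robust approach.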

Details

This helper function computes first differences as: $$ \Delta x_t = x_t - x_{t-1}, $$ where the lagged value $x_{t-1}$ is obtained using get_lag, which performs strict time-based lookup.

Internally, the function calls:


val_t_minus_1 <- get_lag(vec, tvec, 1)

and then subtracts this lagged vector from vec. If the time index contains gaps, or if the previous time period does not exist for a given observation, the lagged value is NA and the corresponding difference is also NA.

No interpolation or implicit shifting is performed; missing time periods propagate as missing differences.
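The lookup described above can be sketched in a few lines of base R. This is a minimal illustration, not the package's actual implementation of get_lag or get_diff: the names strict_lag_sketch and strict_diff_sketch are hypothetical, and the use of match() is an assumption about one reasonable way to do a strict time-based lookup.

```r
# Hypothetical helper, NOT the package's get_lag(): looks up the value
# observed k periods earlier by matching on the time index itself.
strict_lag_sketch <- function(vec, tvec, k = 1) {
  # Inputs must correspond one-to-one and time values must be unique.
  stopifnot(length(vec) == length(tvec), !anyDuplicated(tvec))
  # match() gives the position of period t - k, or NA if that period is absent;
  # indexing with NA yields NA, so missing periods propagate automatically.
  vec[match(tvec - k, tvec)]
}

# Hypothetical counterpart to get_diff(): x_t minus the strictly lagged value.
strict_diff_sketch <- function(vec, tvec) {
  vec - strict_lag_sketch(vec, tvec, 1)
}

strict_diff_sketch(c(10, 20, 40, 50), c(1, 2, 4, 5))
# [1] NA 10 NA 10
```

Because the lookup is by time value rather than by position, this sketch does not require the series to be sorted.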

Examples


## Example 1: Regular time series
t <- 1:5
x <- c(10, 20, 30, 40, 50)

get_diff(x, t)
# [1] NA 10 10 10 10

## Example 2: Time series with a gap
t_gap <- c(1, 2, 4, 5)
x_gap <- c(10, 20, 40, 50)

get_diff(x_gap, t_gap)
# [1] NA 10 NA 10

## Explanation:
## At t = 4, the previous period t-1 = 3 does not exist, so the difference is NA.

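## Example 3: Comparison with diff()
## diff() works on positions: it returns one fewer element and
## differences across the gap (40 - 20 = 20).
diff(x_gap)
# [1] 10 20 10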
