
fdiff is an S3 generic to compute (sequences of) suitably lagged / leaded and iterated differences, quasi-differences or (quasi-)log-differences. The difference and log-difference operators D and Dlog also exist as parsimonious wrappers around fdiff, providing more flexibility than fdiff when applied to data frames.
fdiff(x, n = 1, diff = 1, …)
D(x, n = 1, diff = 1, …)
Dlog(x, n = 1, diff = 1, …)
# S3 method for default
fdiff(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, log = FALSE, rho = 1,
stubs = TRUE, …)
# S3 method for default
D(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, rho = 1,
stubs = TRUE, …)
# S3 method for default
Dlog(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, rho = 1, stubs = TRUE, …)
# S3 method for matrix
fdiff(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, log = FALSE, rho = 1,
stubs = length(n) + length(diff) > 2L, …)
# S3 method for matrix
D(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, rho = 1,
stubs = TRUE, …)
# S3 method for matrix
Dlog(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, rho = 1, stubs = TRUE, …)
# S3 method for data.frame
fdiff(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, log = FALSE, rho = 1,
stubs = length(n) + length(diff) > 2L, …)
# S3 method for data.frame
D(x, n = 1, diff = 1, by = NULL, t = NULL, cols = is.numeric,
fill = NA, rho = 1, stubs = TRUE, keep.ids = TRUE, …)
# S3 method for data.frame
Dlog(x, n = 1, diff = 1, by = NULL, t = NULL, cols = is.numeric,
fill = NA, rho = 1, stubs = TRUE, keep.ids = TRUE, …)
# Methods for indexed data / compatibility with plm:
# S3 method for pseries
fdiff(x, n = 1, diff = 1, fill = NA, log = FALSE, rho = 1,
stubs = length(n) + length(diff) > 2L, shift = "time", …)
# S3 method for pseries
D(x, n = 1, diff = 1, fill = NA, rho = 1, stubs = TRUE, shift = "time", …)
# S3 method for pseries
Dlog(x, n = 1, diff = 1, fill = NA, rho = 1, stubs = TRUE, shift = "time", …)
# S3 method for pdata.frame
fdiff(x, n = 1, diff = 1, fill = NA, log = FALSE, rho = 1,
stubs = length(n) + length(diff) > 2L, shift = "time", …)
# S3 method for pdata.frame
D(x, n = 1, diff = 1, cols = is.numeric, fill = NA, rho = 1, stubs = TRUE,
shift = "time", keep.ids = TRUE, …)
# S3 method for pdata.frame
Dlog(x, n = 1, diff = 1, cols = is.numeric, fill = NA, rho = 1, stubs = TRUE,
shift = "time", keep.ids = TRUE, …)
# Methods for grouped data frame / compatibility with dplyr:
# S3 method for grouped_df
fdiff(x, n = 1, diff = 1, t = NULL, fill = NA, log = FALSE, rho = 1,
stubs = length(n) + length(diff) > 2L, keep.ids = TRUE, …)
# S3 method for grouped_df
D(x, n = 1, diff = 1, t = NULL, fill = NA, rho = 1, stubs = TRUE,
keep.ids = TRUE, …)
# S3 method for grouped_df
Dlog(x, n = 1, diff = 1, t = NULL, fill = NA, rho = 1, stubs = TRUE,
keep.ids = TRUE, …)
x: a numeric vector / time series, (time series) matrix, data frame, 'indexed_series' ('pseries'), 'indexed_frame' ('pdata.frame') or grouped data frame ('grouped_df').
n: integer. A vector indicating the number of lags or leads.
diff: integer. A vector of integers >= 1 indicating the order of differencing / log-differencing.
by: data.frame method: same as g, but also allows one- or two-sided formulas, i.e. ~ group1 or var1 + var2 ~ group1 + group2. See Examples.
t: a time vector or list of vectors. See flag.
cols: data.frame method: select columns to difference using a function, column names, indices or a logical vector. Default: all numeric variables. Note: cols is ignored if a two-sided formula is passed to by.
fill: value to insert when vectors are shifted. Default is NA.
log: logical. TRUE computes log-differences. See Details.
rho: double. Autocorrelation parameter. Set to a value between 0 and 1 for quasi-differencing. Any numeric value can be supplied.
stubs: logical. TRUE will rename all differenced columns by adding the prefixes "LnDdiff." / "FnDdiff." for differences and "LnDlogdiff." / "FnDlogdiff." for log-differences (with n replaced by the lag / lead and diff by the order of differencing), and by replacing "D" / "Dlog" with "QD" / "QDlog" for quasi-differences.
shift: pseries / pdata.frame methods: character. "time" or "row". See flag for details.
keep.ids: data.frame / pdata.frame / grouped_df methods: logical. Keep all identifiers in the output (these include all variables passed to by or t); FALSE drops them. Note: with keep.ids = FALSE, for 'grouped_df' / 'pdata.frame' the identifier columns are dropped, but the "groups" / "index" attributes are kept.
…: arguments to be passed to or from other methods.
Returns x differenced diff times using lags n of itself. Quasi- and log-differences are toggled by the rho and log arguments or the Dlog operator. Computations can be grouped by g/by and/or ordered by t. See Details and Examples.
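To illustrate what grouping and time-ordering mean, here is a minimal base R sketch, not the collapse implementation: grouped_diff() is a hypothetical helper that sorts each group by its time variable and only subtracts observations belonging to the same group. It assumes consecutive time periods without gaps and does not attempt to reproduce how flag treats gaps or irregular values in t.

```r
# Made-up panel: two groups, rows deliberately not sorted by year
df <- data.frame(
  id   = c("a", "a", "a", "b", "b", "b"),
  year = c(2002, 2000, 2001, 2000, 2002, 2001),
  val  = c(9, 1, 4, 10, 30, 20)
)

# Hypothetical helper: x[i] - x[i - n] within each group g, after ordering by t.
# Assumes no gaps in t; positions, not time values, define the lag.
grouped_diff <- function(x, g, t, n = 1) {
  out <- rep(NA_real_, length(x))
  for (idx in split(seq_along(x), g)) {
    idx <- idx[order(t[idx])]        # rows of this group in time order
    k <- length(idx)
    if (k > n) {
      cur  <- idx[(n + 1):k]
      prev <- idx[1:(k - n)]
      out[cur] <- x[cur] - x[prev]   # never subtracts across groups
    }
  }
  out
}

df$D_val <- grouped_diff(df$val, df$id, df$year)
df$D_val   # values: 5 NA 3 NA 10 10 (first year of each group is NA)
```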
By default, fdiff/D/Dlog return x with all columns differenced / log-differenced. Differences are computed by applying x[i] - rho*x[i-n] repeatedly, diff times. Log-differences are computed as log(x[i]) - rho*log(x[i-n]) for diff = 1; for diff > 1, x[i] - rho*x[i-n] is applied a further diff - 1 times to that result (usually diff = 1 is used for log-differencing). If rho < 1, this becomes quasi- (or partial) differencing, a technique suggested by Cochrane and Orcutt (1949) to deal with serial correlation in regression models, where rho is typically estimated by regressing the model residuals on the lagged residuals.
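The formulas above can be sketched in base R as follows. shift_back() and quasi_diff() are hypothetical helpers written for illustration, not collapse functions; the sketch pads with fill (NA by default) and ignores grouping, time indexing and efficiency. With rho = 1 it reproduces base R's diff(), padded with NA at the start.

```r
# Hypothetical helper: returns x[i - n], inserting `fill` where i - n is out of range.
# A negative n therefore returns the lead x[i + |n|].
shift_back <- function(x, n, fill = NA) {
  idx <- seq_along(x) - n
  out <- rep(fill, length(x))
  ok  <- idx >= 1 & idx <= length(x)
  out[ok] <- x[idx[ok]]
  out
}

# Hypothetical helper: applies x[i] - rho * x[i - n] `diff` times.
# With log = TRUE, x is log-transformed first, so the first step is
# log(x[i]) - rho * log(x[i - n]) and later steps difference that result.
quasi_diff <- function(x, n = 1, diff = 1, rho = 1, log = FALSE, fill = NA) {
  if (log) x <- base::log(x)
  for (k in seq_len(diff)) x <- x - rho * shift_back(x, n, fill)
  x
}

x <- c(2, 5, 9, 14, 20, 27)
quasi_diff(x)                   # values: NA 3 4 5 6 7
quasi_diff(x, n = 1, diff = 2)  # values: NA NA 1 1 1 1
quasi_diff(x, n = 2)            # values: NA NA 7 9 11 13
quasi_diff(x, rho = 0.5)        # values: NA 4 6.5 9.5 13 17 (quasi-difference)

# rho = 1 matches base R's diff(), padded with NA
all.equal(quasi_diff(x, 1, 2), c(NA, NA, diff(x, differences = 2)))
all.equal(quasi_diff(x, log = TRUE), c(NA, diff(log(x))))
```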
It is also possible to compute forward differences by passing negative n values. n also supports arbitrary vectors of integers (lags), and diff supports positive sequences of integers (differences). If more than one value is passed to n and/or diff, the data is expanded wide as follows: if x is an atomic vector or time series, a (time series) matrix is returned with columns ordered first by lag, then by difference. If x is a matrix or data frame, each column is expanded in like manner, such that the output has ncol(x)*length(n)*length(diff) columns, ordered first by column, then by lag, then by difference.
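The wide expansion for an atomic vector, including forward differences, can be sketched in base R as below. expand_diffs() and its helpers are hypothetical (not part of collapse), the column labels are made up rather than collapse's stubs, and n = 0 (keeping the undifferenced series) is not handled. expand.grid() is used because its first column varies fastest, which gives columns ordered first by lag, then by difference.

```r
# Hypothetical helper: x[i - n] padded with NA; negative n gives a lead
shift_back <- function(x, n) {
  idx <- seq_along(x) - n
  out <- rep(NA_real_, length(x))
  ok  <- idx >= 1 & idx <= length(x)
  out[ok] <- x[idx[ok]]
  out
}

# Hypothetical helper: x[i] - x[i - n], applied d times
lag_diff <- function(x, n, d) {
  for (k in seq_len(d)) x <- x - shift_back(x, n)
  x
}

# Hypothetical: one column per (lag, difference) pair, ordered by lag, then difference
expand_diffs <- function(x, n, diff) {
  stopifnot(all(n != 0))                    # n = 0 is not handled in this sketch
  combos <- expand.grid(d = diff, n = n)    # d varies fastest within each n
  out <- mapply(function(nn, dd) lag_diff(x, nn, dd), combos$n, combos$d)
  colnames(out) <- paste0("n", combos$n, "_d", combos$d)  # made-up labels
  out
}

x <- c(2, 5, 9, 14, 20, 27)
m <- expand_diffs(x, n = c(-1, 1), diff = 1:2)
colnames(m)    # "n-1_d1" "n-1_d2" "n1_d1" "n1_d2"
m[, "n-1_d1"]  # forward difference x[i] - x[i+1]: -3 -4 -5 -6 -7 NA
```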
For further computational details and efficiency considerations, see the help page of flag.
Cochrane, D. & Orcutt, G. H. (1949). Application of Least Squares Regression to Relationships Containing Auto-Correlated Error Terms. Journal of the American Statistical Association, 44(245), 32-61.
Prais, S. J. & Winsten, C. B. (1954). Trend Estimators and Serial Correlation. Cowles Commission Discussion Paper No. 383, Chicago.
See also: flag/L/F, fgrowth/G, Time Series and Panel Series, Collapse Overview
library(collapse)
## Simple Time Series: AirPassengers
D(AirPassengers) # 1st difference, same as fdiff(AirPassengers)
D(AirPassengers, -1) # Forward difference
Dlog(AirPassengers) # Log-difference
D(AirPassengers, 1, 2) # Second difference
Dlog(AirPassengers, 1, 2) # Second log-difference
D(AirPassengers, 12) # Seasonal difference (data is monthly)
D(AirPassengers, # Quasi-difference, see a better example below
rho = pwcor(AirPassengers, L(AirPassengers)))
head(D(AirPassengers, -2:2, 1:3)) # Sequence of leaded/lagged and iterated differences
# let's do some visual analysis
plot(AirPassengers) # Plot the series - seasonal pattern is evident
plot(stl(AirPassengers, "periodic")) # Seasonal decomposition
plot(D(AirPassengers,c(1,12),1:2)) # Plotting ordinary and seasonal first and second differences
plot(stl(window(D(AirPassengers,12), # Taking seasonal differences removes most seasonal variation
1950), "periodic"))
## Time Series Matrix of 4 EU Stock Market Indicators, recorded 260 days per year
plot(D(EuStockMarkets, c(0, 260))) # Plot series and annual differences
mod <- lm(DAX ~., L(EuStockMarkets, c(0, 260))) # Regressing the DAX on its annual lag
summary(mod) # and on the levels and annual lags of the other indices
r <- residuals(mod) # Obtain residuals
pwcor(r, L(r)) # Residual Autocorrelation
fFtest(r, L(r)) # F-test of residual autocorrelation
# (better use lmtest::bgtest)
modCO <- lm(QD1.DAX ~., D(L(EuStockMarkets, c(0, 260)), # Cochrane-Orcutt (1949) estimation
rho = pwcor(r, L(r))))
summary(modCO)
rCO <- residuals(modCO)
fFtest(rCO, L(rCO)) # No more autocorrelation
## World Development Panel Data
head(fdiff(num_vars(wlddev), 1, 1, # Computes differences of numeric variables
wlddev$country, wlddev$year)) # fdiff requires external inputs
head(D(wlddev, 1, 1, ~country, ~year)) # Differences of numeric variables
head(D(wlddev, 1, 1, ~country)) # Without t: Works because data is ordered
head(D(wlddev, 1, 1, PCGDP + LIFEEX ~ country, ~year)) # Difference of GDP & Life Expectancy
head(D(wlddev, 0:1, 1, ~ country, ~year, cols = 9:10)) # Same, also retaining original series
head(D(wlddev, 0:1, 1, ~ country, ~year, 9:10, # Dropping id columns
keep.ids = FALSE))
## Indexed computations:
wldi <- findex_by(wlddev, iso3c, year)
# Dynamic Panel Data Models:
summary(lm(D(PCGDP) ~ L(PCGDP) + D(LIFEEX), data = wldi)) # Simple case
summary(lm(Dlog(PCGDP) ~ L(log(PCGDP)) + Dlog(LIFEEX), data = wldi)) # In log-differences
# Adding a lagged difference...
summary(lm(D(PCGDP) ~ L(D(PCGDP, 0:1)) + L(D(LIFEEX), 0:1), data = wldi))
summary(lm(Dlog(PCGDP) ~ L(Dlog(PCGDP, 0:1)) + L(Dlog(LIFEEX), 0:1), data = wldi))
# Same thing:
summary(lm(D1.PCGDP ~., data = L(D(wldi,0:1,1,9:10),0:1,keep.ids = FALSE)[,-1]))
## Grouped data
library(magrittr)
wlddev %>% fgroup_by(country) %>%
fselect(PCGDP,LIFEEX) %>% fdiff(0:1,1:2) # Adding a first and second difference
wlddev %>% fgroup_by(country) %>%
fselect(year,PCGDP,LIFEEX) %>% D(0:1,1:2,year) # Also using t (safer)
wlddev %>% fgroup_by(country) %>% # Dropping ids
fselect(year,PCGDP,LIFEEX) %>% D(0:1,1:2,year, keep.ids = FALSE)