fdiff and fgrowth are S3 generics to compute (sequences of) suitably lagged / leaded and iterated differences and growth rates / log-differences, respectively. D and G are wrappers around fdiff and fgrowth representing the 'difference-operator' and the 'growth-operator'. D / G provide more flexibility than fdiff / fgrowth when applied to data frames, but are otherwise identical.
(fdiff and fgrowth are programmer's functions in the style of the Fast Statistical Functions, while D & G are more practical to use in regression formulas or for computations on data frames.)
fdiff(x, n = 1, diff = 1, …)
fgrowth(x, n = 1, diff = 1, …)
D(x, n = 1, diff = 1, …)
G(x, n = 1, diff = 1, …)
# S3 method for default
fdiff(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA,
stubs = TRUE, …)
# S3 method for default
fgrowth(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA,
logdiff = FALSE, stubs = TRUE, …)
# S3 method for default
D(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA,
stubs = TRUE, …)
# S3 method for default
G(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA,
logdiff = FALSE, stubs = TRUE, …)
# S3 method for matrix
fdiff(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA,
stubs = TRUE, …)
# S3 method for matrix
fgrowth(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA,
logdiff = FALSE, stubs = TRUE, …)
# S3 method for matrix
D(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA,
stubs = TRUE, …)
# S3 method for matrix
G(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA,
logdiff = FALSE, stubs = TRUE, …)
# S3 method for data.frame
fdiff(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA,
stubs = TRUE, …)
# S3 method for data.frame
fgrowth(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA,
logdiff = FALSE, stubs = TRUE, …)
# S3 method for data.frame
D(x, n = 1, diff = 1, by = NULL, t = NULL, cols = is.numeric,
fill = NA, stubs = TRUE, keep.ids = TRUE, …)
# S3 method for data.frame
G(x, n = 1, diff = 1, by = NULL, t = NULL, cols = is.numeric,
fill = NA, logdiff = FALSE, stubs = TRUE, keep.ids = TRUE, …)
# Methods for compatibility with plm:
# S3 method for pseries
fdiff(x, n = 1, diff = 1, fill = NA, stubs = TRUE, …)
# S3 method for pseries
fgrowth(x, n = 1, diff = 1, fill = NA, logdiff = FALSE, stubs = TRUE, …)
# S3 method for pseries
D(x, n = 1, diff = 1, fill = NA, stubs = TRUE, …)
# S3 method for pseries
G(x, n = 1, diff = 1, fill = NA, logdiff = FALSE, stubs = TRUE, …)
# S3 method for pdata.frame
fdiff(x, n = 1, diff = 1, fill = NA, stubs = TRUE, …)
# S3 method for pdata.frame
fgrowth(x, n = 1, diff = 1, fill = NA, logdiff = FALSE, stubs = TRUE, …)
# S3 method for pdata.frame
D(x, n = 1, diff = 1, cols = is.numeric, fill = NA, stubs = TRUE,
keep.ids = TRUE, …)
# S3 method for pdata.frame
G(x, n = 1, diff = 1, cols = is.numeric, fill = NA,
logdiff = FALSE, stubs = TRUE, keep.ids = TRUE, …)
# Methods for compatibility with dplyr:
# S3 method for grouped_df
fdiff(x, n = 1, diff = 1, t = NULL, fill = NA, stubs = TRUE,
keep.ids = TRUE, …)
# S3 method for grouped_df
fgrowth(x, n = 1, diff = 1, t = NULL, fill = NA, logdiff = FALSE,
stubs = TRUE, keep.ids = TRUE, …)
# S3 method for grouped_df
D(x, n = 1, diff = 1, t = NULL, fill = NA, stubs = TRUE,
keep.ids = TRUE, …)
# S3 method for grouped_df
G(x, n = 1, diff = 1, t = NULL, fill = NA, logdiff = FALSE,
stubs = TRUE, keep.ids = TRUE, …)
x: a numeric vector, matrix, data.frame, panel-series (plm::pseries), panel-data.frame (plm::pdata.frame) or grouped tibble (dplyr::grouped_df).
n: an integer vector indicating the number of lags or leads.
diff: a vector of integers >= 1 indicating the order of differencing / taking growth rates or log-differences.
by: data.frame method: Same as g, but also allows one- or two-sided formulas, i.e. ~ group1 or var1 + var2 ~ group1 + group2. See Examples.
t: same input as g, to indicate the time-variable. For safe computation of differences / growth rates on unordered time-series and panels. Notes: the data.frame method also allows a name, index or one-sided formula, i.e. ~time. The grouped_df method also allows lazy-evaluation, i.e. time (no quotes).
cols: data.frame method: Select columns to difference / compute growth rates using a function, column names or indices. Default: All numeric variables. Note: cols is ignored if a two-sided formula is passed to by.
fill: value to insert when vectors are shifted. Default is NA.
logdiff: logical. Compute log-differences instead of exact growth rates. See Details.
stubs: logical. TRUE will rename all differenced columns by adding a prefix "LnDdiff." / "FnDdiff." for differences and a prefix "LnGdiff." / "FnGdiff." for growth rates, where n is the lag / lead and diff is the order.
keep.ids: data.frame / pdata.frame / grouped_df methods: Logical. If FALSE, all panel-identifiers are dropped from the output (this includes all variables passed to by or t). Note: For panel-data.frames and grouped tibbles the identifiers are dropped, but the 'index' / 'groups' attributes are kept.
…: arguments to be passed to or from other methods.
fdiff/D returns x differenced diff times using lags n of itself. fgrowth/G returns x where the growth rate or log-difference was taken diff times using lags n of itself. Computations can be grouped by g/by and/or ordered by t. See Details and Examples.
By default, fdiff/D | fgrowth/G return x with all columns differenced | converted to growth rates. Differences are computed as repeat(diff){x[i] - x[i-n]}, growth rates as repeat(diff){(x[i] - x[i-n])/x[i-n]*100} and log-differences as repeat(diff){(log(x[i]) - log(x[i-n]))*100}. Setting diff = 2 thus returns differences of differences | growth rates of growth rates etc., and setting n = 2 returns simple differences computed by subtracting twice-lagged x from x. It is also possible to compute forward differences | growth rates by passing negative n values. n also supports sequences of integers (lags), and diff supports positive sequences of integers (differences):
If more than one value is passed to n and/or diff, the data is expanded-wide as follows: If x is an atomic vector or time-series, a (time-series) matrix is returned with columns ordered first by lag, then by difference. If x is a matrix or data.frame, each column is expanded in like manner such that the output has ncol(x)*length(n)*length(diff) columns ordered first by column name, then by lag, then by difference.
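The formulas above can be illustrated with a small base R sketch. The helpers shift_by, iter_diff, iter_growth and expand_diffs are hypothetical names written for this illustration only; they are not the package's implementation, they always pad with NA rather than a user-supplied fill, and the column names they produce are not the package's stubs.

```r
# Hypothetical helper: shift x by n positions (n > 0 lags, n < 0 leads),
# padding positions without a partner with NA.
shift_by <- function(x, n) {
  len <- length(x)
  if (n == 0) return(x)
  if (abs(n) >= len) return(rep(NA_real_, len))
  if (n > 0) c(rep(NA_real_, n), x[seq_len(len - n)])
  else c(x[seq.int(1 - n, len)], rep(NA_real_, -n))
}

# repeat(diff){x[i] - x[i-n]}
iter_diff <- function(x, n = 1, diff = 1) {
  if (n == 0) return(x)  # assumption: lag 0 keeps the original series, as in the Examples
  for (k in seq_len(diff)) x <- x - shift_by(x, n)
  x
}

# repeat(diff){(x[i] - x[i-n]) / x[i-n] * 100}, or the log-difference analogue
iter_growth <- function(x, n = 1, diff = 1, logdiff = FALSE) {
  if (n == 0) return(x)
  for (k in seq_len(diff)) {
    lagged <- shift_by(x, n)
    x <- if (logdiff) (log(x) - log(lagged)) * 100 else (x - lagged) / lagged * 100
  }
  x
}

# Wide expansion for several non-zero lags and orders:
# columns ordered first by lag, then by difference.
expand_diffs <- function(x, n, diff) {
  combos <- expand.grid(diff = diff, n = n)  # diff varies fastest within each lag
  out <- mapply(function(nn, dd) iter_diff(x, nn, dd), combos$n, combos$diff)
  colnames(out) <- paste0("n", combos$n, "_d", combos$diff)  # illustrative names only
  out
}

x <- c(1, 2, 4, 8, 16)
iter_diff(x)            # first difference
iter_diff(x, 1, 2)      # difference of the difference
iter_diff(x, -1)        # forward difference: x[i] - x[i+1]
iter_growth(x)          # growth rate in percent
expand_diffs(x, 1:2, 1:2)
```

For the doubling series used here, the first difference equals the lagged series itself and the growth rate is 100 percent at every position that has a lagged partner.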
With groups/panel-identifiers supplied to g/by, fdiff/D | fgrowth/G efficiently compute panel-differences | growth rates by inserting fill elements in the right places. If t is left empty, the data needs to be ordered such that all values belonging to a group are consecutive and in the right order. It is not necessary that the groups themselves occur in the right order. If time-variable(s) are supplied to t, the panel is fully identified and differences | growth rates can be securely computed even if the data is completely unordered (in that case data is shifted around and fill values are inserted in such a way that if the data were sorted afterwards the result would be identical to computing on sorted data). Internally this works by using the grouping- and time-variables to create an ordering and then accessing the panel-vector(s) through this ordering. If the data is just a bit unordered, such computations are nearly as fast as computations on ordered data (without t). If the data is very unordered, however, they can take significantly longer. Since most panel-data come perfectly or nearly ordered, I recommend always supplying t to be on the safe side.
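The ordering idea described above can be sketched in base R as follows. panel_diff is a hypothetical helper for illustration, not the package's internal code. It assumes n >= 1, pads with NA, and uses t purely as an ordering variable, so gaps in the time variable are not detected.

```r
# Hypothetical helper: grouped lagged difference that is safe on unordered data.
panel_diff <- function(x, g, t = NULL, n = 1L) {
  stopifnot(n >= 1L, length(g) == length(x))
  # Ordering by group, then by time if supplied. Without t, the stable sort
  # keeps the existing within-group order, so that order must already be right.
  o <- if (is.null(t)) order(g) else order(g, t)
  xs <- x[o]
  gs <- g[o]
  len <- length(x)
  idx <- seq_len(len)
  prev <- idx - n
  ok <- prev >= 1L
  # A lagged partner is only valid if it belongs to the same group.
  ok[ok] <- gs[prev[ok]] == gs[idx[ok]]
  ds <- rep(NA_real_, len)
  ds[ok] <- xs[ok] - xs[prev[ok]]
  # Write results back to the original (possibly unordered) positions.
  out <- numeric(len)
  out[o] <- ds
  out
}

x <- c(1, 3, 6, 10, 15, 21)
g <- rep(c("a", "b"), each = 3)
t <- rep(1:3, times = 2)

d_sorted <- panel_diff(x, g, t)

# Shuffle the rows: with t supplied, each row still receives its own difference.
p <- c(5, 2, 6, 1, 4, 3)
d_unordered <- panel_diff(x[p], g[p], t[p])
```

Sorting d_unordered back into panel order gives the same values as d_sorted, which is the property the paragraph above describes.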
It is also possible to compute differences | growth rates on unordered vectors / time-series (thus utilizing t but leaving g/by empty).
The methods applying to plm objects (panel-series and panel-data.frames) automatically utilize the panel-identifiers attached to these objects and thus securely compute fully identified panel-differences. If these objects have > 2 panel-identifiers attached to them, the last identifier is assumed to be the time-variable, and the others are taken as grouping-variables and interacted.
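As a rough base R illustration of the rule for more than two identifiers, the sketch below treats the last column of a hypothetical index as the time variable and interacts the remaining columns into one grouping factor. The data frame, its column names and the values are made up for this example, and the code does not use plm or the package's own methods.

```r
# Hypothetical index with three identifiers; the last one is taken as time.
index <- data.frame(
  region  = c("EU", "EU", "EU", "EU", "AS", "AS"),
  country = c("DE", "DE", "FR", "FR", "JP", "JP"),
  year    = c(2000, 2001, 2000, 2001, 2000, 2001)
)
x <- c(1, 2, 10, 12, 100, 103)

time_var  <- index[[ncol(index)]]
# All identifiers except the last are interacted into a single grouping factor.
group_var <- interaction(index[-ncol(index)], drop = TRUE)

# First difference within each interacted group, ordered by time.
o <- order(group_var, time_var)
d <- numeric(length(x))
d[o] <- ave(x[o], group_var[o], FUN = function(v) c(NA, diff(v)))
d
```

Each region-country combination forms its own group here, so the first observation of every combination has no lagged partner and is set to NA.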
# NOT RUN {
## Simple Time-Series: AirPassengers
D(AirPassengers) # 1st difference, same as fdiff(AirPassengers)
D(AirPassengers,-1) # forward difference
G(AirPassengers) # growth rate, same as fgrowth(AirPassengers)
G(AirPassengers, logdiff = TRUE) # log-difference
D(AirPassengers,1,2) # second difference
G(AirPassengers,1,2) # growth rate of growth rate
D(AirPassengers,12) # seasonal difference (data is monthly)
G(AirPassengers,12) # seasonal growth rate (data is monthly)
D(AirPassengers,-2:2,1:3) # sequence of leaded/lagged and iterated differences
# let's do some visual analysis
plot(AirPassengers) # plot the series - seasonal pattern is evident
plot(stl(AirPassengers, "periodic")) # Seasonal decomposition
plot(D(AirPassengers,c(1,12),1:2)) # plotting ordinary and seasonal first and second differences
plot(G(AirPassengers,c(1,12),1:2)) # same using growth rates
plot(stl(window(G(AirPassengers,12), # Taking seasonal growth rate removes most seasonal variation
1950), "periodic"))
## Time-Series Matrix of 4 EU Stock Market Indicators, recorded 260 days per year
plot(G(EuStockMarkets,c(0,260))) # Plot series and annual growth rates
summary(lm(L260G1.DAX ~., G(EuStockMarkets,260))) # Annual growth rate of DAX regressed on the
# growth rates of the other indicators
## World Development Panel Data
head(fgrowth(num_vars(wlddev), 1, 1, # Computes growth rates of numeric variables
wlddev$country, wlddev$year)) # fgrowth/fdiff require external inputs...
head(G(wlddev, 1, 1, ~country, ~year)) # Growth of numeric variables, id's attached
head(G(wlddev, 1, 1, ~country)) # Without t: Works because data is ordered
head(G(wlddev, 1, 1, PCGDP + LIFEEX ~ country, ~year)) # Growth of GDP per Capita & Life Expectancy
head(G(wlddev, 0:1, 1, ~ country, ~year, cols = 9:10)) # Same, also retaining original series
head(G(wlddev, 0:1, 1, ~ country, ~year, 9:10, # Dropping id columns
keep.ids = FALSE))
# Dynamic Panel-Data Models:
summary(lm(G(PCGDP,1,1,iso3c,year) ~ # GDP growth regressed on its lagged level
L(PCGDP,1,iso3c,year) + # and the growth rate of Life Expectancy
G(LIFEEX,1,1,iso3c,year), data = wlddev))
g = qF(wlddev$country) # Omitting t and precomputing g allows for a
summary(lm(G(PCGDP,1,1,g) ~ L(PCGDP,1,g) + # bit more parsimonious specification
G(LIFEEX,1,1,g), wlddev))
summary(lm(G1.PCGDP ~., # Now adding level and lagged level of
L(G(wlddev,0:1,1, ~ country, ~year,9:10),0:1, # LIFEEX and lagged growth rates
~ country, ~year, keep.ids = FALSE)[-1]))
## Using plm can make things easier, but avoid attaching or 'with' calls:
pwlddev <- plm::pdata.frame(wlddev, index = c("country","year"))
head(G(pwlddev, 0:1, 1, 9:10)) # Again growth rates of LIFEEX and PCGDP
PCGDP <- pwlddev$PCGDP # A panel-Series of GDP per Capita
D(PCGDP) # Differencing the panel series.
summary(lm(G1.PCGDP ~., # Running the dynamic model again ->
data = L(G(pwlddev,0:1,1,9:10),0:1, # code becomes a bit simpler
keep.ids = FALSE)[-1]))
# One could be tempted to also do something like this, but THIS DOES NOT WORK!!!:
# lm drops the attributes (-> with(pwlddev, PCGDP) drops attr. so G.default and L.matrix are used)
summary(lm(G(PCGDP) ~ L(G(PCGDP,0:1)) + L(G(LIFEEX,0:1),0:1), pwlddev))
# To make it work, one needs to create pseries (note: attach(pwlddev) also won't work)
LIFEEX <- pwlddev$LIFEEX
summary(lm(G(PCGDP) ~ L(G(PCGDP,0:1)) + L(G(LIFEEX,0:1),0:1))) # THIS WORKS !!
## Using dplyr:
library(dplyr)
wlddev %>% group_by(country) %>%
select(PCGDP,LIFEEX) %>% D(0:1,1:2) # Adding a first and second difference
wlddev %>% group_by(country) %>%
select(year,PCGDP,LIFEEX) %>% D(0:1,1:2,year) # Also using t (safer)
wlddev %>% group_by(country) %>% # Growth rates, dropping id's
select(year,PCGDP,LIFEEX) %>% G(0:1,1:2,year, keep.ids = FALSE)
# }