lagTimeSeries: Create lagged versions of time series variables

Description

Takes a multivariate time series and creates time-lagged columns for modeling. This generates one new column per lag and variable, enabling analysis of how past values influence current observations.

Usage

lagTimeSeries(
  input.data = NULL,
  response = NULL,
  drivers = NULL,
  time = NULL,
  oldest.sample = "first",
  lags = NULL,
  time.zoom = NULL,
  scale = FALSE
)

prepareLaggedData(
  input.data = NULL,
  response = NULL,
  drivers = NULL,
  time = NULL,
  oldest.sample = "first",
  lags = NULL,
  time.zoom = NULL,
  scale = FALSE
)

Value

A data frame whose columns hold the time-delayed values of the response and the drivers. Each column name carries the lag as a suffix. The data frame also has the attributes `response` and `drivers`, which are later used by [computeMemory()].

Arguments

input.data: a dataframe with one time series per column. Default: NULL.
response: character string, name of the numeric column to be used as response in the model. Default: NULL.
drivers: character vector, names of the numeric columns to be used as predictors in the model. Default: NULL.
time: character string, name of the numeric column with the time. Default: NULL.
oldest.sample: character string, either "first" or "last". When "first", the first row is taken as the oldest case of the time series and the last row as the newest, so ecological memory flows from the first to the last row of input.data. When "last", the last row is taken as the oldest sample; this is the mode to use when input.data represents a palaeoecological dataset. Default: "first".
lags: numeric vector, lags to be used in the equation, in the same units as time. The use of seq to define it is highly recommended. If 0 is absent from lags, it is added automatically to allow the consideration of a concurrent effect. Lags should be aligned to the temporal resolution of the data. For example, if the interval between consecutive samples is 100 years, lags should be something like 0, 100, 200, 300. Lags can also be multiples of the time resolution, such as 0, 200, 400, 600 (when time resolution is 100 years). Default: NULL.
time.zoom: numeric vector of two values from the range of the time column, used to subset the data if desired. Default: NULL.
scale: logical. If TRUE, applies the scale function to standardize the data (center and scale each column). Required if the lagged data is going to be used to fit linear models. Default: FALSE.
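
The advice on lags can be sketched in base R as follows. This is only an illustration of the guidance above, not the package's internal code: the time vector is made up, and the names resolution and aligned are hypothetical.

```r
# Hypothetical time column sampled every 100 years (illustrative values only)
time <- seq(0, 1000, by = 100)

# Temporal resolution: the gap between consecutive samples
# (assumes a regular series, so there is a single unique gap)
resolution <- unique(diff(time))

# Lags defined with seq() as multiples of the resolution; 0 is left out on purpose
lags <- seq(200, 600, by = 2 * resolution)

# Check that every lag is a whole multiple of the resolution
# (a tolerance is used because fractional resolutions, such as 0.2,
# do not always divide exactly in floating point)
ratio <- lags / resolution
aligned <- all(abs(ratio - round(ratio)) < 1e-8)

# Mimic the documented behaviour of adding lag 0 when it is absent
lags <- sort(union(0, lags))

print(lags)
print(aligned)
```

Here lags ends up as 0, 200, 400, 600, which matches the second example given for the lags argument.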

Author

Blas M. Benito <blasbenito@gmail.com>

Details

The function interprets the time column as an index representing the temporal position of each sample. It uses the lag function from the zoo package to shift columns by the specified lags, generating one new column per lag and variable.
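
A minimal base R sketch of what shifting a column by a lag means is given below. It does not use the zoo package or reproduce the function's actual implementation; the helper shift_by, the toy values, and the column names x_0, x_1, x_2 are assumptions made for illustration. It also assumes the oldest sample is in the first row and that lags are expressed in rows rather than time units.

```r
# Toy series with the oldest sample first (hypothetical values)
x <- c(10, 20, 30, 40, 50)

# Hypothetical helper: for each row, return the value observed k rows earlier,
# padding with NA where no earlier value exists
shift_by <- function(v, k) {
  c(rep(NA, k), head(v, length(v) - k))
}

# One new column per lag
lagged <- data.frame(
  x_0 = shift_by(x, 0),
  x_1 = shift_by(x, 1),
  x_2 = shift_by(x, 2)
)

# Rows without a full set of past values are dropped in this sketch
lagged <- lagged[complete.cases(lagged), ]

print(lagged)
```

Each remaining row pairs a current value (x_0) with the values observed one and two steps before it, which is the layout a model of past influence on current observations needs.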

Examples

# Load the example data
data(palaeodata)

# Add lags
lagged.data <- lagTimeSeries(
  input.data = palaeodata,
  response = "pollen.pinus",
  drivers = c("climate.temperatureAverage", "climate.rainfallAverage"),
  time = "age",
  oldest.sample = "last",
  lags = seq(0.2, 1, by = 0.2)
)

str(lagged.data)

# Check attributes (used by computeMemory)
attributes(lagged.data)