recmt: Restricted ECM test

Description

This function performs the restricted Error Correction Model (ECM) test, which checks whether the variables in the model are cointegrated. Cointegration means that the variables share a long-term equilibrium relationship despite short-term deviations from it. The test examines the short-run dynamics and the adjustment toward equilibrium. If it confirms cointegration, the variables move together over time, and the model can capture both short-term fluctuations and long-term equilibrium behavior.

Usage

recmt(
  data = NULL,
  model = NULL,
  case = 3,
  signif_level = "auto",
  maxlag = NULL,
  mode = NULL,
  criterion = NULL,
  differentAsymLag = NULL,
  batch = NULL,
  ...
)

Value

A list containing the results of the PSS t Bound test and recm, including:

type: The type of test performed, which is "cointegration".
case: The case number used in the test (1, 2, 3, 4, or 5).
statistic: The t-statistic value calculated from the test.
k: The number of long-run variables in the model.
Cont: The conclusion of the test, indicating whether cointegration is present, inconclusive, or absent.
BoundNum: A numeric representation of the conclusion, where 1 indicates cointegration, 0 indicates inconclusive, and -1 indicates no cointegration.
siglvl: The significance level used in the test, either "auto" or one of the specified numeric levels.
criticalValues: A vector of critical values for the test, corresponding to the significance levels.
parameter: The names of the long-run variables in the model.
coef: The estimated coefficient of the error correction term.
FH0: The null hypothesis of the test, which includes the long-run adjustment coefficient set to zero.
longrunEQ: The long-run equation used in the test.
shortrunEQ: The short-run equation used in the test.
ecmL: The linear model fitted to the long-run equation.
ecmS: The short-run model fitted to the error correction term.
ecmResiduals: The residuals from the long-run model.
EcmResLagged: The lagged residuals from the long-run model.
finalModel: The final model used in the test, which includes the error correction model.
OptLag: The optimal lag length determined for the model.
warnings: Any warnings generated during the test, such as sample size concerns.
method: The method used for the test, which is "recmt".

Arguments

data

The data to be analyzed.

model

A formula specifying the long-run model equation. This formula defines the relationships between the dependent variable and explanatory variables, including options for deterministic terms, asymmetric variables, and a trend component.

Example formula: y ~ x + z + asym(z) + asymL(x2 + x3) + asymS(x3 + x4) + deterministic(dummy1 + dummy2) + trend

Details

The formula allows flexible specification of variables and their roles in the model:

- Deterministic variables (e.g., dummies) can be included using deterministic(). Multiple deterministic variables can be added with + (e.g., deterministic(dummy1 + dummy2)). These variables are considered fixed and are not associated with short-run or long-run dynamics.
- Asymmetric variables can be included for both short-run and long-run dynamics:

asymS: Specifies short-run asymmetric variables. For example, asymS(x1 + x2) includes variables x1 and x2 for short-run asymmetry.
asymL: Specifies long-run asymmetric variables. For example, asymL(x1 + x3) includes variables x1 and x3 for long-run asymmetry.
asym: Includes variables for both short-run and long-run asymmetry. For example, asym(x1 + x3) applies asymmetric decomposition for both dynamics.

A trend term can be added to the model to account for deterministic linear trends over time. Simply include trend in the formula.

These components can be combined flexibly in the formula to define a robust model tailored to your analysis.
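
As an illustration of what an asymmetric term represents, the sketch below uses the partial-sum decomposition that is standard in the nonlinear ARDL literature: a series is split into the running sum of its increases and the running sum of its decreases. The helper `partial_sums` and the sample values are hypothetical and not part of kardl; whether the package builds or names its decomposed variables exactly this way is an assumption.

```r
# Hypothetical helper (not part of kardl): split a series into cumulative
# sums of its positive and negative period-to-period changes.
partial_sums <- function(x) {
  dx <- diff(x)                       # period-to-period changes
  data.frame(
    pos = c(0, cumsum(pmax(dx, 0))),  # running sum of increases
    neg = c(0, cumsum(pmin(dx, 0)))   # running sum of decreases
  )
}

er <- c(10, 12, 11, 15, 14)           # made-up exchange-rate values
ps <- partial_sums(er)
ps

# The two components add back up to the cumulative change in the series
all.equal(ps$pos + ps$neg, er - er[1])
```

Each component can then enter the long-run or short-run part of a model with its own coefficient, which is what allows increases and decreases to have different effects.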

case

Numeric or character. Specifies the case of the test to be used in the function. Acceptable values are 1, 2, 3, 4, 5, and "auto". If "auto" is chosen, the function determines the case automatically based on the model's characteristics. Invalid values will result in an error.

1: No intercept and no trend
2: Restricted intercept and no trend
3: Unrestricted intercept and no trend
4: Unrestricted intercept and restricted trend
5: Unrestricted intercept and unrestricted trend

signif_level

Character or numeric. Specifies the significance level to be used in the function. Acceptable values are "auto", "0.10", "0.1", "0.05", "0.025", and "0.01". If a numeric value is provided, it will be converted to a character string. If "auto" is chosen, the function determines the significance level automatically. Invalid values will result in an error.

maxlag

An integer specifying the maximum number of lags to be considered for the model. The default value is 4. This parameter sets an upper limit on the lag length during the model estimation process.

Details

The maxlag parameter is crucial for defining the maximum lag length that the model will evaluate when selecting the optimal lag structure based on the specified criterion. It controls the computational effort and helps prevent overfitting by restricting the search space for lag selection.

If the data has a short time horizon or is prone to overfitting, consider reducing maxlag.
If the data is expected to have long-term dependencies, increasing maxlag may be necessary to capture the relevant dynamics.

Setting an appropriate value for maxlag depends on the nature of your dataset and the context of the analysis:

For small datasets or quick tests, use smaller values (e.g., maxlag = 2).
For datasets with more observations or longer-term patterns, larger values (e.g., maxlag = 8) may be appropriate, though this increases computational time.

Examples

Using the default maximum lag (4)

kardl(data, MyFormula, maxlag = 4)

Reducing the maximum lag to 2 for faster computation

kardl(data, MyFormula, maxlag = 2)

Increasing the maximum lag to 8 for datasets with longer dependencies

kardl(data, MyFormula, maxlag = 8)

mode

Specifies the mode of estimation and output control. This parameter determines how the function handles lag estimation and what kind of feedback or control is provided during the process. The available options are:

"quick" (default): Displays progress and messages in the console while the function estimates the optimal lag values. This mode is suitable for interactive use or for users who want to monitor the estimation process in real-time. It provides detailed feedback for debugging or observation but may use additional resources due to verbose output.
"grid" : Displays progress and messages in the console while the function estimates the optimal lag values. This mode is suitable for interactive use or for users who want to monitor the estimation process in real-time. It provides detailed feedback for debugging or observation but may use additional resources due to verbose output.
"grid_custom": Suppresses most or all console output, prioritizing faster execution and reduced resource usage on PCs or servers. This mode is recommended for high-performance scenarios, batch processing, or when the estimation process does not require user monitoring. Suitable for large-scale or repeated runs where output is unnecessary.
User-defined vector: A numeric vector of lag values specified by the user, allowing full customization of the lag structure used in model estimation. When a user-defined vector is provided (e.g., `c(1, 2, 4, 5)`), the function skips the lag optimization process and directly uses the specified lags.

- Users can define lag values directly as a numeric vector. For example, mode = c(1, 2, 4, 5) assigns lags of 1, 2, 4, and 5 to the variables in the specified order.
- Alternatively, lag values can be assigned to variables by name for clarity and control. For example, mode = c(CPI = 2, ER_POS = 3, ER_NEG = 1, PPI = 3) assigns lags to variables explicitly.
- Ensure that the lags are correctly designated by verifying the result using kardl_model$properLag after estimation.

Attention!

- A function-based criterion or user-defined function can be specified for model selection, but this is only supported for mode = "grid_custom" and mode = "quick". The mode = "grid" option is restricted to predefined criteria (e.g., AIC or BIC). For more information on available criteria, see the modelCriterion function documentation.
- When using a numeric vector, ensure the order of lag values matches the variables in your formula.
- If using named vectors, double-check the variable names to avoid mismatches or unintended results.
- This mode bypasses the automatic lag optimization and assumes the user-defined lags are correct.
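
The name check recommended above can be done before calling the function. The following is a minimal base-R sketch; `check_lags` is a hypothetical helper, not part of kardl, and the variable names in `expected` are assumed to match those the model would generate.

```r
# Hypothetical helper (not part of kardl): confirm that a named lag vector
# covers exactly the expected variables, then return it in that order.
check_lags <- function(lags, expected) {
  missing_vars <- setdiff(expected, names(lags))
  extra_vars   <- setdiff(names(lags), expected)
  if (length(missing_vars) > 0 || length(extra_vars) > 0) {
    stop("Lag names do not match: missing [",
         paste(missing_vars, collapse = ", "), "], unexpected [",
         paste(extra_vars, collapse = ", "), "]")
  }
  lags[expected]
}

expected <- c("CPI", "ER_POS", "ER_NEG", "PPI")   # assumed variable names
ordered_lags <- check_lags(c(PPI = 3, CPI = 2, ER_NEG = 1, ER_POS = 3), expected)
ordered_lags
```

A misspelled name (for example ER_POSS) stops with an error instead of silently assigning a lag to the wrong variable.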

The `mode` parameter provides flexibility for different use cases:

- Use `"grid"` mode for debugging or interactive use where progress visibility is important.
- Use `"grid_custom"` mode to minimize overhead in computationally intensive tasks.
- Specify a user-defined vector to customize the lag structure based on prior knowledge or analysis.

Selecting the appropriate mode can improve the efficiency and usability of the function depending on the user's requirements and the computational environment.

criterion

A string specifying the information criterion to be used for selecting the optimal lag structure. The available options are:

"AIC": Akaike Information Criterion (default). This criterion balances model fit and complexity, favoring models that explain the data well with fewer parameters.
"BIC": Bayesian Information Criterion. This criterion imposes a stronger penalty for model complexity than AIC, making it more conservative in selecting models with fewer parameters.
"AICc": Corrected Akaike Information Criterion. This is an adjusted version of AIC that accounts for small sample sizes, making it more suitable when the number of observations is limited relative to the number of parameters.
"HQ": Hannan-Quinn Information Criterion. This criterion provides a compromise between AIC and BIC, favoring models that balance fit and complexity without being overly conservative.

The criterion can be specified as a string (e.g., "AIC") or as a user-defined function that takes a fitted model object. Please visit the modelCriterion function documentation for more details on using custom criteria.
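
To make the idea of a function-based criterion concrete, the sketch below computes the Hannan-Quinn criterion from a fitted model using base R only. `hq_criterion` is a hypothetical name, and the assumption that a custom criterion should return a single number where lower is better is not confirmed here; the modelCriterion documentation is the authority on the expected interface.

```r
# Hypothetical custom criterion (not part of kardl): Hannan-Quinn computed
# from a fitted model's log-likelihood.
hq_criterion <- function(fit) {
  ll <- logLik(fit)
  k  <- attr(ll, "df")   # estimated parameters (for lm this includes the error variance)
  n  <- nobs(fit)        # number of observations used in the fit
  -2 * as.numeric(ll) + 2 * k * log(log(n))
}

# Two candidate models on a built-in data set, for illustration only
fit_small <- lm(mpg ~ wt, data = mtcars)
fit_large <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Lower values are preferred, as with AIC() and BIC()
c(small = hq_criterion(fit_small), large = hq_criterion(fit_large))
```

For the sample size used here the HQ penalty per parameter lies between the AIC and BIC penalties, which matches the description of HQ as a compromise between the two.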

differentAsymLag

A logical value indicating whether to allow different lag lengths for positive and negative decompositions.

batch

A string specifying the batch processing configuration in the format "current_batch/total_batches". If you use the grid or grid_custom mode and want to split the lag search into multiple batches, this parameter defines the current batch and the total number of batches. For example, "2/5" indicates that the current batch is the second of five batches. The default value is "1/1", meaning that the entire lag search is performed in a single batch.
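
The sketch below shows one way a "current_batch/total_batches" string could map onto a grid of candidate lag vectors. It only illustrates the idea: `batch_rows` and the three-variable grid are hypothetical, and the way kardl actually partitions its search is not documented here.

```r
# Hypothetical helper (not part of kardl): return the row indices of a
# candidate grid that belong to one batch, given a "current/total" string.
batch_rows <- function(n_rows, batch = "1/1") {
  parts <- as.integer(strsplit(batch, "/", fixed = TRUE)[[1]])
  stopifnot(length(parts) == 2, !anyNA(parts))
  current <- parts[1]
  total   <- parts[2]
  stopifnot(current >= 1, current <= total)
  # Assign each row to one of `total` contiguous, near-equal chunks
  chunk <- ceiling(seq_len(n_rows) * total / n_rows)
  which(chunk == current)
}

# Made-up grid: lags 1 to 3 for three variables gives 27 candidates
grid <- expand.grid(v1 = 1:3, v2 = 1:3, v3 = 1:3)
idx  <- batch_rows(nrow(grid), "2/5")
grid[idx, ]
```

Running the same call with "1/5" through "5/5" covers every candidate exactly once, so the batches can be processed in separate sessions and compared afterwards.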

...

Additional arguments that can be passed to the function.

Hypothesis testing

The restricted ECM test, also known as the PSS t Bound test, is a statistical test used to assess the presence of cointegration in a model. Cointegration refers to a long-term equilibrium relationship between two or more time series variables. The PSS t Bound test is based on the work of Pesaran, Shin, and Smith (2001) and is particularly useful for models with small sample sizes.

The null and alternative hypotheses for the restricted ECM test are as follows:

$$\mathbf{H_{0}:} \theta = 0$$ $$\mathbf{H_{1}:} \theta \neq 0$$

The null hypothesis ($H_{0}$) states that there is no cointegration in the model, meaning that the long-run relationship between the variables is not significant. The alternative hypothesis ($H_{1}$) suggests that there is cointegration, indicating a significant long-term relationship between the variables.

The test statistic is calculated as the t-statistic of the coefficient of the error correction term ($\theta$) in the ECM model. If the absolute value of the t-statistic exceeds the critical value from the PSS t Bound table, we reject the null hypothesis in favor of the alternative hypothesis, indicating that cointegration is present.
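
The comparison described above, combined with the BoundNum coding from the Value section (1, 0, -1), can be sketched as a small decision rule. `bound_decision` is a hypothetical helper, not part of kardl, and the bounds used in the calls are illustrative numbers only; actual critical values depend on the case, the number of long-run variables k, and the significance level.

```r
# Hypothetical helper (not part of kardl): classify a t-statistic against a
# lower, I(0), and an upper, I(1), critical bound using absolute values.
bound_decision <- function(t_stat, lower_I0, upper_I1) {
  if (abs(t_stat) > abs(upper_I1)) {
    1L    # beyond the I(1) bound: cointegration
  } else if (abs(t_stat) < abs(lower_I0)) {
    -1L   # inside the I(0) bound: no cointegration
  } else {
    0L    # between the two bounds: inconclusive
  }
}

# Illustrative bounds only
bound_decision(-4.10, lower_I0 = -2.86, upper_I1 = -3.22)  # 1
bound_decision(-3.00, lower_I0 = -2.86, upper_I1 = -3.22)  # 0
bound_decision(-1.50, lower_I0 = -2.86, upper_I1 = -3.22)  # -1
```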

The cases for the restricted ECM Bound test are defined as follows:

case 1: No constant, no trend.

This case is used when the model does not include a constant term or a trend term. It is suitable for models where the variables are stationary and do not exhibit any long-term trends.

The model is specified as follows:

$$ \begin{aligned} \Delta y_t = \sum_{j=1}^{p} \gamma_j \Delta y_{t-j} + \sum_{i=1}^{k} \sum_{j=0}^{q_i} \beta_{ij} \Delta x_{i,t-j} + \theta (y_{t-1} - \sum_{i=1}^{k} \alpha_i x_{i,t-1} ) + e_t \end{aligned} $$
case 2: Restricted constant, no trend.

This case is used when the model includes a constant term but no trend term. It is suitable for models where the variables exhibit a long-term relationship but do not have a trend component.

The model is specified as follows:

$$ \begin{aligned} \Delta y_t &= \sum_{j=1}^{p} \gamma_j \Delta y_{t-j} + \sum_{i=1}^{k} \sum_{j=0}^{q_i} \beta_{ij} \Delta x_{i,t-j} + \theta (y_{t-1} - \alpha_0 - \sum_{i=1}^{k} \alpha_i x_{i,t-1} ) + e_t \end{aligned} $$
case 3: Unrestricted constant, no trend.

This case is used when the model includes an unrestricted constant term but no trend term. It is suitable for models where the variables exhibit a long-term relationship with a constant but do not have a trend component.

The model is specified as follows:

$$ \begin{aligned} \Delta y_t &= \phi + \sum_{j=1}^{p} \gamma_j \Delta y_{t-j} + \sum_{i=1}^{k} \sum_{j=0}^{q_i} \beta_{ij} \Delta x_{i,t-j} + \theta (y_{t-1} - \sum_{i=1}^{k} \alpha_i x_{i,t-1} ) + e_t \end{aligned} $$
case 4: Unrestricted Constant, restricted trend.

This case is used when the model includes an unrestricted constant term and a restricted trend term. It is suitable for models where the variables exhibit a long-term relationship with a constant and a trend component.

The model is specified as follows:

$$ \begin{aligned} \Delta y_t &= \phi + \sum_{j=1}^{p} \gamma_j \Delta y_{t-j} + \sum_{i=1}^{k} \sum_{j=0}^{q_i} \beta_{ij} \Delta x_{i,t-j} + \theta (y_{t-1} - \pi (t-1) - \sum_{i=1}^{k} \alpha_i x_{i,t-1} ) + e_t \end{aligned} $$
case 5: Unrestricted constant, unrestricted trend.

This case is used when the model includes both an unrestricted constant term and an unrestricted trend term.

The model is specified as follows:

$$ \begin{aligned} \Delta y_t &= \phi + \varphi t + \sum_{j=1}^{p} \gamma_j \Delta y_{t-j} + \sum_{i=1}^{k} \sum_{j=0}^{q_i} \beta_{ij} \Delta x_{i,t-j} + \theta (y_{t-1} - \sum_{i=1}^{k} \alpha_i x_{i,t-1} ) + e_t \end{aligned} $$
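
The Value section lists a long-run model (ecmL), its residuals and lagged residuals, and a short-run model (ecmS), which suggests a two-step estimation. The base-R sketch below illustrates that idea on simulated data with an unrestricted constant. It is not the implementation used by recmt: there is no lag selection, no lagged differences, and no bound critical values, and every name and number in it is made up for illustration.

```r
set.seed(1)
n <- 200
x <- cumsum(rnorm(n))                                # simulated I(1) regressor
u <- as.numeric(arima.sim(list(ar = 0.5), n = n))    # stationary deviation
y <- 1 + 2 * x + u                                   # y and x share a long-run relation

# Step 1: long-run (levels) regression and its residuals
long_run <- lm(y ~ x)
ect      <- resid(long_run)

# Step 2: short-run regression of the differenced series on the lagged residual
dat <- data.frame(
  dy      = diff(y),
  dx      = diff(x),
  ect_lag = ect[-n]                                  # residual from period t-1
)
short_run <- lm(dy ~ dx + ect_lag, data = dat)

# Estimate and t-statistic of the adjustment coefficient (theta)
theta <- coef(summary(short_run))["ect_lag", c("Estimate", "t value")]
theta
```

A negative adjustment coefficient means deviations from the long-run relation are corrected over time. Its t-statistic should be judged against the bound critical values rather than the usual t distribution.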

Examples

Run this code


 # Sample article: THE DYNAMICS OF EXCHANGE RATE PASS-THROUGH TO DOMESTIC PRICES IN TURKEY
 library(magrittr)
 kardl_set(model=CPI~ER+PPI+asym(ER)+deterministic(covid)+trend ,
           data=imf_example_data ,
           maxlag=3)

 recmt_model_grid<-recmt(mode = "grid")
 recmt_model_grid
 recmt_model<- imf_example_data %>% recmt(mode = "grid_custom")
 recmt_model
 recmt_model2<-recmt(mode = c(2, 1, 1, 3))
 # Getting the results
 recmt_model2
 # Getting the summary of the results
 summary(recmt_model2)
 # OR
 imf_example_data %>% recmt(CPI~PPI+asym(ER) +trend,case=4) %>% summary()

 # Speeding up the search for the best-fitting lag vector
 recmt(mode = "grid_custom")
 # Setting the maximum lag instead of using the default value (4)
 recmt(maxlag = 2, mode = "grid_custom")
 # Using another criterion for finding the best lag
 recmt(criterion = "HQ", mode = "grid_custom")



 # Using different lag values for the negative and positive decompositions of non-linear variables

 # Setting the same lags for positive and negative decompositions
 kardl_set(differentAsymLag = FALSE)

 sameAsymLags<-recmt( mode = "grid_custom")
 sameAsymLags$OptLag

 # Setting different lags for positive and negative decompositions
 diffAsymLags<-recmt(differentAsymLag = TRUE , mode = "grid_custom" )
 diffAsymLags$OptLag


 # Setting the prefixes and suffixes for non-linear variables
 kardl_reset()
 kardl_set(AsymPrefix = c("asyP_","asyN_"), AsymSuffix = c("_PP","_NN"))
 customizedNames<-recmt(imf_example_data, CPI~ER+PPI+asym(ER) )
 customizedNames$ecmS$finalModel$model

 # For having the lags plot
 library(ggplot2)
 library(dplyr)

 # recmt_model_grid$ecmS$LagCriteria is a matrix; convert it to a data frame
 LagCriteria <- as.data.frame(recmt_model_grid$ecmS$LagCriteria)
 # Rename columns for easier access and convert relevant columns to numeric
 colnames(LagCriteria) <- c("lag", "AIC", "BIC", "AICc", "HQ")
 LagCriteria <- LagCriteria %>%  mutate(across(c(AIC, BIC, HQ), as.numeric))

 # Pivot the data to a long format excluding AICc
 library(tidyr)

 LagCriteria_long <- LagCriteria %>%  select(-AICc) %>%
   pivot_longer(cols = c(AIC, BIC, HQ), names_to = "Criteria", values_to = "Value")
 # Find the minimum value for each criterion
 min_values <- LagCriteria_long %>%  group_by(Criteria) %>%
   slice_min(order_by = Value) %>%  ungroup()

 # Create the ggplot with lines, highlight minimum values, and add labels
 ggplot(LagCriteria_long, aes(x = lag, y = Value, color = Criteria, group = Criteria)) +
   geom_line() +
   geom_point(data = min_values, aes(x = lag, y = Value), color = "red", size = 3, shape = 8) +
   geom_text(data = min_values, aes(x = lag, y = Value, label = lag),
     vjust = 1.5, color = "black", size = 3.5) +
   labs(title = "Lag Criteria Comparison", x = "Lag Configuration",  y = "Criteria Value") +
   theme_minimal() +
   theme(axis.text.x = element_text(angle = 45, hjust = 1))