dlsem: Distributed-lag structural equation modelling

Description

Estimation of a distributed-lag structural equation model with second-order polynomial lag shapes from data.

Usage

dlsem(model.code, group = NULL, exogenous = NULL, data, log = FALSE, control = NULL, uniroot.check = TRUE, imputation = TRUE, test = "adf", combine = "choi", k = 0, lshort = TRUE, maxdiff = 5, tol = 0.0001, maxit = 500, plotDir = NULL)

Arguments

model.code

A list of objects of class formula, each describing a single regression model. See Details.

group

The name of the group factor (optional). If NULL, no groups are considered.

exogenous

The names of the exogenous variables (optional). Exogenous variables never appear on the left side of an equation and are not lagged.

data

An object of class data.frame containing the variables included in the model.

log

Logical. If TRUE, logarithmic transformation is applied to quantitative variables. Default is FALSE.

control

A list containing options for estimation. See Details.

uniroot.check

Logical. If TRUE, a unit root test is performed for each variable, and appropriate differentiation is applied. Default is TRUE.

imputation

Logical. If TRUE, missing data will be imputed using the EM algorithm. Default is TRUE.

test

The unit root test to use, which can be either "adf" or "kpss" (see unirootTest). Ignored if uniroot.check=FALSE. Default is "adf".

combine

The method used to combine the p-values of different groups, which can be either "choi" or "demetrescu" (see unirootTest). Ignored if uniroot.check=FALSE or group is NULL. Default is "choi".

k

The lag order to calculate the test statistic. Ignored if test="kpss". Default is 0.

lshort

Logical. If TRUE, the short version of the truncation lag parameter is used. Ignored if test="adf". Default is TRUE.

maxdiff

The maximum differentiation order to apply. Ignored if uniroot.check=FALSE. Default is 5.

maxit

The maximum number of iterations for the EM algorithm (see EM.imputation). Ignored if imputation=FALSE. Default is 500.

tol

The tolerance threshold of the EM algorithm (see EM.imputation). Ignored if imputation=FALSE. Default is 0.0001.

plotDir

The directory in which to save the plots of the lag shapes (optional). If NULL, no plots will be produced.

Value

An object of class dlsem. S3 methods are available for class dlsem, including summary and plot (see Examples).

Details

Formulas cannot contain interaction terms (no ':' or '*' symbols), and may contain the following operators for lag specification:

- quec: quadratic (2nd order polynomial) lag shape with endpoint constraints;

- qdec: quadratic (2nd order polynomial) decreasing lag shape.

Each operator must have the following three arguments (provided within brackets):

1) the name of the covariate to which the lag is applied;

2) the minimum lag with a non-zero coefficient;

3) the maximum lag with a non-zero coefficient.

For example, quec(X1,3,15) indicates that a quadratic lag shape with endpoint constraints must be applied to variable X1 for lags from 3 to 15. The formula of a regression model with no covariates except exogenous variables can be omitted from argument model.code. Variables appearing in any formula are treated as quantitative. The group factor and exogenous variables must not appear in any formula.
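As an illustration of the operator syntax, the sketch below builds a model code list and evaluates one possible quadratic lag shape with endpoint constraints. Formulas are not evaluated when they are created, so this runs in base R without the package. The variable names are placeholders, and the helper quec_weights is hypothetical: it assumes a parabola that is zero just outside the interval (at lags a-1 and b+1) and peaks at 1 in the middle, which may differ from the parameterization the package actually uses.

```r
# Model code: one formula per regression, lag operators on the right side.
# X1, Y1 and Y2 are placeholder variable names.
code <- list(
  Y1 ~ quec(X1, 3, 15),
  Y2 ~ qdec(X1, 0, 10) + quec(Y1, 1, 8)
)

# Hypothetical helper (not part of dlsem): unit-height quadratic weights
# that vanish at lags a - 1 and b + 1 and are set to zero outside [a, b].
quec_weights <- function(lag, a, b) {
  w <- -4 * (lag - (a - 1)) * (lag - (b + 1)) / (b - a + 2)^2
  ifelse(lag >= a & lag <= b, w, 0)
}

# Weights implied by quec(X1, 3, 15) under the assumption above
lags <- 0:18
w <- quec_weights(lags, a = 3, b = 15)
round(w, 3)
```

Under this assumed shape, the weight at lag 9 (the midpoint of 3 and 15) is the largest, and the weights shrink symmetrically toward both ends of the interval.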

Argument control must be a named list containing one or more among the following components:

- L: a named vector of non-negative integer values including the highest lag with non-zero autocorrelation for one or more response variables. If greater than 0, the Newey-West correction of the covariance matrix of estimates (Newey and West, 1987) is used. Default is 0 for all response variables.

- adapt: a named vector of logical values indicating if adaptation of lag shapes must be performed for one or more response variables. Default is FALSE for all response variables.

- max.gestation: a named list. Each component of the list must refer to one response variable and contain a named vector, including the maximum gestation lag for one or more covariates. Ignored for response variables with adapt=FALSE.

- min.width: a named list. Each component of the list must refer to one response variable and contain a named vector, including the minimum lag width for one or more covariates. Ignored for response variables with adapt=FALSE.

- sign: a named list. Each component of the list must refer to one response variable and contain a named vector, including the sign (either '+' for non-negative, or '-' for non-positive) of the coefficients of one or more covariates. Ignored for response variables with adapt=FALSE.
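The components listed above can be combined freely. The following is a minimal sketch of a control list that sets options for only some response variables, relying on the stated defaults for the rest. Response and covariate names are borrowed from the Examples section; the specific values (for instance L = 2) are arbitrary choices for illustration, and the sanity checks at the end are plain base R, not something the package is known to run in this form.

```r
mycontrol <- list(
  # Newey-West correction up to lag 2 for GVA only; other responses keep L = 0
  L = c(GVA = 2),
  # adapt lag shapes for PPI only; other responses keep adapt = FALSE
  adapt = c(PPI = TRUE),
  # constraints used during adaptation, keyed by response, then by covariate
  max.gestation = list(PPI = c(NPATENT = 3, GVA = 3)),
  min.width = list(PPI = c(NPATENT = 5, GVA = 5)),
  sign = list(PPI = c(NPATENT = "-", GVA = "-"))
)

# Illustrative sanity checks on the structure described in Details
stopifnot(
  all(nzchar(names(mycontrol))),
  all(mycontrol$L >= 0 & mycontrol$L == round(mycontrol$L)),
  is.logical(mycontrol$adapt),
  all(unlist(mycontrol$sign) %in% c("+", "-"))
)
```

Such a list would then be passed to dlsem through the control argument.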

Variables appearing in the model code but not included in the dataset will be considered unobserved. If there is at least one unobserved variable, imputation using EM will be performed regardless of the value of argument imputation.
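To see in advance which variables would be treated as unobserved, the model code can be compared with the column names of the data. The sketch below does this in base R with placeholder names and simulated data; it is not how the package itself performs the check, only one way to anticipate the outcome.

```r
# Placeholder model code and data; Z is deliberately absent from the data.
code <- list(
  Y ~ quec(X, 0, 5),
  Z ~ quec(Y, 1, 4)
)
dat <- data.frame(X = rnorm(10), Y = rnorm(10))

# all.vars() returns variable names only, skipping function names such as quec
model_vars <- unique(unlist(lapply(code, all.vars)))

# Variables used in the model code that have no column in the data
unobserved <- setdiff(model_vars, names(dat))
unobserved
```

A non-empty result indicates that EM imputation would be triggered even with imputation=FALSE.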

References

A. Magrini, F. Bartolini, A. Coli, and B. Pacini (2016). Distributed-Lag Structural Equation Modelling: An Application to Impact Assessment of Research Activity on European Agriculture. Proceedings of the 48th Meeting of the Italian Statistical Society, 8-10 June 2016, Salerno, IT.

W. K. Newey and K. D. West (1987). A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix. Econometrica, 55(3), 703-708.

Examples

data(agres)

# estimation without control options
mycode <- list(
  GVA~quec(NPATENT,1,15),
  PPI~quec(NPATENT,0,13)+quec(GVA,0,14),
  ENTR_INCOME~quec(NPATENT,0,14)+quec(GVA,1,14)
  )
myfit <- dlsem(mycode,group="COUNTRY",exogenous=c("GDP","FARM_SIZE"),data=agres,
  uniroot.check=TRUE,imputation=FALSE,log=TRUE)


### adaptation of lag shapes (may take some seconds more)
## model code
#mycode <- list(
#  GVA~quec(NPATENT,0,15),
#  PPI~quec(NPATENT,0,15)+quec(GVA,0,15),
#  ENTR_INCOME~quec(NPATENT,0,15)+quec(GVA,0,15)
#  )
#
## control options
#mycontrol <- list(
#  adapt=c(GVA=TRUE,PPI=TRUE,ENTR_INCOME=TRUE),
#  max.gestation=list(GVA=c(NPATENT=3),PPI=c(NPATENT=3,GVA=3),ENTR_INCOME=c(NPATENT=3,GVA=3)),
#  min.width=list(GVA=c(NPATENT=5),PPI=c(NPATENT=5,GVA=5),ENTR_INCOME=c(NPATENT=5,GVA=5)),
#  sign=list(GVA=c(NPATENT="+"),PPI=c(NPATENT="-",GVA="-"),ENTR_INCOME=c(NPATENT="+",GVA="+"))
#  )
#
#myfit <- dlsem(mycode,group="COUNTRY",exogenous=c("GDP","FARM_SIZE"),data=agres,
#  control=mycontrol,uniroot.check=TRUE,imputation=TRUE,log=TRUE)


# summaries of estimation
summary(myfit)

# display the DAG with significant edges only
plot(myfit)
