dlsem: Parameter estimation

Description

Parameter estimation for a distributed-lag linear structural equation model.

Usage

dlsem(model.code, group = NULL, exogenous = NULL, data, log = FALSE,
  diff.options = list(test="adf",combine="choi",k=0,lshort=TRUE,maxdiff=3),
  imput.options = list(tol=0.0001,maxiter=500,no.imput=NULL),
  global.control = NULL, local.control = NULL)

Arguments

model.code

A list of objects of class formula, each describing a single regression model. See Details.

group

The name of the group factor (optional). If NULL, no groups are considered.

exogenous

The names of the exogenous variables (optional). Exogenous variables can be either quantitative or qualitative, must not appear in any regression model, and are not lagged.

data

An object of class data.frame containing data.

log

Logical. If TRUE, logarithmic transformation is applied to strictly positive quantitative variables. Default is FALSE.

diff.options

A list containing options for the differentiation. The list may consist of any number of components among the following:

test: the unit root test to use, that can be either "adf" or "kpss" (see unirootTest). Default is "adf";
combine: the method to combine p-values of different groups, that can be either "choi" or "demetrescu" (see unirootTest). Ignored if group is NULL. Default is "choi";
k: the lag order to calculate the statistic of the Augmented Dickey-Fuller test. Ignored if test="kpss". Default is 0;
lshort: logical. If TRUE, the short version of the truncation lag parameter is used for the KPSS test. Ignored if test="adf". Default is TRUE;
maxdiff: the maximum differentiation order to apply. If maxdiff=0, differentiation will not be applied. Default is 3.
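Since the list may contain any subset of these components, a partial list is enough to change one or two settings. The sketch below uses base R only and shows one plausible way such a partial list could be combined with the defaults from the Usage section. The helper merge_options is hypothetical and is not part of dlsem, whose internal handling may differ.

```r
# Defaults copied from the Usage section.
diff_defaults <- list(test = "adf", combine = "choi", k = 0,
                      lshort = TRUE, maxdiff = 3)

# Hypothetical helper (not part of dlsem): overlay user options on the
# defaults and reject component names that are not recognised.
merge_options <- function(user, defaults) {
  unknown <- setdiff(names(user), names(defaults))
  if (length(unknown) > 0) {
    stop("unknown option(s): ", paste(unknown, collapse = ", "))
  }
  modifyList(defaults, user)
}

# Only 'test' and 'maxdiff' are supplied; the rest keep their defaults.
opts <- merge_options(list(test = "kpss", maxdiff = 1), diff_defaults)
str(opts)
```

With this pattern, a call such as diff.options=list(maxdiff=0) would switch off differentiation while leaving the other settings untouched.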

imput.options

A list containing options for the imputation of missing values. The list may consist of any number of components among the following:

tol: the tolerance threshold of the EM algorithm. Default is 0.0001;
maxiter: the maximum number of iterations for the EM algorithm. Default is 500. If maxiter=0, imputation will not be performed;
no.imput: the names of variables to which imputation will not be applied. Default is NULL.

global.control

A list containing global options for the estimation. The list may consist of any number of components among the following:

adapt: a logical value indicating if adaptation of lag shapes must be performed. Default is FALSE;
max.gestation: the maximum gestation lag for one or more covariates. If not provided, it is taken as equal to max.lead (see below). Ignored if adapt=FALSE;
max.lead: the maximum lead lag. If not provided, it is computed according to the sample size. Ignored if adapt=FALSE;
min.width: the minimum lag width. It cannot be greater than max.lead. If not provided, it is taken as 0. Ignored if adapt=FALSE;
sign: the sign (either '+' for non-negative, or '-' for non-positive) of the coefficients. If not provided, adaptation will disregard the sign of coefficients. Ignored if adapt=FALSE;
selection: the criterion to be used for the adaptation of lag shapes, that can be one among "aic" to minimise the Akaike Information Criterion (Akaike, 1974), "bic" to minimise the Bayesian Information Criterion (Schwarz, 1978), and "mdl" to minimise the Minimum Description Length (Rissanen, 1978). Default is "aic".
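The idea behind criterion-based selection can be illustrated with base R alone. The sketch below is not the adaptation procedure used by dlsem: it fits unconstrained lag regressions on simulated data (all names and values are illustrative assumptions) and picks the maximum lag that minimises AIC or BIC. MDL is omitted because base R has no function for it.

```r
set.seed(1)
n <- 200
x <- rnorm(n)

# Columns of 'lagmat' are x_t, x_{t-1}, ..., x_{t-5}.
lagmat <- embed(x, 6)

# Simulated response that depends on x at lags 0 to 2 only.
y <- drop(lagmat[, 1:3] %*% c(0.5, 1, 0.5)) + rnorm(nrow(lagmat), sd = 0.5)

# Candidate models: all lags from 0 up to b, for b = 0, ..., 5.
fits <- lapply(0:5, function(b) lm(y ~ lagmat[, 1:(b + 1), drop = FALSE]))

# Information criteria for each candidate (lower is better).
crit <- data.frame(max.lag = 0:5,
                   aic = sapply(fits, AIC),
                   bic = sapply(fits, BIC))
crit

# Maximum lag selected by each criterion.
crit$max.lag[which.min(crit$aic)]
crit$max.lag[which.min(crit$bic)]
```

BIC penalises additional coefficients more heavily than AIC for all but very small samples, so it tends to select shorter lag structures.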

local.control

A list containing variable-specific options for the estimation. These options override the ones contained in global.control. See Details.

Value

An object of class dlsem, with the following components:

estimate

A list of objects of class lm, one for each response variable.

model.code

The model code after adaptation of lag shapes, if any was performed.

exogenous

The names of exogenous variables.

group

The name of the group factor. NULL is returned if group=NULL.

log

The value provided to argument log.

ndiff

The order of differentiation.

diff.options

Options used for the differentiation.

imput.options

Options used for the imputation of missing values.

selection

The criterion used for the adaptation of lag shapes.

adaptation

Variable-specific options used for the adaptation of lag shapes.

data.orig

The dataset provided to argument data.

data.used

Data used in the estimation, that is, after logarithmic transformation and differentiation, if any were applied.
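As a rough illustration of how a transformed series relates to the original one, the base R sketch below applies a logarithmic transformation followed by first-order differencing to a made-up series. It is only a sketch under simple assumptions: in dlsem the order of differentiation is chosen through unit root tests, and the treatment of groups is not reproduced here.

```r
# Illustrative series growing by 10% at each step (not taken from any dataset).
x <- c(100, 110, 121, 133.1, 146.41)

# Logarithmic transformation followed by first-order differencing.
d1 <- diff(log(x), differences = 1)
d1

# Differencing of order d shortens the series by d observations.
length(x) - length(d1)
```

For a series with a constant growth rate, the first difference of the logarithm is constant and equal to the logarithm of the growth factor.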

S3 methods available for class dlsem are:

print

provides essential information on the structural model.

summary

shows summaries of estimation.

plot

displays the directed acyclic graph. Option conf controls the confidence level (default is 0.95), while option style controls the style of the plot:

style=2 (the default): each edge is coloured with respect to the sign of the estimated causal effect (green: positive, red: negative, light grey: not statistically significant);
style=1: edges with statistically significant causal effect are shown in black, otherwise they are shown in light grey;
style=0: all edges are shown in black disregarding statistical significance of causal effects.

residuals

returns residuals.

Details

Formulas cannot contain qualitative variables or interaction terms (no ':' or '*' symbols), and may contain the following operators for lag specification:

quec: quadratic (2nd order polynomial) lag shape with endpoint constraints;
qdec: quadratic (2nd order polynomial) decreasing lag shape;
gamma: gamma lag shape.

Each operator must have the following three arguments (provided within brackets):

the name of the covariate to which the lag is applied;
the minimum lag with a non-zero coefficient (for 2nd order polynomial lag shapes), or the delta parameter (for the gamma lag shape);
the maximum lag with a non-zero coefficient (for 2nd order polynomial lag shapes), or the lambda parameter (for the gamma lag shape).

For example, quec(X1,3,15) indicates that a quadratic lag shape with endpoint constraints must be applied to variable X1 in the interval (3,15), and gamma(X1,0.75,0.8) indicates that a gamma lag shape with delta=0.75 and lambda=0.8 must be applied to variable X1. See Judge et al. (1985, Chapters 9-10) for more details.
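To give a feel for the shape implied by quec(X1,3,15), the sketch below computes the relative weight of each lag in base R. It assumes that the endpoint constraints force the coefficient to zero at lags a-1 and b+1 and that the weights are scaled to peak at 1. The helper quec_weights is hypothetical and is not exported by dlsem, whose exact parameterisation may differ.

```r
# Hypothetical helper (not exported by dlsem): relative weights of a
# quadratic lag shape, assuming zeros at lags a - 1 and b + 1 and a peak of 1.
quec_weights <- function(lags, a, b) {
  w <- -4 * (lags - (a - 1)) * (lags - (b + 1)) / (b - a + 2)^2
  # Coefficients outside the interval [a, b] are set to zero.
  ifelse(lags >= a & lags <= b, w, 0)
}

lags <- 0:18
w <- quec_weights(lags, a = 3, b = 15)

# Weights by lag: zero before lag 3 and after lag 15, largest at lag 9.
round(setNames(w, lags), 3)
```

Under these assumptions, the coefficient at each lag is the weight multiplied by a single scale parameter, so only one parameter per lagged covariate is estimated.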

The formula of a regression model with no covariates other than exogenous variables can be omitted from argument model.code. The group factor and exogenous variables must not appear in any formula.

Argument local.control must be a named list containing one or more among the following components:

adapt: a named vector of logical values indicating if adaptation of lag shapes must be performed for one or more response variables. Default is FALSE for all response variables.
max.gestation: a named list. Each component of the list must refer to one response variable and contain a named vector, including the maximum gestation lag for one or more covariates. If not provided, it is taken as equal to max.lead (see below). Ignored if adapt=FALSE for a certain covariate.
max.lead: a named list. Each component of the list must refer to one response variable and contain a named vector, including the maximum lead lag for one or more covariates. If not provided, it is computed according to the sample size. Ignored if adapt=FALSE for a certain covariate.
min.width: a named list. Each component of the list must refer to one response variable and contain a named vector, including the minimum lag width for one or more covariates. It cannot be greater than max.lead. If not provided, it is taken as 0. Ignored if adapt=FALSE for a certain covariate.
sign: a named list. Each component of the list must refer to one response variable and contain a named vector, including the sign (either '+' for non-negative, or '-' for non-positive) of the coefficients of one or more covariates. If not provided, adaptation will disregard the sign of coefficients. Ignored if adapt=FALSE for a certain covariate.

If some local control options conflict with global ones, only the former are applied.
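The precedence rule can be pictured with a small base R sketch. The helper resolve_option is hypothetical and is not part of dlsem. It only illustrates the stated behaviour: a local setting for a given response and covariate is used when present, and the global setting is the fallback.

```r
global <- list(adapt = TRUE, max.gestation = 3, max.lead = 15, sign = "+")
local  <- list(max.lead = list(Pollution = c(Job = 10)))

# Hypothetical helper (not part of dlsem): value of one option for one
# response/covariate pair, local setting first, global setting as fallback.
resolve_option <- function(opt, response, covariate, local, global) {
  loc <- local[[opt]][[response]]
  if (!is.null(loc) && covariate %in% names(loc)) {
    return(loc[[covariate]])
  }
  global[[opt]]
}

# A local value exists for Job in the model for Pollution, so it is used.
resolve_option("max.lead", "Pollution", "Job", local, global)

# No local value exists for Consum, so the global value applies.
resolve_option("max.lead", "Pollution", "Consum", local, global)
```

In this sketch only the pairs named in local.control deviate from the global settings, which keeps the local list short when most covariates share the same options.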

References

H. Akaike (1974). A New Look at the Statistical Model Identification. IEEE Transactions on Automatic Control, 19, 716-723. DOI: 10.1109/TAC.1974.1100705

G. G. Judge, W. E. Griffiths, R. C. Hill, H. Lutkepohl, and T. C. Lee (1985). The Theory and Practice of Econometrics. John Wiley & Sons, 2nd ed., New York, US-NY. ISBN: 978-0-471-89530-5

J. Rissanen (1978). Modeling by Shortest Data Description. Automatica, 14(5), 465-471. DOI: 10.1016/0005-1098(78)90005-5

G. Schwarz (1978). Estimating the Dimension of a Model. Annals of Statistics, 6, 461-464. DOI: 10.1214/aos/1176344136

Examples

# NOT RUN {
data(industry)

# estimation without adaptation of lag shapes
mycode <- list(
  Consum~quec(Job,0,5),
  Pollution~quec(Job,1,8)+quec(Consum,1,6)
  )
myfit <- dlsem(mycode,group="Region",exogenous=c("Population","GDP"),data=industry,log=TRUE)


### adaptation of lag shapes (takes some seconds more)
#
#mycode <- list(
#  Consum~quec(Job,0,15),
#  Pollution~quec(Job,0,15)+quec(Consum,0,15)
#  )
#                      
#myfit <- dlsem(mycode,group="Region",exogenous=c("Population","GDP"),data=industry,
#  global.control=list(adapt=TRUE,max.gestation=3,min.width=5,max.lead=15,sign="+"),log=TRUE)
#
### equivalently, one may specify control options for each variable:
#
#mycontrol <- list(
#  max.gestation=list(Consum=c(Job=3),Pollution=c(Consum=3,Job=3)),
#  max.lead=list(Consum=c(Job=15),Pollution=c(Consum=15,Job=15)),
#  min.width=list(Consum=c(Job=5),Pollution=c(Consum=5,Job=5)),
#  sign=list(Consum=c(Job="+"),Pollution=c(Consum="+",Job="+"))
#  )
#
#myfit <- dlsem(mycode,group="Region",exogenous=c("Population","GDP"),data=industry,
#  global.control=list(adapt=TRUE),local.control=mycontrol,log=TRUE)


# add a qualitative exogenous variable
industry[,"Policy"] <- factor(1*(industry[,"Year"]>=2006))
levels(industry[,"Policy"]) <- c("no","yes")
myfit <- dlsem(mycode,group="Region",exogenous=c("Population","GDP","Policy"),
  data=industry,log=TRUE)
  
# summaries of estimation
summary(myfit)

# directed acyclic graph
plot(myfit)

# directed acyclic graph including only statistically significant edges
plot(myfit,show.ns=FALSE)
# }