panelAR: Estimation of Linear AR(1) Panel Data Models with Cross-Sectional Heteroskedasticity and/or Correlation

Description

The function estimates linear models on panel data structures in the presence of AR(1)-type autocorrelation as well as panel heteroskedasticity and/or contemporaneous correlation. First, AR(1)-type autocorrelation is addressed via a two-step Prais-Winsten feasible generalized least squares (FGLS) procedure, in which the autocorrelation coefficients may be panel-specific. Subsequently, one can choose to use ‘sandwich’-type robust standard errors with OLS, panel weighted least squares (WLS), panel-corrected standard errors (PCSEs), or the Parks-Kmenta FGLS estimator.

Usage

panelAR(formula, data, panelVar, timeVar, autoCorr = c("ar1",  "none", "psar1"), panelCorrMethod = c("none","phet","pcse","pwls", "parks"), rhotype ="breg", bound.rho = FALSE, rho.na.rm = FALSE,  panel.weight = c("t-1", "t"), dof.correction = FALSE,  complete.case = FALSE, seq.times = FALSE, singular.ok=TRUE)

Arguments

formula

an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted.

data

a data frame containing the variables in the model, as well as the variables identifying the panel units and time periods.

panelVar

the column name of data that contains the panel ID. It cannot contain any NAs. May be set to NULL, in which case all observations are assumed to belong to the same unit.

timeVar

the column name of data that contains the time ID. It must be a vector of integers and cannot contain any NAs. Duplicate time observations per panel are not allowed. At least two time periods are required.

autoCorr

character string denoting structure of autocorrelation in the data: ar1 denotes AR(1)-type autocorrelation with a common correlation coefficient across all panels, psar1 denotes AR(1)-type autocorrelation with a unique correlation coefficient for each panel, and none denotes no autocorrelation. Default: ar1.

panelCorrMethod

character string denoting the method used for dealing with panel heteroskedasticity and/or correlation. none denotes homoskedasticity and no correlation across panels, phet denotes a Huber-White style sandwich estimator for panel heteroskedasticity, pcse denotes panel-corrected standard errors that are robust to both heteroskedasticity and contemporaneous correlation across panels, pwls denotes that a panel weighted least squares procedure is used to deal with panel heteroskedasticity, and parks denotes that the Parks-Kmenta FGLS estimator is used to address both panel heteroskedasticity and correlation. Default: none.

rhotype

character string denoting method used for estimating autocorrelation coefficient, $\rho$. Possible options are breg, scorr, freg, theil, dw, and theil-nagar. See ‘Details’. Default: breg.

bound.rho

logical. If TRUE, each panel-specific autocorrelation coefficient $\rho_i$ is bounded to $[-1,1]$ in the calculation of $\rho$; used only for autoCorr="ar1". Default: FALSE.

rho.na.rm

logical. If FALSE and $\rho_i$ cannot be calculated for a panel, the function returns an error. If TRUE, $\rho_i$ values that are NA are ignored when calculating a common AR(1) coefficient and set to 0 when calculating panel-specific AR(1) coefficients. Default: FALSE.

panel.weight

the weight used for each panel when combining the panel-specific autocorrelations $\rho_i$ into a common $\rho$. The weight is either the number of time periods in the corresponding panel (t) or the number of time periods minus 1 (t-1). Default: t-1.

dof.correction

logical. If TRUE, standard errors are adjusted by a factor of $N/(N-k)$, where $N$ is the total number of observations and $k$ is the rank of the linear model. Default: FALSE.

complete.case

logical. If TRUE, use only the time periods where every panel has a valid observation in the estimation of PCSEs or the Parks-Kmenta estimator. Otherwise, use pairwise procedure. Default: FALSE.

seq.times

logical. If TRUE, observations are temporally ordered by panel and assigned a sequential time variable that ignores any gaps in the runs. Default: FALSE.

singular.ok

logical. If FALSE, a singular fit results in an error. Default: TRUE.
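The effect of seq.times = TRUE can be pictured with base R's ave(): within each panel, observations are ordered by time and renumbered 1, 2, ..., so gaps disappear. The helper seq_times is hypothetical; this is a sketch of the described behavior, not panelAR's own code.

```r
# Hypothetical helper: renumber time within each panel as 1, 2, ..., ignoring gaps.
# Assumes no duplicate times within a panel (panelAR disallows them anyway).
seq_times <- function(panel, time) {
  ave(time, panel, FUN = function(t) rank(t, ties.method = "first"))
}

panel <- c("A", "A", "A", "B", "B")
year  <- c(1990, 1991, 1995, 2001, 1999)
seq_times(panel, year)  # 1 2 3 2 1
```

Here panel A's jump from 1991 to 1995 becomes the consecutive periods 2 and 3, and panel B is ordered by year regardless of row order.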

Value

coefficients: the named vector of coefficients.
residuals: the residuals.
fitted.values: the fitted mean values.
rank: the numeric rank of the fitted linear model.
df.residual: the residual degrees of freedom.
call: the matched call.
terms: the terms object used.
model: the model frame used.
aliased: named logical vector designating if original coefficients are aliased.
na.action: information returned by model.frame in the handling of NAs.
vcov: estimated variance-covariance matrix of coefficients.
r2: $R^2$ based on quasi-differenced data from the Prais-Winsten regression. Set to NULL if PWLS or Parks-Kmenta procedures are used.
panelStructure: a list containing information on the panel structure of the data, with the following components:
  obs.mat: logical matrix of dimension $N_p \times T$, where $N_p$ is the number of panels. A cell is TRUE if panel $i$ has a valid observation at time $t$; the panel structure is balanced if the entire matrix is TRUE.
  rho: autocorrelation parameters. Scalar if the "ar1" option was used, a vector of length $N_p$ if the "psar1" option was used, and NULL if the "none" option was used.
  Sigma: $N_p \times N_p$ matrix of estimated panel covariances.
  N.cov: number of panel covariances estimated.
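To illustrate obs.mat, a comparable panel-by-time indicator can be built in base R from the panel and time columns. This sketch tabulates only the time values that appear in the data; whether panelAR's matrix also contains columns for unobserved intermediate periods is not stated here, so treat the layout as an assumption.

```r
# Sketch: logical panel-by-time indicator, TRUE where panel i is observed at time t.
panel <- c("A", "A", "B", "B", "B", "C")
year  <- c(2000, 2001, 2000, 2001, 2002, 2002)
obs_mat <- unclass(table(panel, year)) > 0
obs_mat
# The panel structure is balanced only if every cell is TRUE
balanced <- all(obs_mat)
```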

Details

Function for running two-step Prais-Winsten models on panel data that exhibit AR(1)-type autocorrelation. Following the two-step estimation, one can choose to use a ‘sandwich’-type robust standard error estimator with OLS or a panel weighted least squares estimator to address panel heteroskedasticity. Alternatively, if panels are both heteroskedastic and contemporaneously correlated, the package supports panel-corrected standard errors (PCSEs) as well as the Parks-Kmenta FGLS estimator. Note that the Parks-Kmenta estimator should ideally be reserved for cases where the number of time periods is substantially greater than the number of panels (see Beck and Katz 1995). The function is robust to unbalanced panel structures, panels with just one observation, multiple runs per panel, and the presence of panels without any overlapping observations.
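For intuition on the PCSE option, here is a minimal base-R sketch of Beck and Katz's panel-corrected standard errors for a balanced panel fitted by OLS. It assumes the rows are stacked by panel (all $T$ periods of panel 1, then panel 2, and so on) and estimates each element of $\Sigma$ as $\hat\sigma_{ij} = \sum_t e_{it} e_{jt} / T$. The helper pcse_balanced is hypothetical and is not panelAR's internal code, which also handles unbalanced panels, the Prais-Winsten step, and pairwise covariance estimation.

```r
# Hypothetical helper: Beck-Katz PCSEs for a balanced panel with an OLS fit.
# Assumes rows of X and y are stacked by panel: panel 1 periods 1..T, then panel 2, ...
pcse_balanced <- function(X, y, n_panels) {
  n_time <- nrow(X) / n_panels
  stopifnot(n_time == round(n_time))
  XtX_inv <- solve(crossprod(X))
  beta <- XtX_inv %*% crossprod(X, y)
  e <- as.vector(y - X %*% beta)
  # Residual matrix: one row per time period, one column per panel
  E <- matrix(e, nrow = n_time, ncol = n_panels)
  # Contemporaneous covariance across panels: sigma_ij = sum_t e_it e_jt / T
  Sigma <- crossprod(E) / n_time
  # Omega = Sigma (x) I_T matches the panel-major stacking of the rows
  Omega <- kronecker(Sigma, diag(n_time))
  V <- XtX_inv %*% t(X) %*% Omega %*% X %*% XtX_inv
  list(coef = as.vector(beta), se = sqrt(diag(V)), Sigma = Sigma)
}

# Usage on simulated data: 3 panels x 20 periods
set.seed(1)
X <- cbind(1, rnorm(60))
y <- as.vector(X %*% c(2, 0.5)) + rnorm(60)
fit <- pcse_balanced(X, y, n_panels = 3)
fit$se
```

The sandwich form $(X'X)^{-1} X' (\hat\Sigma \otimes I_T) X (X'X)^{-1}$ keeps the OLS coefficients and corrects only their variance, which is what distinguishes PCSEs from the Parks-Kmenta FGLS estimator, which reweights the data using $\hat\Sigma$.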

While the function is designed to estimate Prais-Winsten models on panel data, setting panelVar to NULL estimates an AR(1) time-series model that treats the entire dataset as one unit. In this case, panelCorrMethod is ignored, since equal variances are assumed across all observations.
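The two-step Prais-Winsten procedure for a single unit (the panelVar = NULL case) can be sketched in base R: fit OLS, estimate $\rho$ from the residuals with the breg formula, quasi-difference the data (scaling the first observation by $\sqrt{1-\rho^2}$ rather than dropping it), and refit by OLS. The helper prais_winsten_1 is hypothetical and omits everything panelAR adds for multiple panels, gaps in the time index, and the panelCorrMethod step.

```r
# Hypothetical sketch: two-step Prais-Winsten FGLS for one time series.
# X: model matrix (including an intercept column), y: response, both in time order.
# Assumes |rho| < 1 so that sqrt(1 - rho^2) is real.
prais_winsten_1 <- function(X, y) {
  n <- length(y)
  # Step 1: OLS residuals (qr.solve gives the least-squares fit for tall X)
  e <- as.vector(y - X %*% qr.solve(X, y))
  # breg estimate of rho: regress e_t on e_{t-1} without intercept
  rho <- sum(e[-1] * e[-n]) / sum(e[-n]^2)
  # Step 2: Prais-Winsten transform; first observation scaled by sqrt(1 - rho^2)
  s <- sqrt(1 - rho^2)
  X_star <- rbind(s * X[1, , drop = FALSE],
                  X[-1, , drop = FALSE] - rho * X[-n, , drop = FALSE])
  y_star <- c(s * y[1], y[-1] - rho * y[-n])
  list(coef = qr.solve(X_star, y_star), rho = rho)
}

# Usage: regression with AR(1) errors (true rho = 0.6, true slope = 2)
set.seed(42)
n <- 200
x <- rnorm(n)
u <- as.vector(arima.sim(model = list(ar = 0.6), n = n))
y <- 1 + 2 * x + u
fit <- prais_winsten_1(cbind(1, x), y)
```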

A number of common estimators for the autocorrelation coefficient are supported. Specifically:

breg

Linear regression estimator: $\hat{\rho}_{breg} = \frac{\sum_{t=2}^{T_i} \hat{\epsilon}_{i,t}\hat{\epsilon}_{i,t-1}}{\sum_{t=1}^{T_i-1} \hat{\epsilon}_{i,t}^2}$

scorr

Sample correlation coefficient estimator: $\hat{\rho}_{scorr} = \frac{\sum_{t=2}^{T_i} \hat{\epsilon}_{i,t}\hat{\epsilon}_{i,t-1}}{\sum_{t=1}^{T_i} \hat{\epsilon}_{i,t}^2}$

freg

Forward linear regression estimator: $\hat{\rho}_{freg} = \frac{\sum_{t=1}^{T_i-1} \hat{\epsilon}_{i,t}\hat{\epsilon}_{i,t+1}}{\sum_{t=1}^{T_i-1} \hat{\epsilon}_{i,t+1}^2}$

theil

Theil estimator: $\hat{\rho}_{theil} = \hat{\rho}_{scorr} \frac{T_i-k}{T_i-1}$

dw

Durbin-Watson estimator: $\hat{\rho}_{dw} = 1-\frac{1}{2} \frac{\sum_{t=2}^{T_i} (\hat{\epsilon}_{i,t}-\hat{\epsilon}_{i,t-1})^2}{\sum_{t=1}^{T_i} \hat{\epsilon}_{i,t}^2}$

theil-nagar

Theil-Nagar estimator: $\hat{\rho}_{theil-nagar} = \frac{T_i^2 \hat{\rho}_{dw} + k^2}{T_i^2-k^2}$
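The six formulas can be transcribed directly into base R for one panel's residual vector. This sketch assumes the residuals are in time order with no gaps; rho_estimators is a hypothetical name, and the code illustrates the expressions above rather than reproducing panelAR's implementation.

```r
# Hypothetical helper: the six rho estimators above for one panel's residuals.
# e: first-stage OLS residuals in time order (no gaps); k: rank of the model matrix.
rho_estimators <- function(e, k) {
  T_i <- length(e)
  lead <- e[-1]    # e_{i,t},   t = 2..T_i
  lag  <- e[-T_i]  # e_{i,t-1}, t = 2..T_i
  cross <- sum(lead * lag)
  breg  <- cross / sum(lag^2)    # denominator: t = 1..T_i-1
  scorr <- cross / sum(e^2)      # denominator: t = 1..T_i
  freg  <- cross / sum(lead^2)   # denominator: e_{i,t+1}^2, t = 1..T_i-1
  theil <- scorr * (T_i - k) / (T_i - 1)
  dw    <- 1 - 0.5 * sum((lead - lag)^2) / sum(e^2)
  theil_nagar <- (T_i^2 * dw + k^2) / (T_i^2 - k^2)
  c(breg = breg, scorr = scorr, freg = freg, theil = theil,
    dw = dw, "theil-nagar" = theil_nagar)
}

r <- rho_estimators(c(1, 2, -1, 0.5), k = 1)
# breg = -1/12, scorr = -0.08, freg = -2/21, theil = -0.08, dw = 0.02, theil-nagar = 0.088
```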

In the expressions above, $\hat{\epsilon}$ denotes the residuals from the first-stage OLS regression, $T_i$ is the number of observations in panel $i$, and $k$ is the rank of the model matrix. Some of these estimators cannot be calculated for panels with only one observation or with multiple runs of one observation each. In these cases, rho.na.rm controls how the resulting NA coefficients are treated: if TRUE, they are ignored when calculating a common AR(1) coefficient and set to 0 when calculating panel-specific AR(1) coefficients.
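The rho.na.rm = TRUE rule, together with the panel.weight and bound.rho options, can be sketched as a single pooling step. The helper pool_rho is hypothetical, and the weighted arithmetic mean with weights $T_i$ (t) or $T_i - 1$ (t-1) is an assumption about how panel-specific coefficients are combined into a common $\rho$.

```r
# Hypothetical helper: turn panel-specific rho_i into the coefficient(s) used in step 2,
# following the rho.na.rm = TRUE rule. rho_i may contain NA; T_i: observations per panel.
pool_rho <- function(rho_i, T_i, autoCorr = c("ar1", "psar1"),
                     weight = c("t-1", "t"), bound = FALSE) {
  autoCorr <- match.arg(autoCorr)
  weight <- match.arg(weight)
  if (autoCorr == "psar1") {
    # Panel-specific AR(1): NA coefficients are set to 0
    return(replace(rho_i, is.na(rho_i), 0))
  }
  # Common AR(1): NA coefficients are ignored
  ok <- !is.na(rho_i)
  r <- rho_i[ok]
  # Optional truncation of each rho_i to [-1, 1] (cf. bound.rho)
  if (bound) r <- pmin(pmax(r, -1), 1)
  # Assumed weighted mean with weights T_i ("t") or T_i - 1 ("t-1")
  w <- if (weight == "t") T_i[ok] else T_i[ok] - 1
  sum(w * r) / sum(w)
}

rho_i <- c(0.2, NA, 0.6, 1.3)
T_i   <- c(11, 1, 21, 6)
pool_rho(rho_i, T_i, bound = TRUE)  # (10*0.2 + 20*0.6 + 5*1) / 35 = 19/35
pool_rho(rho_i, T_i, "psar1")       # 0.2 0.0 0.6 1.3
```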

If PCSEs or the Parks-Kmenta estimator are selected, the default is to use all pairwise observations to estimate the time-constant covariances across units. In the case of no overlapping observations between panels, the panel covariance is assumed to be 0. If complete.case is set to TRUE, then only the time periods where every panel has a valid observation are used for the calculation of the contemporaneous correlation matrix.
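A sketch of the pairwise covariance estimate: with residuals arranged as a time-by-panel matrix E that holds NA where a panel is unobserved, each $\hat\sigma_{ij}$ averages $e_{it}e_{jt}$ over the periods both panels share and is 0 when they share none; with complete.case = TRUE, only rows without any NA are used. Dividing by the number of shared periods is an assumption (panelAR may use a different divisor), and panel_cov is a hypothetical name.

```r
# Hypothetical helper: contemporaneous covariance across panels.
# E: time-by-panel residual matrix, NA where a panel has no observation.
panel_cov <- function(E, complete.case = FALSE) {
  if (complete.case) {
    # Keep only periods in which every panel is observed
    E <- E[stats::complete.cases(E), , drop = FALSE]
  }
  n_p <- ncol(E)
  Sigma <- matrix(0, n_p, n_p, dimnames = list(colnames(E), colnames(E)))
  for (i in seq_len(n_p)) {
    for (j in seq_len(n_p)) {
      both <- !is.na(E[, i]) & !is.na(E[, j])
      # No overlapping periods: covariance stays at 0
      if (any(both)) Sigma[i, j] <- sum(E[both, i] * E[both, j]) / sum(both)
    }
  }
  Sigma
}

E <- cbind(A = c(1, -1, 2, NA), B = c(0.5, 1, NA, NA), C = c(NA, NA, NA, 3))
S_pair <- panel_cov(E)
S_cc   <- panel_cov(E, complete.case = TRUE)  # no complete rows here, so all 0
```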

References

Beck, Nathaniel and Jonathan N. Katz. 1995. “What to do (and not to do) with time-series cross-section data.” Am. Polit. Sci. Rev. 89:634-47.

Greene, William H. 2012. Econometric Analysis. 7th ed. Prentice Hall.

Judge, George G., William E. Griffiths, R. Carter Hill, Helmut Lütkepohl, and Tsoung-Chao Lee. 1985. The Theory and Practice of Econometrics. 2nd ed. John Wiley & Sons.

Prais, S., and C. Winsten. 1954. “Trend Estimation and Serial Correlation.” Cowles Commission Discussion Paper No. 383, Chicago.

Examples

# Common AR(1) with PCSE
library(panelAR)
data(Rehm)
out <- panelAR(NURR ~ gini + mean_ur + selfemp + cum_right + tradeunion + deficit + 
tradeopen + gdp_growth, data=Rehm, panelVar='ccode', timeVar='year', autoCorr='ar1', 
panelCorrMethod='pcse', rho.na.rm=TRUE, panel.weight='t-1', bound.rho=TRUE)
summary(out)

# Panel-specific AR(1) with PCSE
data(WhittenWilliams)
# expect warning urging to use 'complete.case=FALSE' 
out2 <- panelAR(milex_gdp~lag_milex_gdp+GOV_rl+gthreat+GOV_min+GOV_npty+election_yr+
lag_real_GDP_gr+cinclag+lag_alliance+lag_cinc_ratio+lag_us_change_milex_gdp, 
data=WhittenWilliams, panelVar="ccode", timeVar="year", autoCorr="psar1", 
panelCorrMethod="pcse", complete.case=TRUE) 
summary(out2)
summary(out2)$rho

# Panel-specific AR(1) correlation with PWLS	
data(BrooksKurtz)
out3 <- panelAR(kaopen ~ ldiffpeer + ldiffisi + ldiffgrowth + ldiffinflation + 
ldiffneg + ldiffembi + limf + isi_objective + partisan + checks +  lusffr + 
linflation + lbankra + lcab + lgrowth +  ltradebalance + lngdpcap + lngdp + 
brk + timetrend + y1995, data=BrooksKurtz, panelVar='country', timeVar='year', 
autoCorr='psar1', panelCorrMethod='pwls',rho.na.rm=TRUE, panel.weight='t', 
seq.times=TRUE)
summary(out3)
