condfstat: Compute conditional F statistics for weak instruments in an IV estimation with multiple endogenous variables

Description

When multiple instruments are used for multiple endogenous variables, the ordinary t-tests for the individual instruments in the first stages do not always reveal that the set of instruments is weak. Conditional F statistics can be used to test for this.

Usage

condfstat(object, type = "default", quantiles = 0, bN = 100L)

Arguments

object

object of class "felm", a result of a call to felm.

type

character. Error structure, passed to waldtest. If NULL, both iid and robust F statistics are returned (and clustered ones, if a cluster was specified to felm).

quantiles

numeric. Quantiles to estimate by bootstrap; see Details.

bN

integer. Number of bootstrap samples.

Value

A p x k matrix, where k is the number of endogenous variables and p is the number of error structures reported. Each row contains the conditional F statistics on a residual equation, as described in Sanderson and Windmeijer (2014), for one error structure. The default is to use iid errors, or clustered errors if a cluster was specified to felm. The third choice is 'robust', for heteroskedastic errors. If type=NULL, the iid and robust F statistics are returned, plus the clustered ones if a cluster was specified to felm.
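
For reference, the statistic for the j-th endogenous variable can be written as below in the iid case. This follows our reading of Sanderson and Windmeijer (2014) and assumes that the k_w exogenous regressors (including the intercept) have already been projected out of all variables; the notation is ours, not the package's. Here x_j is the j-th endogenous variable, X_{-j} holds the other endogenous variables, k_x is the number of endogenous variables (k above), Z is the n x k_z matrix of excluded instruments, P_Z = Z(Z'Z)^{-1}Z' and M_Z = I - P_Z.

```latex
\begin{aligned}
\hat\delta_j &= \left(X_{-j}' P_Z X_{-j}\right)^{-1} X_{-j}' P_Z \, x_j,
\qquad
\hat\varepsilon_j = x_j - X_{-j}\hat\delta_j, \\[4pt]
F_j &= \frac{\hat\varepsilon_j' P_Z \hat\varepsilon_j \,/\, (k_z - k_x + 1)}
           {\hat\varepsilon_j' M_Z \hat\varepsilon_j \,/\, (n - k_z - k_w)}
\end{aligned}
```

With k_x = 1 there is nothing to condition on, so the residual is x_j itself and F_j reduces to the ordinary first-stage F statistic for the excluded instruments.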

Note that for these F statistics it is not the p-value that matters but the F statistic itself, which (coincidentally) appears in the denominator of the asymptotic bias of the IV estimates; a large F is therefore beneficial.

Details

IV coefficient estimates are not normally distributed; in particular, they do not have the right expectation. They follow a rather complicated distribution, which is fairly close to normal if the instruments are good. The conditional F statistic is a measure of how good the instruments are. If F is large, the instruments are good: any bias due to the instruments is small compared to the estimated standard errors, and also small relative to the bias in OLS. If F is small, the bias can be large compared to the standard error. See Sanderson and Windmeijer (2014) and Stock and Yogo (2004).
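
To make the residual-equation idea concrete, here is a minimal base-R sketch of the conditional F statistic under iid errors. It assumes that the only exogenous regressor is an intercept (handled by demeaning) and follows our reading of Sanderson and Windmeijer (2014); the helper sw_condf is hypothetical and is not how condfstat is implemented. For each endogenous variable it runs a 2SLS regression of that variable on the other endogenous variables, then tests the excluded instruments on the resulting residual.

```r
# Hypothetical helper, not part of lfe: Sanderson-Windmeijer conditional F
# statistics under iid errors, assuming the only exogenous regressor is an
# intercept. X: n x kx endogenous variables; Z: n x kz excluded instruments.
sw_condf <- function(X, Z) {
  X <- scale(as.matrix(X), scale = FALSE)  # demeaning partials out the intercept
  Z <- scale(as.matrix(Z), scale = FALSE)
  n <- nrow(X); kx <- ncol(X); kz <- ncol(Z)
  stopifnot(kz >= kx)                      # order condition for identification
  qrZ <- qr(Z)
  proj <- function(v) qr.fitted(qrZ, v)    # P_Z v: fitted values from a regression on Z
  condf <- vapply(seq_len(kx), function(j) {
    xj <- X[, j]
    if (kx == 1) {
      e <- xj                              # nothing to condition on
    } else {
      Xo <- X[, -j, drop = FALSE]
      # 2SLS of x_j on the other endogenous variables, instrumented by Z
      delta <- qr.coef(qr(proj(Xo)), xj)
      e <- xj - drop(Xo %*% delta)
    }
    e_fit <- proj(e)
    num <- sum(e_fit^2) / (kz - kx + 1)
    den <- sum((e - e_fit)^2) / (n - kz - 1)  # one extra df for the intercept
    num / den
  }, numeric(1))
  setNames(condf, colnames(X))
}
```

With the simulated data from the Examples section, sw_condf(cbind(x1 = x1, x2 = x2), cbind(z1, z2)) should give small values for both variables even though each ordinary first-stage F is large, while sw_condf(cbind(x1 = x1), cbind(z1, z2)) reproduces the usual first-stage F for x1.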

If any(quantiles > 0.0), a bootstrap with bN samples is performed to estimate quantiles of the coefficients on the endogenous variables; these quantiles include the variance from both the first and the second stage. The result is returned as an array in the attribute quantiles of the value returned by condfstat. The argument quantiles can be a vector, to estimate several quantiles at once. If quantiles=NULL, the bootstrapped estimates themselves are returned. The bootstrap is normally much faster than running felm repeatedly, because all exogenous variables are projected out of the equations before the bootstrap starts.
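
The bootstrap strategy described above can be sketched in base R as follows. The helper boot_iv_quantiles is hypothetical, uses plain iid row resampling, and is not lfe's code. It projects the exogenous regressors W out of y, X and Z once (Frisch-Waugh-Lovell), so each replication only redoes the small first- and second-stage regressions on the residualised data. Both stages are re-estimated in every replication, so the spread of the draws reflects first- and second-stage variability alike.

```r
# Hypothetical helper, not lfe's implementation: bootstrap quantiles of the
# 2SLS coefficients on the endogenous variables X, with excluded instruments Z
# and exogenous regressors W (include a column of ones for the intercept).
boot_iv_quantiles <- function(y, X, Z, W, probs = c(0.05, 0.95), bN = 100L) {
  X <- as.matrix(X)
  nm <- colnames(X)
  qrW <- qr(as.matrix(W))
  # Frisch-Waugh-Lovell: project W out of every variable once, up front
  y <- drop(qr.resid(qrW, as.matrix(y)))
  X <- qr.resid(qrW, X)
  Z <- qr.resid(qrW, as.matrix(Z))
  n <- nrow(X)
  # One 2SLS fit on the rows idx; both stages are re-estimated every time
  tsls <- function(idx) {
    Xi <- X[idx, , drop = FALSE]
    Xhat <- qr.fitted(qr(Z[idx, , drop = FALSE]), Xi)  # first stage
    qr.coef(qr(Xhat), y[idx])                          # second stage
  }
  draws <- matrix(
    vapply(seq_len(bN), function(b) tsls(sample.int(n, n, replace = TRUE)),
           numeric(ncol(X))),
    nrow = ncol(X))
  # One row per endogenous variable, one column per requested quantile
  q <- do.call(rbind, lapply(seq_len(nrow(draws)),
                             function(i) quantile(draws[i, ], probs = probs)))
  rownames(q) <- nm
  q
}
```

Projecting W out once on the full sample, rather than inside every resample, is the approximation that makes this cheap; the returned matrix layout is our choice and need not match the array that condfstat attaches.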

References

Sanderson, E. and F. Windmeijer (2014) A weak instrument F-test in linear IV models with multiple endogenous variables. Journal of Econometrics, 2015. http://www.sciencedirect.com/science/article/pii/S0304407615001736

Stock, J.H. and M. Yogo (2004) Testing for weak instruments in linear IV regression. In Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg, 2005. http://ssrn.com/abstract=1734933

Examples

library(lfe)

z1 <- rnorm(4000)
z2 <- rnorm(length(z1))
u <- rnorm(length(z1))
# make x1 and x2 correlated with the error u (endogeneity)
x1 <- z1 + z2 + 0.2*u + rnorm(length(z1))
x2 <- z1 + 0.94*z2 - 0.3*u + rnorm(length(z1))
y <- x1 + x2 + u
est <- felm(y ~ 1 | 0 | (x1 | x2 ~ z1 + z2))
summary(est)

# inspect the first stages for x1 and x2
summary(est$stage1, lhs = 'x1')
summary(est$stage1, lhs = 'x2')

# the joint significance of the instruments is fine in both first stages:
t(sapply(est$stage1$lhs, function(lh) waldtest(est$stage1, ~z1|z2, lhs = lh)))

# Everything above looks fine: the t-tests for the instruments as well as
# the F-tests for the excluded instruments in the first stages.
# The conditional F-test reveals that the instruments are jointly weak
# (it is close to being only one instrument, z1 + z2, for both x1 and x2).
condfstat(est, quantiles = c(0.05, 0.95))