np (version 0.40-13)

npregiv: Nonparametric Instrumental Regression

Description

npregiv computes a nonparametric estimate of an instrumental regression function $\varphi$ defined by conditional moment restrictions stemming from a structural econometric model: $E [Y - \varphi (Z,X) | W ] = 0$, involving endogenous variables $Y$ and $Z$, exogenous variables $X$, and instruments $W$. The function $\varphi$ is the solution of an ill-posed inverse problem. When method="Tikhonov", npregiv uses the approach of Darolles, Fan, Florens and Renault (2011) modified for local polynomial kernel regression of any order (Darolles et al. use local constant kernel weighting, which corresponds to setting p=0; see below for details). When method="Landweber-Fridman", npregiv uses the approach of Horowitz (2011), again using local polynomial kernel regression (Horowitz uses B-spline weighting).
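
As a rough guide to what the two methods do, consider the simplified case with no exogenous $X$. The operator notation below ($T$, $T^*$, $r$, $c$) is introduced here for illustration only; the formulas are the standard textbook forms of Tikhonov regularization and of the Landweber-Fridman iteration, not a transcription of the npregiv source.

$$
r(w) = E[Y \mid W = w], \qquad
(T\varphi)(w) = E[\varphi(Z) \mid W = w], \qquad
(T^{*}\psi)(z) = E[\psi(W) \mid Z = z]
$$

The restriction $E[Y - \varphi(Z) \mid W] = 0$ is then the linear equation $T\varphi = r$, which is ill-posed because $T$ has no continuous inverse. The two regularized solutions are

$$
\varphi_{\alpha} = (\alpha I + T^{*}T)^{-1}\, T^{*} r
\qquad \text{(Tikhonov)}
$$

$$
\varphi_{k+1} = \varphi_{k} + c\, T^{*}\!\left(r - T\varphi_{k}\right), \quad k = 0, 1, 2, \dots
\qquad \text{(Landweber-Fridman)}
$$

Here $\alpha > 0$ is the Tikhonov regularization parameter, and $c$ is a step-size constant that presumably corresponds to the constant argument. For Landweber-Fridman, the number of iterations $k$ plays the regularizing role, which is why a stopping rule is needed.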

Usage

npregiv(y,
        z,
        w,
        x = NULL,
        zeval = NULL,
        weval = NULL,
        xeval = NULL,
        p = 1,
        nmulti = 1,
        random.seed = 42,
        optim.maxattempts = 10,
        optim.method = c("Nelder-Mead", "BFGS", "CG"),
        optim.reltol = sqrt(.Machine$double.eps),
        optim.abstol = .Machine$double.eps,
        optim.maxit = 500,
        alpha = NULL,
        alpha.min = 1e-10,
        alpha.max = 1e-01,
        alpha.tol = .Machine$double.eps^0.25,
        iterate.max = 1000,
        iterate.tol = 1.0e-04,
        iterate.diff.tol = 1.0e-08,
        constant = 0.5,
        method = c("Landweber-Fridman","Tikhonov"),
        stop.on.increase = TRUE,
        ...)

Arguments

y
a one (1) dimensional numeric or integer vector of dependent data, each element $i$ corresponding to each observation (row) $i$ of z.
z
a $p$-variate data frame of endogenous regressors. The data types may be continuous, discrete (unordered and ordered factors), or some combination thereof.
w
a $q$-variate data frame of instruments. The data types may be continuous, discrete (unordered and ordered factors), or some combination thereof.
x
an $r$-variate data frame of exogenous regressors. The data types may be continuous, discrete (unordered and ordered factors), or some combination thereof.
zeval
a $p$-variate data frame of endogenous regressors on which the regression will be estimated (evaluation data). By default, evaluation takes place on the data provided by z.
weval
a $q$-variate data frame of instruments on which the regression will be estimated (evaluation data). By default, evaluation takes place on the data provided by w.
xeval
an $r$-variate data frame of exogenous regressors on which the regression will be estimated (evaluation data). By default, evaluation takes place on the data provided by x.
p
the order of the local polynomial regression (defaults to p=1, i.e. local linear).
nmulti
integer number of times to restart the process of finding extrema of the cross-validation function from different (random) initial points.
random.seed
an integer used to seed R's random number generator. This ensures replicability of the numerical search. Defaults to 42.
optim.method
method used by optim for minimization of the objective function. See ?optim for references. Defaults to "Nelder-Mead". the default method is an implementation of tha
optim.maxattempts
maximum number of attempts taken trying to achieve successful convergence in optim. Defaults to 10.
optim.abstol
the absolute convergence tolerance used by optim. Only useful for non-negative functions, as a tolerance for reaching zero. Defaults to .Machine$double.eps.
optim.reltol
relative convergence tolerance used by optim. The algorithm stops if it is unable to reduce the value by a factor of 'reltol * (abs(val) + reltol)' at a step. Defaults to sqrt(.Machine$double.eps).
optim.maxit
maximum number of iterations used by optim. Defaults to 500.
alpha
a numeric scalar that, if supplied, is used rather than numerically solving for alpha, when using method="Tikhonov".
alpha.min
minimum of search range for $\alpha$, the Tikhonov regularization parameter, when using method="Tikhonov".
alpha.max
maximum of search range for $\alpha$, the Tikhonov regularization parameter, when using method="Tikhonov".
alpha.tol
the search tolerance for optimize when solving for $\alpha$, the Tikhonov regularization parameter, when using method="Tikhonov".
iterate.max
an integer indicating the maximum number of iterations permitted before termination occurs when using method="Landweber-Fridman".
iterate.tol
the search tolerance for the stopping rule when using method="Landweber-Fridman".
iterate.diff.tol
the search tolerance for the difference in the stopping rule from iteration to iteration when using method="Landweber-Fridman" (disable by setting to zero).
constant
the constant to use when using method="Landweber-Fridman".
method
the regularization method employed (defaults to "Landweber-Fridman", see Horowitz (2011); see Darolles, Fan, Florens and Renault (2011) for details for "Tikhonov").
stop.on.increase
a logical value (defaults to TRUE) indicating whether to halt iteration if the stopping criterion (see below) increases over the course of one iteration (i.e. it may be above the iteration tolerance but increased).
...
additional arguments supplied to npksum.
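
The nmulti, random.seed and optim.* arguments describe a multistart numerical search. The base-R sketch below illustrates the general idea under stated assumptions: multistart_optim and toy_cv are hypothetical names, the range of the random starting values is arbitrary, and the toy objective stands in for a cross-validation function. It is not the search routine used inside np.

```r
# Hypothetical helper: restart optim() from nmulti random starting points
# and keep the fit with the smallest objective value.
multistart_optim <- function(fn, n.par, nmulti = 5, random.seed = 42,
                             optim.method = "Nelder-Mead",
                             optim.maxit = 500,
                             optim.reltol = sqrt(.Machine$double.eps)) {
  set.seed(random.seed)                       # same seed, same starting points
  best <- NULL
  for (i in seq_len(nmulti)) {
    init <- runif(n.par, min = 0.1, max = 2)  # illustrative range (assumption)
    fit  <- optim(init, fn, method = optim.method,
                  control = list(maxit = optim.maxit, reltol = optim.reltol))
    if (is.null(best) || fit$value < best$value) best <- fit
  }
  best
}

# Toy objective with several local minima, standing in for a
# cross-validation function of two bandwidths.
toy_cv <- function(h) sum((h - c(1, 0.5))^2) + 0.2 * sum(sin(8 * h))

one   <- multistart_optim(toy_cv, n.par = 2, nmulti = 1)
five  <- multistart_optim(toy_cv, n.par = 2, nmulti = 5)
again <- multistart_optim(toy_cv, n.par = 2, nmulti = 5)
```

Because the seed fixes the starting points, five and again agree exactly, and the five-start result can never be worse than the one-start result, since both share the same first starting point. Note that this sketch resets the global random number state as a side effect.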

Value

  • npregiv returns a list with components phi and either alpha when method="Tikhonov" or num.iterations and norm.stop when method="Landweber-Fridman".
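
To make the phi and alpha components more concrete, here is a minimal base-R sketch of a Tikhonov-regularized estimate for scalar z and w. It assumes local constant (Nadaraya-Watson) weights with a Gaussian kernel, fixed bandwidths and a fixed alpha; nw_weights, tikhonov_sketch, hz and hw are hypothetical names chosen for this illustration. npregiv itself selects bandwidths by cross-validation, supports local polynomial weights, and solves for alpha numerically unless alpha is supplied.

```r
# Nadaraya-Watson (local constant) weight matrix with a Gaussian kernel.
# Row i holds the weights used to estimate E(. | x = x[i]).
nw_weights <- function(x, h) {
  K <- dnorm(outer(x, x, "-") / h)
  K / rowSums(K)
}

# Hypothetical helper, not part of np: Tikhonov-regularized IV estimate
# with fixed bandwidths hz, hw and a fixed regularization parameter alpha.
tikhonov_sketch <- function(y, z, w, hz, hw, alpha) {
  n   <- length(y)
  Az  <- nw_weights(z, hz)       # n x n smoother for E(. | z)
  Aw  <- nw_weights(w, hw)       # n x n smoother for E(. | w)
  TT  <- Az %*% Aw               # sample analogue of the operator T*T
  phi <- solve(alpha * diag(n) + TT, TT %*% y)
  list(phi = drop(phi), alpha = alpha)
}

# Small simulated data set with an endogenous regressor z.
set.seed(42)
n <- 200
v <- rnorm(n, sd = 0.27)
w <- rnorm(n)
z <- 0.2 * w + v
y <- z^2 - 0.5 * v + rnorm(n, sd = 0.05)

fit     <- tikhonov_sketch(y, z, w, hz = 0.1, hw = 0.3, alpha = 0.05)
fit_big <- tikhonov_sketch(y, z, w, hz = 0.1, hw = 0.3, alpha = 1e6)
```

The n x n matrices Az, Aw and TT are what make this approach memory-hungry as n grows. A very large alpha (fit_big) shrinks the estimate toward zero, which shows why alpha has to be chosen with care.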

Details

Tikhonov regularization requires computation of weight matrices of dimension $n\times n$, which can be costly in terms of memory and may be unsuitable for large datasets. Landweber-Fridman will be preferred in such settings, as it does not require construction and storage of these weight matrices and also avoids the need for numerical optimization to determine $\alpha$.

method="Landweber-Fridman" uses an optimal stopping rule based upon $||E(y|w)-E(\varphi_k(z,x)|w)||^2$. However, if insufficient training is conducted, the estimates can be overly noisy. To best guard against this eventuality, set nmulti to a larger number than the default nmulti=0 for npreg.

When using method="Landweber-Fridman", iteration terminates when any one of the following occurs: the change in the value of $||(E(y|w)-E(\varphi_k(z,x)|w))/E(y|w)||^2$ from iteration to iteration is less than iterate.tol, iterate.max is reached, or $||(E(y|w)-E(\varphi_k(z,x)|w))/E(y|w)||^2$ stops falling in value and starts rising (see stop.on.increase).
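
The termination rules above can be sketched in base R as follows. This is an illustration under stated assumptions, not the np implementation: it uses local constant (Nadaraya-Watson) weights with a Gaussian kernel, fixed bandwidths hz and hw, scalar z and w, and starts from the naive estimate of E(y|z). The names nw_weights and lf_sketch are hypothetical, and the weight matrices are stored explicitly here only to keep the code short.

```r
# Nadaraya-Watson (local constant) weight matrix with a Gaussian kernel.
nw_weights <- function(x, h) {
  K <- dnorm(outer(x, x, "-") / h)
  K / rowSums(K)
}

# Hypothetical helper, not part of np: Landweber-Fridman iteration with
# the three termination rules described in the text.
lf_sketch <- function(y, z, w, hz, hw, constant = 0.5,
                      iterate.max = 100, iterate.tol = 1e-4) {
  Az <- nw_weights(z, hz)                 # smoother for E(. | z)
  Aw <- nw_weights(w, hw)                 # smoother for E(. | w)
  r  <- drop(Aw %*% y)                    # estimate of E(y | w)
  # Normalized stopping criterion ||(E(y|w) - E(phi|w)) / E(y|w)||^2
  stop_norm <- function(phi) sum(((r - drop(Aw %*% phi)) / r)^2)

  phi  <- drop(Az %*% y)                  # starting value (an assumption)
  norm <- stop_norm(phi)
  for (k in seq_len(iterate.max)) {       # rule: at most iterate.max steps
    phi_new  <- phi + constant * drop(Az %*% (r - drop(Aw %*% phi)))
    norm_new <- stop_norm(phi_new)
    if (norm_new > tail(norm, 1)) break   # rule: criterion started rising
    phi  <- phi_new
    norm <- c(norm, norm_new)
    if (abs(diff(tail(norm, 2))) < iterate.tol) break  # rule: small change
  }
  list(phi = phi, num.iterations = length(norm) - 1, norm.stop = norm)
}

# Small simulated data set with an endogenous regressor z.
set.seed(42)
n <- 200
v <- rnorm(n, sd = 0.27)
w <- rnorm(n)
z <- 0.2 * w + v
y <- z^2 - 0.5 * v + rnorm(n, sd = 0.05)

fit <- lf_sketch(y, z, w, hz = 0.1, hw = 0.3)
```

Because a step that increases the criterion is rejected, the last entry of fit$norm.stop is never larger than the first. Stopping early is what regularizes the estimate; iterating too long lets noise back in.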

References

Carrasco, M. and J.P. Florens and E. Renault (2007), Linear Inverse Problems in Structural Econometrics Estimation Based on Spectral Decomposition and Regularization, In: James J. Heckman and Edward E. Leamer, Editor(s), Handbook of Econometrics, Elsevier, 2007, Volume 6, Part 2, Chapter 77, Pages 5633-5751.

Darolles, S. and Y. Fan and J.P. Florens and E. Renault (2011), Nonparametric Instrumental Regression, Econometrica, 79, 1541-1565.

Feve, F. and J.P. Florens (2010), The practice of non-parametric estimation by solving inverse problems: the example of transformation models, Econometrics Journal, 13, S1-S27.

Florens, J.P. and J.S. Racine (2012), Nonparametric Instrumental Derivatives, Working Paper.

Fridman, V. M. (1956), A Method of Successive Approximations for Fredholm Integral Equations of the First Kind, Uspekhi Mat. Nauk, 11, 233-334, in Russian.

Horowitz, J.L. (2011), Applied Nonparametric Instrumental Variables Estimation, Econometrica, 79, 347-394.

Landweber, L. (1951), An iterative formula for Fredholm integral equations of the first kind, American Journal of Mathematics, 73, 615-24.

Li, Q. and J.S. Racine (2007), Nonparametric Econometrics: Theory and Practice, Princeton University Press.

Li, Q. and J.S. Racine (2004), Cross-validated local linear nonparametric regression, Statistica Sinica, 14, 485-512.

See Also

npregivderiv, npreg

Examples

## This illustration was made possible by Samuele Centorrino
## <samuele.centorrino@univ-tlse1.fr>

library(np)

set.seed(42)
n <- 1500

## The DGP is as follows:

## 1) y = phi(z) + u

## 2) E(u|z) != 0 (endogeneity present)

## 3) Suppose there exists an instrument w such that z = f(w) + v and
## E(u|w) = 0

## 4) We generate v, w, and generate u such that u and z are
## correlated. To achieve this we express u as a function of v (i.e. u =
## gamma v + eps)

v <- rnorm(n,mean=0,sd=0.27)
eps <- rnorm(n,mean=0,sd=0.05)
u <- -0.5*v + eps
w <- rnorm(n,mean=0,sd=1)

## In Darolles et al (2011) there exist two DGPs. The first is
## phi(z)=z^2 and the second is phi(z)=exp(-abs(z)) (which is
## continuous but has a kink at zero, so it is not differentiable there).

fun1 <- function(z) { z^2 }
fun2 <- function(z) { exp(-abs(z)) }

z <- 0.2*w + v

## Generate two y vectors for each function.

y1 <- fun1(z) + u
y2 <- fun2(z) + u

## You set y to be either y1 or y2 (ditto for phi) depending on which
## DGP you are considering:

y <- y1
phi <- fun1

## Sort on z (for plotting)

ivdata <- data.frame(y,z,w)
ivdata <- ivdata[order(ivdata$z),]
rm(y,z,w)
attach(ivdata)

model.iv <- npregiv(y=y,z=z,w=w)
phi.iv <- model.iv$phi

## Now the non-iv local linear estimator of E(y|z)

ll.mean <- fitted(npreg(y~z,regtype="ll"))

## For the plots, restrict focal attention to the bulk of the data
## (i.e. for the plotting area trim out 1/4 of one percent from each
## tail of y and z)

trim <- 0.0025

curve(phi,min(z),max(z),
      xlim=quantile(z,c(trim,1-trim)),
      ylim=quantile(y,c(trim,1-trim)),
      ylab="Y",
      xlab="Z",
      main="Nonparametric Instrumental Kernel Regression",
      lwd=2,lty=1)

points(z,y,type="p",cex=.25,col="grey")

lines(z,phi.iv,col="blue",lwd=2,lty=2)

lines(z,ll.mean,col="red",lwd=2,lty=4)

legend(quantile(z,trim),quantile(y,1-trim),
       c(expression(paste(varphi(z))),
         expression(paste("Nonparametric ",hat(varphi)(z))),
         "Nonparametric E(y|z)"),
       lty=c(1,2,4),
       col=c("black","blue","red"),
       lwd=c(2,2,2))
