dfldens: Counterfactual Kernel Density Functions

Description

Uses the DiNardo, Fortin, and Lemieux approach to re-weight kernel density functions based on values of an explanatory variable from an earlier period.

Usage

dfldens(y,lgtform,window=0,bandwidth=0,kern="tcub",probit=FALSE, graph=TRUE,yname="y",alldata=FALSE,data=NULL)

Arguments

y

The dependent variable for which the counterfactual density is estimated. The data frame must be specified if it has not been attached, e.g., y=mydata$depvar.

lgtform

The formula for the logit or probit model for the time variable. The dependent variable should be a 0-1 variable with 1's representing the later time period. Example: lgtform=timevar~x1+x2.

window

The window size for the kernel density function. Default: not used.

bandwidth

The bandwidth. Default: bandwidth = (.9*(quantile(y1,.75)-quantile(y1,.25))/1.34)*(n1^(-.20)), specified by setting bandwidth = 0 and window = 0.

kern

Kernel weighting function. Default is the tri-cube. Options include "rect", "tria", "epan", "bisq", "tcub", "trwt", and "gauss".

probit

If TRUE, a probit model is used for the time variable rather than logit. Default: probit = FALSE.

graph

If TRUE, produces a graph showing the density function for time 1 and the counterfactual density. Default: graph=TRUE.

yname

The name to be used for the variable whose density functions are drawn when graph=TRUE. Default: yname = "y".

alldata

If TRUE, the density functions are calculated using each observation in turn as a target value. When alldata=FALSE, densities are calculated at a set of points chosen by the locfit program using an adaptive decision tree approach, and the smooth12 command is used to interpolate to the full set of observations. Default: alldata=FALSE.

data

A data frame with the variables for the logit or probit model specified by lgtform. Note: the data frame for y must be specified even if it is part of data.

Value

target: The vector of target values for y for the density functions.
dtarget1: The vector of densities in period 1 at the target values of y.
dtarget10: The counterfactual densities in period 1 at the target values of y.
dhat1: The vector of densities in period 1 at the actual values of y.
dhat10: The counterfactual densities in period 1 at the actual values of y.

Details

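The dfldens command first calculates kernel density estimates for y in time period timevar = 1. The density estimate at target point $y_1$ is $f(y_1) = (1/(hn_1)) \sum_i K((y_{1i} - y_1)/h)$, where h is the bandwidth, $n_1$ is the number of observations with timevar = 1, and K is the kernel weighting function. The following kernel weighting functions are available: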

Kernel         Call abbreviation   Kernel function K(z)
Rectangular    "rect"              $1/2 * I(|z| < 1)$
Triangular     "tria"              $(1-|z|) * I(|z| < 1)$
Epanechnikov   "epan"              $3/4 * (1-z^2) * I(|z| < 1)$
Bi-Square      "bisq"              $15/16 * (1-z^2)^2 * I(|z| < 1)$
Tri-Cube       "tcub"              $70/81 * (1-|z|^3)^3 * I(|z| < 1)$
Tri-Weight     "trwt"              $35/32 * (1-z^2)^3 * I(|z| < 1)$
Gaussian       "gauss"             $(2\pi)^{-.5} \exp(-z^2/2)$

By default, dfldens uses a tri-cube kernel with a fixed bandwidth of h = (.9*(quantile(y1,.75)-quantile(y1,.25))/1.34)*(n1^(-.20)). The results are stored in dtarget1 and dhat1.
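The default estimate described above can be reproduced by hand in base R. The following is a minimal sketch, not the package's internal code: the helper names tcub, default_bw, and kdens are hypothetical, and the data are simulated.

```r
# Tri-cube kernel from the table: 70/81 * (1-|z|^3)^3 on |z| < 1, zero elsewhere
tcub <- function(z) ifelse(abs(z) < 1, (70/81) * (1 - abs(z)^3)^3, 0)

# Default fixed bandwidth, following the formula in the text
default_bw <- function(y1) {
  n1 <- length(y1)
  iqr <- unname(quantile(y1, .75) - quantile(y1, .25))
  (.9 * iqr / 1.34) * n1^(-.20)
}

# Kernel density estimate f(t) = (1/(h*n1)) * sum_i K((y1_i - t)/h) at each target t
kdens <- function(target, y1, h) {
  sapply(target, function(t) sum(tcub((y1 - t) / h)) / (h * length(y1)))
}

set.seed(1)
y1 <- rnorm(500)                          # simulated stand-in for y when timevar = 1
h <- default_bw(y1)
target <- seq(-5, 5, length.out = 501)    # evenly spaced target values (assumed grid)
d1 <- kdens(target, y1, h)                # plays the role of dtarget1
```

Because the tri-cube kernel integrates to one, the estimated density should integrate to roughly one over a grid that covers the data.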

The counterfactual density is an estimate of the density function for y in time 1 if the explanatory variables listed in lgtform were equal to their time 0 values. DiNardo, Fortin, and Lemieux (1996) show that the following re-weighting of $f(y_1)$ is an estimate of the counterfactual density: $(1/(hn_1)) \sum_i \tau_i K((y_{1i} - y_1)/h)$. The weights are given by $\tau_i = (P(x_i)/(1-P(x_i)))/(p/(1-p))$, where $p = n_0/(n_0 + n_1)$ and $P(x_i)$ is the estimated probability that timevar = 0 from the estimated logit or probit regression of timevar on X.
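The re-weighting can be illustrated with base R alone. This is a sketch under stated assumptions rather than the dfldens implementation: the data are simulated, the logit is fit with glm, and a Gaussian kernel is used for brevity instead of the default tri-cube.

```r
set.seed(2)
n <- 1000
timevar <- rbinom(n, 1, 0.5)              # 1 = later period (simulated)
x <- rnorm(n, mean = 0.5 * timevar)       # covariate shifts upward in the later period
y <- 1 + x + rnorm(n)

# Logit for timevar; fitted() returns P(timevar = 1 | x), so P(timevar = 0 | x) is its complement
lgt <- glm(timevar ~ x, family = binomial(link = "logit"))
P0 <- 1 - fitted(lgt)

n0 <- sum(timevar == 0)
n1 <- sum(timevar == 1)
p <- n0 / (n0 + n1)

# tau_i = [P(x_i)/(1-P(x_i))] / [p/(1-p)], kept for the period-1 observations
tau <- ((P0 / (1 - P0)) / (p / (1 - p)))[timevar == 1]

# Counterfactual density: (1/(h*n1)) * sum_i tau_i * K((y1_i - t)/h), Gaussian K
y1 <- y[timevar == 1]
h <- (.9 * unname(quantile(y1, .75) - quantile(y1, .25)) / 1.34) * n1^(-.20)
target <- seq(-6, 9, length.out = 301)
cf <- sapply(target, function(t) sum(tau * dnorm((y1 - t) / h)) / (h * n1))
```

A quick sanity check on the weights is that the tau-weighted mean of x among period-1 observations moves toward the period-0 mean of x, which is what re-weighting to the earlier distribution of x should achieve.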

If X includes a single variable x, the counterfactual density shows how $f(y_1)$ would change if $x = x_0$ rather than $x_1$. Alternatively, X can include multiple variables, in which case the counterfactual density shows how $f(y_1)$ would change if all of the variables in X were equal to their timevar = 0 values.

References

DiNardo, J., N. Fortin, and T. Lemieux, "Labor Market Institutions and the Distribution of Wages, 1973-1992: A Semi-Parametric Approach," Econometrica 64 (1996), 1001-1044.

Leibbrandt, Murray, James A. Levinsohn, and Justin McCrary, "Incomes in South Africa after the Fall of Apartheid," Journal of Globalization and Development 1 (2010).

Examples

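library(McSpatial)  # assumed source package for dfldens and matchdata
data(matchdata)
# Time indicator: TRUE for the later period (2005)
matchdata$year05 <- matchdata$year==2005
# Counterfactual density with land and building size held at earlier-period values
fit <- dfldens(matchdata$lnprice, year05~lnland+lnbldg, window=.2, 
  yname = "Log of Sale Price", data=matchdata)
# Counterfactual density with building age held at earlier-period values
matchdata$age <- matchdata$year - matchdata$yrbuilt
fit <- dfldens(matchdata$lnprice, year05~age, window=.2, 
  yname="Log of Sale Price", data=matchdata)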