qregsim2: Machado-Mata Decomposition of Changes in Distributions

Description

Decomposes quantile regression estimates of changes in the distribution of a dependent variable into the components associated with changes in the distribution of the explanatory variables and the coefficient estimates.

Usage

qregsim2(formall, formx, dataframe1, dataframe2, bmat1, bmat2,
  graphx=TRUE, graphb=TRUE, graphy=TRUE, graphdy=TRUE, nbarplot=10,
  yname=NULL, xnames=NULL, timenames=c("1","2"),
  leglocx="topright", leglocy="topright", leglocdy="topright",
  nsim=20000, bwadjx=1, bwadjy=1, bwadjdy=1)

Arguments

formall

Model formula. Must match the model formula used for qregbmat.

formx

Model formula for the variables used for the decompositions, e.g., formx=~x1+x2. The values and coefficients of the other variables are held at their time 2 values for the simulations.

dataframe1

The data frame for regime 1. Should include all the variables listed in formall.

dataframe2

The data frame for regime 2. Should include all the variables listed in formall.

bmat1

Matrix of values for regime 1 quantile coefficient matrices; the output from running qregbmat using dataframe1.

bmat2

Matrix of values for regime 2 quantile coefficient matrices; the output from running qregbmat using dataframe2.

graphx

If graphx=T, presents kernel density estimates of each of the explanatory variables in formx.

graphb

If graphb=T, presents graphs of the quantile coefficient estimates for the variables in formx.

graphy

If graphy=T, presents kernel density estimates of the predicted values of y for time 1, time 2, and the counterfactual.

graphdy

If graphdy=T, presents graphs of the changes in densities.

nbarplot

Specifies the maximum number of values taken by an explanatory variable before bar plots are replaced by smooth kernel density functions. Only relevant when graphx = T.

yname

A label used for the dependent variable in the density graphs, e.g., yname = "Log of Sale Price".

xnames

Labels for graphs involving the explanatory variables, e.g., xnames = "x1" for one explanatory variable, or xnames = c("x1","x2") for two variables.

timenames

A vector with labels for the two regimes. Must be entered as a vector with character values. Default: c("1","2").

leglocx

Legend location for density plots of the explanatory variables, e.g., leglocx = "topright" for one explanatory variable, or leglocx = c("topright","topleft") for two variables.

leglocy

Legend location for density plots of predicted values of the dependent variable. Default: leglocy = "topright".

leglocdy

Legend location for plot of density changes. Default: leglocdy = "topright".

nsim

Number of simulations for the decompositions.

bwadjx

Factor used to adjust bandwidths for kernel density plots of the explanatory variables. Smoother functions are produced when bwadjx>1. Passed directly to the density function's adjust option. Default: bwadjx=1.

bwadjy

Factor used to adjust bandwidths for kernel density plots of the predicted values of the dependent variable. Default: bwadjy=1.

bwadjdy

Factor used to adjust bandwidths for plots of the kernel density changes.
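
The bandwidth factors above are described as being handed to the adjust option of density. As a minimal base-R sketch of what that option does (the vector x below is made-up data for illustration and is not part of qregsim2), the bandwidth actually used is the default bandwidth multiplied by adjust:

```r
# Illustrative data only; not produced by qregsim2.
set.seed(1)
x <- rnorm(500)

d_default <- density(x)              # adjust = 1, the default
d_smooth  <- density(x, adjust = 2)  # same bandwidth rule, scaled by 2

# The bandwidth stored in the result scales linearly with adjust.
bw_ratio <- d_smooth$bw / d_default$bw
print(bw_ratio)
```

Values above 1 therefore widen the kernel and give smoother curves, while values below 1 narrow it.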

Value

ytarget: The values for the x-axis for the density functions.
yhat11: The kernel density function for $X_1 \beta_1 + Z_1\gamma_1$.
yhat22: The kernel density function for $X_2 \beta_2 + Z_2\gamma_2$.
yhat12: The kernel density function for $X_1 \beta_2 + Z_2\gamma_2$.
d2211: The difference between the density functions for $X_2 \beta_2 + Z_2\gamma_2$ and $X_1 \beta_1 + Z_1\gamma_1$. Will differ from yhat22 - yhat11 if bwadjy and bwadjdy are different.
d2212: The difference between the density functions for $X_2 \beta_2 + Z_2\gamma_2$ and $X_1 \beta_2 + Z_2\gamma_2$. Will differ from yhat22 - yhat12 if bwadjy and bwadjdy are different.
d1211: The difference between the density functions for $X_1 \beta_2 + Z_2\gamma_2$ and $X_1 \beta_1 + Z_1\gamma_1$. Will differ from yhat12 - yhat11 if bwadjy and bwadjdy are different.

Details

The base models are $y_1 = X_1\beta_1 + Z_1\gamma_1$ for regime 1 and $y_2 = X_2\beta_2 + Z_2\gamma_2$ for regime 2. The counterfactual model is $y_{12} = X_1\beta_2 + Z_2\gamma_2$. The full list of variables (both X and Z) is provided by formall; this list must correspond exactly with the list provided to qregbmat. The subset of variables that are the subject of the decompositions is listed in formx.

The matrices bmat1 and bmat2 are intended to represent the output from qregbmat. The models must include the same set of explanatory variables, and the variables must be in the same order in both bmat1 and bmat2. In contrast, the data frames dataframe1 and dataframe2 can have different numbers of observations and different sets of explanatory variables, as long as they include the dependent variable and the variables listed in bmat1 and bmat2.
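
Because the two coefficient matrices must list the same variables in the same order, it can be worth checking this before running the decomposition. The sketch below is only an illustration: the matrices are made up, the layout (one row per quantile, one named column per coefficient) is an assumption about the qregbmat output, and same_layout is a hypothetical helper rather than a function from the package.

```r
# Hypothetical coefficient matrices standing in for qregbmat output.
# Assumed layout: one row per quantile, one named column per coefficient.
bmat1 <- cbind("(Intercept)" = c(-0.3, 0.0, 0.3), x = c(1.0, 1.0, 1.0))
bmat2 <- cbind("(Intercept)" = c(-0.4, 0.0, 0.4), x = c(0.8, 1.0, 1.2))

# Hypothetical helper: TRUE when both matrices have the same shape and
# the same coefficient names in the same order.
same_layout <- function(b1, b2) {
  identical(dim(b1), dim(b2)) && identical(colnames(b1), colnames(b2))
}

ok  <- same_layout(bmat1, bmat2)                             # columns agree
bad <- same_layout(bmat1, bmat2[, c("x", "(Intercept)")])    # columns swapped
print(c(ok = ok, bad = bad))
```

A check of this kind catches reordered columns, which would otherwise pair coefficients with the wrong variables without raising an error.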

The output from qregsim2 is a series of graphs. If all options are specified, the graphs appear in the following order:

1. Kernel density estimates for each variable listed in formx. Estimated using density with default bandwidths and the specified value for bwadjx. Not shown if graphx=F. The xnames and leglocx options can be used to vary the names used to label the x-axis and the legend location.

2. Quantile coefficient estimates for the variables listed in formx. Not shown if graphb=F.

3. Kernel density estimates for the predicted values of $X_1\beta_1 + Z_1\gamma_1$ and $X_2\beta_2 + Z_2\gamma_2$, and the counterfactual, $X_1\beta_2 + Z_2\gamma_2$. Estimated using density with default bandwidths and the specified value for bwadjy. Not shown if graphy=F. The label for the x-axis can be varied with the yname option. The three estimated density functions are returned after estimation as yhat11, yhat22, and yhat12.

4. A graph showing the change in densities, d2211 = f22 - f11, along with the Machado-Mata decomposition showing:

(a) the change in densities due to the variables listed in formx: d2212 = f22 - f12.

(b) the change in densities due to the coefficients: d1211 = f12 - f11.

These estimates are returned after estimation as d2211, d2212, and d1211. The density changes are not shown if graphdy=F. The label for the x-axis can be varied with the yname option. The bandwidth for the original density functions f11, f22, and f12 can be varied with bwadjdy. It is generally desirable to set bwadjdy > bwadjy because additional smoothing is needed to make the change in densities appear smooth.
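
The decomposition in item 4 is an accounting identity: the total change f22 - f11 equals the part due to the variables (f22 - f12) plus the part due to the coefficients (f12 - f11), provided the three densities are evaluated on the same grid. The base-R sketch below illustrates this with made-up draws in place of the simulated predictions; the fixed bandwidth of 0.3 and the 512-point grid are arbitrary choices for the illustration, not values taken from qregsim2.

```r
set.seed(42)
nsim <- 2000

# Made-up draws standing in for the three sets of simulated predictions.
y11 <- rnorm(nsim, 0.0, 1.0)   # regime 1:        X1 b1 + Z1 g1
y22 <- rnorm(nsim, 0.5, 1.5)   # regime 2:        X2 b2 + Z2 g2
y12 <- rnorm(nsim, 0.2, 1.0)   # counterfactual:  X1 b2 + Z2 g2

# Evaluate all three densities on one grid with one bandwidth
# so that pointwise differences are meaningful.
rng  <- range(c(y11, y22, y12))
dens <- function(y) density(y, bw = 0.3, from = rng[1], to = rng[2], n = 512)
f11 <- dens(y11)
f22 <- dens(y22)
f12 <- dens(y12)
ytarget <- f11$x

d2211 <- f22$y - f11$y   # total change in densities
d2212 <- f22$y - f12$y   # part due to the variables in formx
d1211 <- f12$y - f11$y   # part due to the coefficients

# The two components add up to the total change, up to rounding error.
max_gap <- max(abs(d2211 - (d2212 + d1211)))
print(max_gap)
```

Plotting d2211, d2212, and d1211 against ytarget gives a graph of the same kind as the one described in item 4.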

The distributions are simulated by drawing nsim samples with replacement from the observation indices of each regime, seq(1:n1) and seq(1:n2), and from the quantile indices, seq(1:ntau), where ntau = length(taumat). The commands for the simulations are:

xobs1 <- sample(seq(1:n1),nsim,replace=TRUE)

xobs2 <- sample(seq(1:n2),nsim,replace=TRUE)

bobs <- sample(seq(1:ntau),nsim,replace=TRUE)

xhat1 <- allmat1[xobs1,]

xhat2 <- allmat2[xobs2,]

znames <- setdiff(colnames(allmat1),colnames(xmat1))

if (identical(znames,"(Intercept)")) xhat12 <- xhat1

if (!identical(znames,"(Intercept)"))

xhat12 <- cbind(xhat2[,znames],xhat1[,colnames(xmat1)])

xhat12 <- xhat12[,colnames(allmat1)]

bhat1 <- bmat1[bobs,]

bhat2 <- bmat2[bobs,]

where allmat and xmat denote the matrices defined by the explanatory variables listed in formall (including the intercept) and formx, respectively. Because the bandwidths are simply the defaults from the density function, they are likely to differ across regimes, as the number of observations and the standard deviations may vary over time. The densities are therefore re-estimated using the average across regimes of the original bandwidths.
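
To make the resampling steps concrete, the following self-contained sketch runs them on made-up inputs with an intercept and a single decomposition variable. It is an illustration under stated assumptions, not the package's internal code: the design and coefficient matrices are invented, the predictions are formed as row-by-row products of the sampled covariates and coefficients (implied by the models but not shown in the listing), and the common bandwidth is taken as the mean of the default bandwidths for the two regimes.

```r
set.seed(123)
n1 <- 300; n2 <- 400; nsim <- 5000

# Made-up design matrices standing in for allmat1 and allmat2.
allmat1 <- cbind("(Intercept)" = 1, x = rnorm(n1, 0, 1))
allmat2 <- cbind("(Intercept)" = 1, x = rnorm(n2, 0, 2))

# Made-up quantile coefficient matrices: one row per quantile in taumat.
taumat <- seq(0.1, 0.9, by = 0.1)
ntau   <- length(taumat)
bmat1  <- cbind("(Intercept)" = 0.5 * qnorm(taumat), x = 1)
bmat2  <- cbind("(Intercept)" = 0.5 * qnorm(taumat), x = 0.5 + taumat)

# Draw observations and quantiles with replacement.
xobs1 <- sample(n1, nsim, replace = TRUE)
xobs2 <- sample(n2, nsim, replace = TRUE)
bobs  <- sample(ntau, nsim, replace = TRUE)

xhat1  <- allmat1[xobs1, ]
xhat2  <- allmat2[xobs2, ]
xhat12 <- xhat1            # only the intercept lies outside formx here
bhat1  <- bmat1[bobs, ]
bhat2  <- bmat2[bobs, ]

# Assumed prediction step: covariates times coefficients, row by row.
y11 <- rowSums(xhat1  * bhat1)
y22 <- rowSums(xhat2  * bhat2)
y12 <- rowSums(xhat12 * bhat2)

# Average the default bandwidths across regimes, then re-estimate
# all densities on a common grid with that bandwidth.
bw_avg <- mean(c(density(y11)$bw, density(y22)$bw))
rng <- range(c(y11, y22, y12))
f11 <- density(y11, bw = bw_avg, from = rng[1], to = rng[2])
f22 <- density(y22, bw = bw_avg, from = rng[1], to = rng[2])
f12 <- density(y12, bw = bw_avg, from = rng[1], to = rng[2])
```

With a shared bandwidth and grid, differences such as f22$y - f11$y reflect changes in the simulated distributions rather than differences in the amount of smoothing.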

References

Koenker, Roger. Quantile Regression. New York: Cambridge University Press, 2005.

Machado, J.A.F. and Mata, J., "Counterfactual Decomposition of Changes in Wage Distributions using Quantile Regression," Journal of Applied Econometrics 20 (2005), 445-465.

McMillen, Daniel P., "Changes in the Distribution of House Prices over Time: Structural Characteristics, Neighborhood or Coefficients?" Journal of Urban Economics 64 (2008), 573-589.

Examples

library(McSpatial)  # provides qregbmat and qregsim2
par(ask=TRUE)

n = 5000
set.seed(484913)
x1 <- rnorm(n,0,1)
u1 <- rnorm(n,0,.5)
y1 <- x1 + u1

# no change in x.  Coefficients show quantile effects
tau <- runif(n,0,.5)
x2 <- x1
y2 <- (1 + (tau-.5))*x2 + .5*qnorm(tau)

dat <- data.frame(rbind(cbind(y1,x1,1), cbind(y2,x2,2)))
names(dat) <- c("y","x","year")
bmat1 <- qregbmat(y~x,data=dat[dat$year==1,],graphb=FALSE)
bmat2 <- qregbmat(y~x,data=dat[dat$year==2,],graphb=FALSE)
fit1 <- qregsim2(y~x,~x,dat[dat$year==1,],dat[dat$year==2,],
  bmat1,bmat2,bwadjdy=2)

# Distribution of x changes.  Coefficients and u stay the same
x2 <- rnorm(n,0,2)
y2 <- x2 + u1
dat <- data.frame(rbind(cbind(y1,x1,1), cbind(y2,x2,2)))
names(dat) <- c("y","x","year")
bmat1 <- qregbmat(y~x,data=dat[dat$year==1,],graphb=FALSE)
bmat2 <- qregbmat(y~x,data=dat[dat$year==2,],graphb=FALSE)
fit1 <- qregsim2(y~x,~x,dat[dat$year==1,],dat[dat$year==2,],
  bmat1,bmat2,bwadjdy=2)