McSpatial (version 2.0)

condens: Conditional density estimation

Description

Estimates conditional density functions of the form f(y|x) = f(x,y)/f(x). Kernel density estimators are used to estimate f(x,y) and f(x). The conditional density function can be plotted as a three-dimensional surface or as a contour map. Alternatively, the conditional density of y can be graphed for as many as five target values of x.

Usage

condens(form,window=.7,bandwidth=0,kern="tcub", mingrid.x=NULL,maxgrid.x=NULL,mingrid.y=NULL,maxgrid.y=NULL,ngrid=50, xlab="x",ylab="y",zlab="fxy/fx",contour=TRUE,level=TRUE,wire=TRUE,dens=TRUE, targetx.dens=NULL,quantile.dens=c(.10,.25,.50,.75,.90),data=NULL)

Arguments

form
Model formula
window
Window size. Default: 0.7.
bandwidth
Bandwidth. Default: not used.
kern
Kernel weighting functions. Default is the tri-cube. Options include "rect", "tria", "epan", "bisq", "tcub", "trwt", and "gauss".
mingrid.x, maxgrid.x, mingrid.y, maxgrid.y, ngrid
The mingrid and maxgrid values are the boundaries for the ngrid x ngrid lattice used in the graphs produced by condens. By default, mingrid.x = min(x), maxgrid.x = max(x), mingrid.y = min(y), maxgrid.y = max(y), and ngrid=50.
xlab
Label for the x-axis in graphs. Default: "x"
ylab
Label for the y-axis in graphs. Default: "y"
zlab
Label for the z-axis in graphs. Default: "fxy/fx"
contour
If contour=T, produces a two-dimensional contour plot of the conditional density estimates. Evaluated for an ngrid x ngrid lattice. Default is contour=T.
level
If level=T, produces a two-dimensional level plot of the conditional density estimates. Evaluated for an ngrid x ngrid lattice. Default is level=T.
wire
If wire=T, produces a three-dimensional plot of the conditional density estimates. Evaluated for an ngrid x ngrid lattice. Default is wire=T.
dens
If dens=T, produces a plot showing how f(y|x) varies over y for given target values of x. Target values of x are provided using the targetx.dens or quantile.dens options. Default is dens=T.
targetx.dens
Target values for x in the density plots, e.g., targetx.dens = c(200,400,600). Maximum number of entries is 5. If targetx.dens has more than 5 entries, only the first 5 will be used. Default is targetx.dens = NULL, meaning that the target values for x are determined by the quantile.dens option.
quantile.dens
Quantiles for the target values for x in the density plots, e.g., quantile.dens = c(.25,.50,.75). Maximum number of entries is 5. If quantile.dens has more than 5 entries, only the first 5 will be used. Default is quantile.dens = c(.10,.25,.50,.75,.90).
data
A data frame containing the data. Default: use data in the current working directory.
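
How targetx.dens and quantile.dens interact can be sketched in base R. This illustrates the behavior described above and is not the package's internal code; the helper name pick_targets and the simulated data are assumptions made for the example.

```r
# Hypothetical helper (not part of McSpatial): chooses target values of x
# following the rules described for targetx.dens and quantile.dens.
pick_targets <- function(x, targetx.dens = NULL,
                         quantile.dens = c(.10, .25, .50, .75, .90)) {
  if (!is.null(targetx.dens)) {
    # Explicit targets are used when supplied; only the first five are kept.
    return(head(targetx.dens, 5))
  }
  # Otherwise the targets are quantiles of x; only the first five are kept.
  unname(quantile(x, probs = head(quantile.dens, 5)))
}

set.seed(1)
x <- runif(200, 0, 1000)   # simulated data, for illustration only

t_default  <- pick_targets(x)
t_explicit <- pick_targets(x, targetx.dens = c(100, 200, 300, 400, 500, 600))
```

Here t_default holds five quantiles of x, and t_explicit drops the sixth supplied value.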

Value

fx
The values of f(x), one for each data point.
fy
The values of f(y), one for each data point.
fxy
The values of f(x,y), one for each data point. The conditional densities are fxy/fx for f(y|x) and fxy/fy for f(x|y).
gridmat
An (ngrid*ngrid)x3 matrix used to produce the contour, level, and wire maps. The first column contains the lattice values for x, the second column contains the lattice values for y, and the third column has the estimated values of f(y|x) at the target values for x and y.
densmat
The estimated values of f(y|x) for the two-dimensional density graphs produced when dens = TRUE. If the number of observations in the call to condens is n and the number of entries in quantile.dens is nq, then densmat is an n x nq matrix.

Details

The locfit package is used to find the target values of x for f(x) and y for f(y). The expand.grid command is then used to determine the target values of x and y for f(x,y). The smooth12 command is used to interpolate f(x), f(y), and f(x,y) to the full data set and to the grid of target values for the contour, level, and wire plots.
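
As a rough illustration of the lattice of target values, the following base-R sketch builds an ngrid x ngrid grid with expand.grid. Evenly spaced target values between the minimum and maximum are an assumption; the target points chosen by locfit inside condens may differ. The data are simulated.

```r
# Simulated data, for illustration only.
set.seed(1)
x <- rnorm(100)
y <- x + rnorm(100)

ngrid <- 50
# Assumed: evenly spaced targets spanning the default boundaries min() to max().
gx <- seq(min(x), max(x), length.out = ngrid)
gy <- seq(min(y), max(y), length.out = ngrid)

# Every (x, y) combination: ngrid * ngrid rows, with x varying fastest.
grid <- expand.grid(x = gx, y = gy)
dim(grid)
```

The first two columns of gridmat have this layout: one row per lattice point, x in the first column and y in the second.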

The density functions f(x) and f(y) are as follows:

$$f(x) = \frac{1}{sd(x)*b*n} \sum_i K \left( \frac{ x_i - x}{sd(x)*b} \right)$$

$$f(y) = \frac{1}{sd(y)*b*n} \sum_i K \left( \frac{ y_i - y}{sd(y)*b} \right)$$

A product kernel is used for f(x,y):

$$f(x,y) = \frac{1}{sd(x)*b*sd(y)*b*n}\sum_i K ( \frac{ x_i - x}{sd(x)*b} ) K ( \frac{ y_i - y}{sd(y)*b} ) $$

where b is the bandwidth and the target points are x and y. The bandwidth, b, can be set using the bandwidth option. If b = 0 (the default), sd(x)*b and sd(y)*b are replaced by window values, $h = quantile(dist, window)$, where $dist = |x_i - x|$ or $dist = |y_i - y|$. The window size is set using the window option. By default, window = .7 and bandwidth = 0. Available kernel weighting functions include the following:

Kernel          Call abbreviation    Kernel function K(z)
Rectangular     "rect"               $1/2 * I(|z| < 1)$
Triangular      "tria"               $(1-|z|) * I(|z| < 1)$
Epanechnikov    "epan"               $3/4 * (1-z^2) * I(|z| < 1)$
Bi-Square       "bisq"               $15/16 * (1-z^2)^2 * I(|z| < 1)$
Tri-Cube        "tcub"               $70/81 * (1-|z|^3)^3 * I(|z| < 1)$
Tri-Weight      "trwt"               $35/32 * (1-z^2)^3 * I(|z| < 1)$
Gaussian        "gauss"              $(2\pi)^{-.5} \exp(-z^2/2)$
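
The kernels and estimators above can be written out in a few lines of base R. This is a minimal sketch of the formulas, not the code used by condens (which relies on locfit and interpolation); the names kernels, window_h, and cond_dens and the simulated data are assumptions introduced for illustration.

```r
# The listed kernels as plain R functions of z.
kernels <- list(
  rect  = function(z) 1/2 * (abs(z) < 1),
  tria  = function(z) (1 - abs(z)) * (abs(z) < 1),
  epan  = function(z) 3/4 * (1 - z^2) * (abs(z) < 1),
  bisq  = function(z) 15/16 * (1 - z^2)^2 * (abs(z) < 1),
  tcub  = function(z) 70/81 * (1 - abs(z)^3)^3 * (abs(z) < 1),
  trwt  = function(z) 35/32 * (1 - z^2)^3 * (abs(z) < 1),
  gauss = function(z) (2 * pi)^(-.5) * exp(-z^2 / 2)
)

# Area under each kernel: compact kernels over [-1, 1], Gaussian over the real line.
areas <- sapply(names(kernels), function(k) {
  lim <- if (k == "gauss") Inf else 1
  integrate(kernels[[k]], -lim, lim)$value
})

# Window form: when bandwidth = 0, h is the `window` quantile of |x_i - x0|.
window_h <- function(x0, x, window = .7) {
  unname(quantile(abs(x - x0), probs = window))
}

# Fixed-bandwidth product-kernel estimate of f(y|x) = f(x,y)/f(x)
# at a single target point (x0, y0), with h_x = sd(x)*b and h_y = sd(y)*b.
cond_dens <- function(x0, y0, x, y, b, K = kernels$tcub) {
  n  <- length(x)
  hx <- sd(x) * b
  hy <- sd(y) * b
  kx <- K((x - x0) / hx)
  ky <- K((y - y0) / hy)
  fx  <- sum(kx) / (hx * n)
  fxy <- sum(kx * ky) / (hx * hy * n)
  fxy / fx
}

# Simulated data, for illustration only.
set.seed(1)
x <- rnorm(500)
y <- x + rnorm(500)

h0  <- window_h(0, x)                    # window-based bandwidth at x0 = 0
fy0 <- cond_dens(0, 0, x, y, b = 0.5)    # estimate of f(y = 0 | x = 0)
```

Because each kernel integrates to one, the estimate from cond_dens integrates to one over y for any fixed x0 that has observations within its bandwidth.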

The contour, level, and wire plots are produced from the values in gridmat using the lattice package. The two-dimensional density graphs produced when dens=TRUE are plots of f(x,y)/f(x) at given values of x. By default, the values for x are the quantiles given in quantile.dens. Alternatively, the values of x can be specified directly using the targetx.dens option. The values used to construct the density graphs are stored in densmat. Both gridmat and densmat are stored by condens even if the printing of the graphs is suppressed.
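
Since gridmat is returned even when plotting is suppressed, similar graphs can be redrawn from it afterwards. The sketch below uses a toy three-column data frame in place of gridmat and the standard lattice functions contourplot, levelplot, and wireframe; which lattice calls and options condens itself uses is an assumption, and the surface here is made up rather than estimated.

```r
library(lattice)

# Stand-in for gridmat: columns are x, y, and a density-like surface.
# The z values are a toy f(y|x) (normal with mean 0.5 * x), not condens output.
g <- expand.grid(x = seq(-2, 2, length.out = 50),
                 y = seq(-2, 2, length.out = 50))
g$z <- dnorm(g$y, mean = 0.5 * g$x)

# Lattice functions return plot objects; nothing is drawn until print().
p_contour <- contourplot(z ~ x * y, data = g, xlab = "x", ylab = "y")
p_level   <- levelplot(z ~ x * y, data = g, xlab = "x", ylab = "y")
p_wire    <- wireframe(z ~ x * y, data = g,
                       xlab = "x", ylab = "y", zlab = "fxy/fx")

# print(p_wire)   # uncomment to draw the three-dimensional surface
```

With a fitted object, the same formulas would be applied to a data frame built from the three columns of gridmat.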

References

Li, Qi and Jeffrey Scott Racine. Nonparametric Econometrics: Theory and Practice. Princeton, NJ: Princeton University Press, 2007.

Loader, Clive. Local Regression and Likelihood. New York: Springer, 1999.

Pagan, Adrian and Aman Ullah. Nonparametric Econometrics. New York: Cambridge University Press, 1999.

See Also

qregcdf

Examples

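library(McSpatial)
data(dupage99)
# Ratio of av to price, then rescale price to thousands
dupage99$ratio <- dupage99$av/dupage99$price
dupage99$price <- dupage99$price/1000
# Pause before each new plot
par(ask=TRUE)
# Draw all four graph types; density plots at price = 100, 200, ..., 500
fit <- condens(ratio~price,contour=TRUE,level=TRUE,wire=TRUE,dens=TRUE,
  targetx.dens=seq(100,500,100), data=dupage99)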
