Kernel Density Estimation
The (S3) generic function
density computes kernel density
estimates. Its default method does so with the given kernel and
bandwidth for univariate observations.
density(x, ...) "density"(x, bw = "nrd0", adjust = 1, kernel = c("gaussian", "epanechnikov", "rectangular", "triangular", "biweight", "cosine", "optcosine"), weights = NULL, window = kernel, width, give.Rkern = FALSE, n = 512, from, to, cut = 3, na.rm = FALSE, ...)
- the data from which the estimate is to be computed.
- the smoothing bandwidth to be used. The kernels are scaled
such that this is the standard deviation of the smoothing kernel.
(Note this differs from the reference books cited below, and from S-PLUS.)
bwcan also be a character string giving a rule to choose the bandwidth. See
bw.nrd. The default,
"nrd0", has remained the default for historical and compatibility reasons, rather than as a general recommendation, where e.g.,
"SJ"would rather fit, see also Venables and Ripley (2002).
The specified (or computed) value of
bwis multiplied by
- the bandwidth used is actually
adjust*bw. This makes it easy to specify values like half the default bandwidth.
- kernel, window
- a character string giving the smoothing kernel
to be used. This must be one of
"optcosine", with default
"gaussian", and may be abbreviated to a unique prefix (single letter).
"cosine"is smoother than
"optcosine", which is the usual cosine kernel in the literature and almost MSE-efficient. However,
"cosine"is the version used by S.
- numeric vector of non-negative observation weights,
hence of same length as
x. The default
NULLis equivalent to
weights = rep(1/nx, nx)where
nxis the length of (the finite entries of)
- this exists for compatibility with S; if given, and
bwis not, will set
widthif this is a character string, or to a kernel-dependent multiple of
widthif this is numeric.
- logical; if true, no density is estimated, and
the canonical bandwidth of the chosen
kernelis returned instead.
- the number of equally spaced points at which the density is
to be estimated. When
n > 512, it is rounded up to a power of 2 during the calculations (as
fftis used) and the final result is interpolated by
approx. So it almost always makes sense to specify
nas a power of two.
- the left and right-most points of the grid at which the
density is to be estimated; the defaults are
cut * bwoutside of
- by default, the values of
cutbandwidths beyond the extremes of the data. This allows the estimated density to drop to approximately zero at the extremes.
- logical; if
TRUE, missing values are removed from
FALSEany missing values cause an error.
- further arguments for (non-default) methods.
The algorithm used in
density.default disperses the mass of the
empirical distribution function over a regular grid of at least 512
points and then uses the fast Fourier transform to convolve this
approximation with a discretized version of the kernel and then uses
linear approximation to evaluate the density at the specified points.
The statistical properties of a kernel are determined by
$sig^2 (K) = int(t^2 K(t) dt)$
which is always $= 1$ for our kernels (and hence the bandwidth
bw is the standard deviation of the kernel) and
$R(K) = int(K^2(t) dt)$.
MSE-equivalent bandwidths (for different kernels) are proportional to
$sig(K) R(K)$ which is scale invariant and for our
kernels equal to $R(K)$. This value is returned when
give.Rkern = TRUE. See the examples for using exact equivalent
Infinite values in
x are assumed to correspond to a point mass at
+/-Inf and the density estimate is of the sub-density on
ncoordinates of the points where the density is estimated.
- the estimated density values. These will be non-negative, but can be zero.
- the bandwidth used.
- the sample size after elimination of missing values.
- the call which produced the result.
- the deparsed name of the
- logical, for compatibility (always
give.Rkernis true, the number $R(K)$, otherwise an object with class
"density"whose underlying structure is a list containing the following components.
summaryvalues on the
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole (for S version).
Scott, D. W. (1992) Multivariate Density Estimation. Theory, Practice and Visualization. New York: Wiley.
Sheather, S. J. and Jones M. C. (1991) A reliable data-based bandwidth selection method for kernel density estimation. J. Roy. Statist. Soc. B, 683--690.
Silverman, B. W. (1986) Density Estimation. London: Chapman and Hall.
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. New York: Springer.
require(graphics) plot(density(c(-20, rep(0,98), 20)), xlim = c(-4, 4)) # IQR = 0 # The Old Faithful geyser data d <- density(faithful$eruptions, bw = "sj") d plot(d) plot(d, type = "n") polygon(d, col = "wheat") ## Missing values: x <- xx <- faithful$eruptions x[i.out <- sample(length(x), 10)] <- NA doR <- density(x, bw = 0.15, na.rm = TRUE) lines(doR, col = "blue") points(xx[i.out], rep(0.01, 10)) ## Weighted observations: fe <- sort(faithful$eruptions) # has quite a few non-unique values ## use 'counts / n' as weights: dw <- density(unique(fe), weights = table(fe)/length(fe), bw = d$bw) utils::str(dw) ## smaller n: only 126, but identical estimate: stopifnot(all.equal(d[1:3], dw[1:3])) ## simulation from a density() fit: # a kernel density fit is an equally-weighted mixture. fit <- density(xx) N <- 1e6 x.new <- rnorm(N, sample(xx, size = N, replace = TRUE), fit$bw) plot(fit) lines(density(x.new), col = "blue") (kernels <- eval(formals(density.default)$kernel)) ## show the kernels in the R parametrization plot (density(0, bw = 1), xlab = "", main = "R's density() kernels with bw = 1") for(i in 2:length(kernels)) lines(density(0, bw = 1, kernel = kernels[i]), col = i) legend(1.5,.4, legend = kernels, col = seq(kernels), lty = 1, cex = .8, y.intersp = 1) ## show the kernels in the S parametrization plot(density(0, from = -1.2, to = 1.2, width = 2, kernel = "gaussian"), type = "l", ylim = c(0, 1), xlab = "", main = "R's density() kernels with width = 1") for(i in 2:length(kernels)) lines(density(0, width = 2, kernel = kernels[i]), col = i) legend(0.6, 1.0, legend = kernels, col = seq(kernels), lty = 1) ##-------- Semi-advanced theoretic from here on ------------- (RKs <- cbind(sapply(kernels, function(k) density(kernel = k, give.Rkern = TRUE)))) 100*round(RKs["epanechnikov",]/RKs, 4) ## Efficiencies bw <- bw.SJ(precip) ## sensible automatic choice plot(density(precip, bw = bw), main = "same sd bandwidths, 7 different kernels") for(i in 2:length(kernels)) lines(density(precip, bw = bw, kernel = kernels[i]), col = i) ## Bandwidth Adjustment for "Exactly Equivalent Kernels" h.f <- sapply(kernels, function(k)density(kernel = k, give.Rkern = TRUE)) (h.f <- (h.f["gaussian"] / h.f)^ .2) ## -> 1, 1.01, .995, 1.007,... close to 1 => adjustment barely visible.. plot(density(precip, bw = bw), main = "equivalent bandwidths, 7 different kernels") for(i in 2:length(kernels)) lines(density(precip, bw = bw, adjust = h.f[i], kernel = kernels[i]), col = i) legend(55, 0.035, legend = kernels, col = seq(kernels), lty = 1)