
Last chance! 50% off unlimited learning
Sale ends in
density
computes kernel density
estimates. Its default method does so with the given kernel and
bandwidth for univariate observations.
density(x, ...)
"density"(x, bw = "nrd0", adjust = 1, kernel = c("gaussian", "epanechnikov", "rectangular", "triangular", "biweight", "cosine", "optcosine"), weights = NULL, window = kernel, width, give.Rkern = FALSE, n = 512, from, to, cut = 3, na.rm = FALSE, ...)
bw
can also be a character string giving a rule to choose the
bandwidth. See bw.nrd
. The default,
"nrd0"
, has remained the default for historical and
compatibility reasons, rather than as a general recommendation,
where e.g., "SJ"
would rather fit, see also Venables and
Ripley (2002).
The specified (or computed) value of bw
is multiplied by
adjust
.
adjust*bw
.
This makes it easy to specify values like ‘half the default’
bandwidth."gaussian"
,
"rectangular"
, "triangular"
, "epanechnikov"
,
"biweight"
, "cosine"
or "optcosine"
, with default
"gaussian"
, and may be abbreviated to a unique prefix (single
letter). "cosine"
is smoother than "optcosine"
, which is the
usual ‘cosine’ kernel in the literature and almost MSE-efficient.
However, "cosine"
is the version used by S.
x
. The default NULL
is
equivalent to weights = rep(1/nx, nx)
where nx
is the
length of (the finite entries of) x[]
.bw
is not, will set bw
to width
if this is a
character string, or to a kernel-dependent multiple of width
if this is numeric.kernel
is returned
instead.cut * bw
outside
of range(x)
.from
and to
are
cut
bandwidths beyond the extremes of the data. This allows
the estimated density to drop to approximately zero at the extremes.TRUE
, missing values are removed
from x
. If FALSE
any missing values cause an error.give.Rkern
is true, the number $R(K)$, otherwise
an object with class "density"
whose
underlying structure is a list containing the following components.
n
coordinates of the points where the density is
estimated.x
argument.FALSE
).print
method reports summary
values on the
x
and y
components.
density.default
disperses the mass of the
empirical distribution function over a regular grid of at least 512
points and then uses the fast Fourier transform to convolve this
approximation with a discretized version of the kernel and then uses
linear approximation to evaluate the density at the specified points. The statistical properties of a kernel are determined by
$sig^2 (K) = int(t^2 K(t) dt)$
which is always $= 1$ for our kernels (and hence the bandwidth
bw
is the standard deviation of the kernel) and
$R(K) = int(K^2(t) dt)$.
MSE-equivalent bandwidths (for different kernels) are proportional to
$sig(K) R(K)$ which is scale invariant and for our
kernels equal to $R(K)$. This value is returned when
give.Rkern = TRUE
. See the examples for using exact equivalent
bandwidths.
Infinite values in x
are assumed to correspond to a point mass at
+/-Inf
and the density estimate is of the sub-density on
(-Inf, +Inf)
.
Scott, D. W. (1992) Multivariate Density Estimation. Theory, Practice and Visualization. New York: Wiley.
Sheather, S. J. and Jones M. C. (1991) A reliable data-based bandwidth selection method for kernel density estimation. J. Roy. Statist. Soc. B, 683--690.
Silverman, B. W. (1986) Density Estimation. London: Chapman and Hall.
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. New York: Springer.
bw.nrd
,
plot.density
, hist
.
require(graphics)
plot(density(c(-20, rep(0,98), 20)), xlim = c(-4, 4)) # IQR = 0
# The Old Faithful geyser data
d <- density(faithful$eruptions, bw = "sj")
d
plot(d)
plot(d, type = "n")
polygon(d, col = "wheat")
## Missing values:
x <- xx <- faithful$eruptions
x[i.out <- sample(length(x), 10)] <- NA
doR <- density(x, bw = 0.15, na.rm = TRUE)
lines(doR, col = "blue")
points(xx[i.out], rep(0.01, 10))
## Weighted observations:
fe <- sort(faithful$eruptions) # has quite a few non-unique values
## use 'counts / n' as weights:
dw <- density(unique(fe), weights = table(fe)/length(fe), bw = d$bw)
utils::str(dw) ## smaller n: only 126, but identical estimate:
stopifnot(all.equal(d[1:3], dw[1:3]))
## simulation from a density() fit:
# a kernel density fit is an equally-weighted mixture.
fit <- density(xx)
N <- 1e6
x.new <- rnorm(N, sample(xx, size = N, replace = TRUE), fit$bw)
plot(fit)
lines(density(x.new), col = "blue")
(kernels <- eval(formals(density.default)$kernel))
## show the kernels in the R parametrization
plot (density(0, bw = 1), xlab = "",
main = "R's density() kernels with bw = 1")
for(i in 2:length(kernels))
lines(density(0, bw = 1, kernel = kernels[i]), col = i)
legend(1.5,.4, legend = kernels, col = seq(kernels),
lty = 1, cex = .8, y.intersp = 1)
## show the kernels in the S parametrization
plot(density(0, from = -1.2, to = 1.2, width = 2, kernel = "gaussian"),
type = "l", ylim = c(0, 1), xlab = "",
main = "R's density() kernels with width = 1")
for(i in 2:length(kernels))
lines(density(0, width = 2, kernel = kernels[i]), col = i)
legend(0.6, 1.0, legend = kernels, col = seq(kernels), lty = 1)
##-------- Semi-advanced theoretic from here on -------------
(RKs <- cbind(sapply(kernels,
function(k) density(kernel = k, give.Rkern = TRUE))))
100*round(RKs["epanechnikov",]/RKs, 4) ## Efficiencies
bw <- bw.SJ(precip) ## sensible automatic choice
plot(density(precip, bw = bw),
main = "same sd bandwidths, 7 different kernels")
for(i in 2:length(kernels))
lines(density(precip, bw = bw, kernel = kernels[i]), col = i)
## Bandwidth Adjustment for "Exactly Equivalent Kernels"
h.f <- sapply(kernels, function(k)density(kernel = k, give.Rkern = TRUE))
(h.f <- (h.f["gaussian"] / h.f)^ .2)
## -> 1, 1.01, .995, 1.007,... close to 1 => adjustment barely visible..
plot(density(precip, bw = bw),
main = "equivalent bandwidths, 7 different kernels")
for(i in 2:length(kernels))
lines(density(precip, bw = bw, adjust = h.f[i], kernel = kernels[i]),
col = i)
legend(55, 0.035, legend = kernels, col = seq(kernels), lty = 1)
Run the code above in your browser using DataLab