Density, distribution function, quantile function, and random generation for the empirical distribution based on a set of observations.
demp(x, obs, discrete = FALSE, density.arg.list = NULL)
pemp(q, obs, discrete = FALSE,
prob.method = ifelse(discrete, "emp.probs", "plot.pos"),
plot.pos.con = 0.375)
qemp(p, obs, discrete = FALSE,
prob.method = ifelse(discrete, "emp.probs", "plot.pos"),
plot.pos.con = 0.375)
remp(n, obs)
x: vector of quantiles.
q: vector of quantiles.
p: vector of probabilities between 0 and 1.
n: sample size. If length(n) is larger than 1, then length(n) random values are returned.
obs: numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.
discrete: logical scalar indicating whether the assumed parent distribution of x is discrete (discrete=TRUE) or continuous (discrete=FALSE). The default value is FALSE.
prob.method: character string indicating what method to use to compute the empirical probabilities. Possible values are "emp.probs" (empirical probabilities, default if discrete=TRUE) and "plot.pos" (plotting positions, default if discrete=FALSE). See the DETAILS section for more explanation.
plot.pos.con: numeric scalar between 0 and 1 containing the value of the plotting position constant. The default value is plot.pos.con=0.375. See the DETAILS section for more information. This argument is ignored if prob.method="emp.probs".
density (demp), probability (pemp), quantile (qemp), or random sample (remp) for the empirical distribution based on the data contained in the vector obs.
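Let $x_1, x_2, \ldots, x_n$ denote a random sample of $n$ observations from some unknown probability distribution (i.e., the elements of the argument obs), and let $x_{(i)}$ denote the $i$th order statistic, that is, the $i$th smallest observation, for $i = 1, 2, \ldots, n$.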
Estimating Density
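The function demp computes the empirical probability density function. If the observations are assumed to come from a discrete distribution, the probability density (mass) function is estimated by:

$$\hat{f}(x) = \widehat{\Pr}(X = x) = \frac{1}{n} \sum_{i=1}^{n} I_{[x]}(x_i)$$

where $I$ is the indicator function:

$$I_{[x]}(y) = \begin{cases} 1 & \text{if } y = x, \\ 0 & \text{if } y \ne x. \end{cases}$$

That is, the estimated probability of observing the value $x$ is simply the observed proportion of observations equal to $x$.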
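The discrete case can be illustrated with a few lines of base R. This is only a sketch of the idea (the proportion of observations equal to each requested value); the helper name `emp_pmf` is hypothetical and is not part of the package, and the code is not taken from demp itself.

```r
# Hypothetical helper (not part of the package): for each value in x,
# return the proportion of observations equal to that value.
emp_pmf <- function(x, obs) {
  obs <- obs[is.finite(obs)]   # drop NA, NaN, Inf, and -Inf
  vapply(x, function(v) mean(obs == v), numeric(1))
}

obs <- c(1, 2, 2, 3, 3, 3, NA)
emp_pmf(c(0, 2, 3), obs)       # proportions 0, 2/6, and 3/6
```

A value that never occurs in obs gets an estimated probability of zero.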
If the observations are assumed to come from a continuous distribution, the function demp calls the R function density to compute the estimated density based on the values specified in the argument obs, and then uses linear interpolation to estimate the density at the values specified in the argument x. See the R help file for density for more information on how the empirical density is computed in the continuous case.
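A minimal sketch of this two-step idea, using only the base R functions density and approx, might look as follows. The helper name `emp_density` is hypothetical, and how demp treats values of x that fall outside the grid returned by density is not stated here, so this sketch simply returns NA for them.

```r
# Hypothetical helper (not part of the package): kernel density estimate
# on a grid, followed by linear interpolation at the requested values.
emp_density <- function(x, obs, ...) {
  obs <- obs[is.finite(obs)]     # drop NA, NaN, Inf, and -Inf
  d <- density(obs, ...)         # grid of x values and density estimates
  approx(d$x, d$y, xout = x)$y   # NA for x outside the grid (an assumption)
}

set.seed(1)
obs <- rnorm(50)
emp_density(c(-1, 0, 1), obs)
```

Extra arguments passed through `...` (for example `bw`) go straight to density.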
Estimating Probabilities
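The function pemp computes the estimated cumulative distribution function (cdf), also called the empirical cdf (ecdf). If the observations are assumed to come from a discrete distribution, the value of the cdf evaluated at the $i$th order statistic $x_{(i)}$ is usually estimated by:

$$\hat{F}[x_{(i)}] = \widehat{\Pr}(X \le x_{(i)}) = \hat{p}_i = \frac{1}{n} \sum_{j=1}^{n} I_{(-\infty,\, x_{(i)}]}(x_j)$$

where:

$$I_{(-\infty,\, x]}(y) = \begin{cases} 1 & \text{if } y \le x, \\ 0 & \text{if } y > x \end{cases}$$

(D'Agostino, 1986a). That is, the estimated value of the cdf at the $i$th order statistic is simply the observed proportion of observations less than or equal to the $i$th order statistic. The function pemp uses the above equations to compute the empirical cdf when prob.method="emp.probs".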
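For any general value of $x$, when the observations are assumed to come from a discrete distribution, the value of the cdf is estimated by:

$$\hat{F}(x) = \begin{cases} 0 & \text{if } x < x_{(1)}, \\ \hat{p}_i & \text{if } x_{(i)} \le x < x_{(i+1)}, \\ 1 & \text{if } x \ge x_{(n)}, \end{cases}$$

where $\hat{p}_i$ denotes the estimated cdf at the $i$th order statistic $x_{(i)}$. The function pemp uses the above equation when discrete=TRUE.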
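For the discrete case with empirical probabilities, the estimate is a step function: the proportion of observations less than or equal to the requested value. The sketch below shows that idea in base R; `emp_cdf_discrete` is a hypothetical name, and the comparison with the base R function ecdf is included only as a sanity check.

```r
# Hypothetical helper (not part of the package): proportion of
# observations less than or equal to each requested value.
emp_cdf_discrete <- function(q, obs) {
  obs <- obs[is.finite(obs)]   # drop NA, NaN, Inf, and -Inf
  vapply(q, function(v) mean(obs <= v), numeric(1))
}

obs <- c(1, 2, 2, 3, 3, 3)
q <- c(0, 1, 2.5, 3, 10)
emp_cdf_discrete(q, obs)       # 0, 1/6, 3/6, 1, 1
ecdf(obs)(q)                   # base R step function, same values
```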
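If the observations are assumed to come from a continuous distribution, the value of the cdf evaluated at the $i$th order statistic $x_{(i)}$ is usually estimated by:

$$\hat{F}[x_{(i)}] = \hat{p}_i = \frac{i - a}{n - 2a + 1}$$

where $a$ denotes the plotting position constant (the argument plot.pos.con) and $0 \le a \le 1$ (Cleveland, 1993; D'Agostino, 1986a). The estimators defined by the above equation are called plotting positions. The function pemp uses the above equation when prob.method="plot.pos".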
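For any general value of $x$, the value of the cdf is estimated by linear interpolation:

$$\hat{F}(x) = \begin{cases} \hat{p}_1 & \text{if } x < x_{(1)}, \\ (1 - r)\,\hat{p}_i + r\,\hat{p}_{i+1} & \text{if } x_{(i)} \le x < x_{(i+1)}, \\ \hat{p}_n & \text{if } x \ge x_{(n)}, \end{cases}$$

where

$$r = \frac{x - x_{(i)}}{x_{(i+1)} - x_{(i)}}$$

and $\hat{p}_i$ denotes the estimated cdf at the $i$th order statistic $x_{(i)}$ (Chambers et al., 1983). The function pemp uses the above two equations when discrete=FALSE.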
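The continuous case can be sketched in base R by computing plotting positions at the sorted observations and interpolating linearly between them. The sketch assumes the common plotting-position form (i - a)/(n - 2a + 1) with a = plot.pos.con, assumes there are no tied observations, and uses the hypothetical helper name `emp_cdf_continuous`; it is an illustration, not the code pemp runs.

```r
# Hypothetical helper (not part of the package). Assumes plotting
# positions of the form (i - a) / (n - 2a + 1) and no tied observations.
emp_cdf_continuous <- function(q, obs, a = 0.375) {
  x <- sort(obs[is.finite(obs)])            # order statistics
  n <- length(x)
  p <- (seq_len(n) - a) / (n - 2 * a + 1)   # plotting positions
  # rule = 2 holds the estimate at p[1] below the minimum and at p[n]
  # above the maximum
  approx(x, p, xout = q, rule = 2)$y
}

obs <- c(2, 5, 1, 9, 7)
emp_cdf_continuous(c(0, 3.5, 5, 100), obs)
```

Under these assumptions the estimate never reaches 0 or 1: it is held at the first plotting position below the smallest observation and at the last one above the largest.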
Estimating Quantiles
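The function qemp computes the estimated quantiles based on the observed data. If the observations are assumed to come from a discrete distribution, the $p$th quantile is usually estimated by:

$$\hat{x}_p = \begin{cases} x_{(1)} & \text{if } p \le \hat{p}_1, \\ x_{(i)} & \text{if } \hat{p}_{i-1} < p \le \hat{p}_i, \\ x_{(n)} & \text{if } p > \hat{p}_n, \end{cases}$$

where $\hat{p}_i$ denotes the estimated cdf at the $i$th order statistic $x_{(i)}$. The function qemp uses the above equation when discrete=TRUE.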
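For the discrete case, one way to picture the estimate is as the smallest observed value whose empirical cdf is at least p. The base R sketch below follows that reading with empirical probabilities; the helper name `emp_quantile_discrete` is hypothetical and the code is not taken from qemp.

```r
# Hypothetical helper (not part of the package): smallest order statistic
# whose empirical probability is at least p. Assumes 0 <= p <= 1.
emp_quantile_discrete <- function(p, obs) {
  x <- sort(obs[is.finite(obs)])                             # order statistics
  p_hat <- vapply(x, function(v) mean(x <= v), numeric(1))   # empirical cdf at each
  idx <- vapply(p, function(pp) which(p_hat >= pp)[1], integer(1))
  x[idx]
}

obs <- c(1, 2, 2, 3, 3, 3)
emp_quantile_discrete(c(0, 0.5, 0.51, 1), obs)   # 1, 2, 3, 3
```

The result is always one of the observed values, which is the behaviour one would expect for data from a discrete distribution.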
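If the observations are assumed to come from a continuous distribution, the $p$th quantile is usually estimated by linear interpolation:

$$\hat{x}_p = \begin{cases} x_{(1)} & \text{if } p \le \hat{p}_1, \\ (1 - r)\,x_{(i-1)} + r\,x_{(i)} & \text{if } \hat{p}_{i-1} < p \le \hat{p}_i, \\ x_{(n)} & \text{if } p > \hat{p}_n, \end{cases}$$

where

$$r = \frac{p - \hat{p}_{i-1}}{\hat{p}_i - \hat{p}_{i-1}}$$

and $\hat{p}_i$ denotes the estimated cdf at the $i$th order statistic $x_{(i)}$. The function qemp uses the above two equations when discrete=FALSE.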
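The continuous quantile estimate can be sketched as the inverse of the interpolated cdf: interpolate the sorted observations against their plotting positions. As before, this assumes plotting positions of the form (i - a)/(n - 2a + 1) and no tied observations, and `emp_quantile_continuous` is a hypothetical name used only for illustration.

```r
# Hypothetical helper (not part of the package). Assumes plotting
# positions of the form (i - a) / (n - 2a + 1) and no tied observations.
emp_quantile_continuous <- function(p, obs, a = 0.375) {
  x <- sort(obs[is.finite(obs)])                # order statistics
  n <- length(x)
  p_hat <- (seq_len(n) - a) / (n - 2 * a + 1)   # plotting positions
  # rule = 2 returns the minimum for p below p_hat[1] and the maximum
  # for p above p_hat[n]
  approx(p_hat, x, xout = p, rule = 2)$y
}

obs <- c(2, 5, 1, 9, 7)
emp_quantile_continuous(c(0, 0.5, 1), obs)      # 1, 5, 9
```

Estimated quantiles therefore never fall outside the range of the observed data.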
Generating Random Numbers From the Empirical Distribution
The function remp simply calls the R function sample to sample the elements of obs with replacement.
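Sampling with replacement from the observations can be sketched in base R as shown below. The helper name `emp_sample` is hypothetical. Note one deliberate difference from the description above: the sketch draws indices with sample.int rather than calling sample on the data, which avoids sample's special handling of a single number when only one observation remains.

```r
# Hypothetical helper (not part of the package): draw n values from obs
# with replacement, after removing non-finite values.
emp_sample <- function(n, obs) {
  obs <- obs[is.finite(obs)]          # drop NA, NaN, Inf, and -Inf
  if (length(n) > 1) n <- length(n)   # mirror the documented handling of n
  obs[sample.int(length(obs), size = n, replace = TRUE)]
}

set.seed(42)
emp_sample(10, c(4, 8, NA, 15))       # ten draws from 4, 8, and 15
```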
Chambers, J.M., W.S. Cleveland, B. Kleiner, and P.A. Tukey. (1983). Graphical Methods for Data Analysis. Duxbury Press, Boston, MA, pp.11--16.
Cleveland, W.S. (1993). Visualizing Data. Hobart Press, Summit, New Jersey, 360pp.
D'Agostino, R.B. (1986a). Graphical Analysis. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of-Fit Techniques. Marcel Dekker, New York, Chapter 2, pp.7--62.
Scott, D. W. (1992). Multivariate Density Estimation: Theory, Practice and Visualization. John Wiley and Sons, New York.
Sheather, S.J., and M.C. Jones. (1991). A Reliable Data-Based Bandwidth Selection Method for Kernel Density Estimation. Journal of the Royal Statistical Society B, 683--690.
Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall, London.
Wegman, E.J. (1972). Nonparametric Probability Density Estimation. Technometrics 14, 533-546.
density, approx, epdfPlot, ecdfPlot, cdfCompare, qqplot, eqnpar, quantile, sample, simulateVector, simulateMvMatrix.
# NOT RUN {
# Load EnvStats, which provides demp, qemp, pdfPlot, and epdfPlot.
library(EnvStats)
# Create a set of 100 observations from a gamma distribution with
# parameters shape=4 and scale=5.
# (Note: the call to set.seed simply allows you to reproduce this example.)
set.seed(3)
obs <- rgamma(100, shape = 4, scale = 5)
# Now plot the empirical distribution (with a histogram) and the true distribution:
dev.new()
hist(obs, col = "cyan", xlim = c(0, 65), freq = FALSE,
     ylab = "Relative Frequency")
pdfPlot('gamma', list(shape = 4, scale = 5), add = TRUE)
box()
# Now plot the empirical distribution (based on demp) with the
# true distribution:
x <- qemp(p = seq(0, 1, length.out = 100), obs = obs)
y <- demp(x, obs)
dev.new()
plot(x, y, xlim = c(0, 65), type = "n",
     xlab = "Value of Random Variable",
     ylab = "Relative Frequency")
lines(x, y, lwd = 2, col = "cyan")
pdfPlot('gamma', list(shape = 4, scale = 5), add = TRUE)
# Alternatively, you can create the above plot with the function
# epdfPlot:
dev.new()
epdfPlot(obs, xlim = c(0, 65), epdf.col = "cyan",
         xlab = "Value of Random Variable",
         main = "Empirical and Theoretical PDFs")
pdfPlot('gamma', list(shape = 4, scale = 5), add = TRUE)
# Clean Up
#---------
rm(obs, x, y)
# }