DISTPLOTS: Empirical distribution plots

Description

Sample values are plotted against their empirical distribution in graphs where points belonging to a particular distribution should lie on a straight line.
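
The principle can be sketched in base R. This is only an illustration of how a normal probability plot is built, not the package's own code; the variable names (Fhat, z, linearity) are chosen here for the example. The sorted sample is paired with an empirical frequency, and that frequency is mapped through the quantile function of the reference distribution.

```r
# Minimal sketch of a normal probability plot in base R (not the package's code).
set.seed(1)
x <- rnorm(30, mean = 10, sd = 2)

n <- length(x)
k <- seq_len(n)                 # ranks of the sorted sample
Fhat <- k / (n + 1)             # empirical frequency, Weibull plotting position
z <- qnorm(Fhat)                # standard normal quantile for each frequency

# For normal data the points (sort(x), z) fall close to a straight line.
plot(sort(x), z, xlab = "x", ylab = "standard normal quantile")
# Reference line z = (x - mean) / sd with parameters estimated from x.
abline(a = -mean(x) / sd(x), b = 1 / sd(x), lty = 2)

# Correlation between sorted values and quantiles, a rough linearity check.
linearity <- cor(sort(x), z)
```

A correlation close to 1 suggests that the sample is compatible with the reference distribution; replacing qnorm with another quantile function gives the analogous plot for that distribution.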

Usage

plotpos (x, a=0, orient="xF", ...)
plotposRP (x, a=0, orient="xF", ...)
unifplot (x, a=0, orient="xF", line=TRUE, ...)
normplot (x, a=0, orient="xF", line=TRUE, ...)
lognormplot (x, a=0, orient="xF", line=TRUE, ...)
gumbelplot (x, a=0, orient="xF", line=TRUE, ...)
pointspos (x, a=0, orient="xF", ...)
pointsposRP (x, a=0, orient="xF", ...)
unifpoints (x, a=0, orient="xF", ...)
normpoints (x, a=0, orient="xF", ...)
gumbelpoints (x, a=0, orient="xF", ...)
regionalplotpos (x, cod, a=0, orient="xF", ...)
regionalnormplot (x, cod, a=0, orient="xF", ...)
regionallognormplot (x, cod, a=0, orient="xF", ...)
regionalgumbelplot (x, cod, a=0, orient="xF", ...)

Arguments

x

vector representing a data sample

a

plotting position parameter, normally between 0 and 0.5 (the default value here is 0, corresponding to the Weibull plotting position; see Details)

orient

if orient="xF" the abscissa will be x and the ordinate F

line

if TRUE (default), a straight line indicating the uniform, normal, lognormal or Gumbel distribution with parameters estimated from x is plotted

cod

array that defines the data subdivision among sites

...

graphical parameters such as xlab, ylab, main, ...

Value

Representation of the values of x versus their empirical probability function $F$ in a Cartesian, uniform, normal, lognormal or Gumbel plot. plotpos and unifplot are analogous except for the axis notation: unifplot uses the same notation as normplot, lognormplot, ... plotposRP is analogous to plotpos, but the frequencies $F$ are expressed as return periods $T=1/(1-F)$. With the default settings, $F$ is defined by the Weibull plotting position $F=k/(n+1)$. The straight line (if line=TRUE) indicates the uniform, normal, lognormal or Gumbel distribution with parameters estimated from x. The regional plots draw the samples of a region on the same plot.
pointspos, normpoints, ... are the analogues of points: they can be used to add points or lines to plotpos, normplot, ... normpoints can be used with either normplot or lognormplot.
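
The conversion from frequency to return period can be illustrated in base R. This is a sketch under the default Weibull plotting position; the sample values are made up for the example, and the variable names Fhat and RP are not part of the package.

```r
# Sketch: converting empirical non-exceedance frequencies to return periods.
x <- c(12.1, 8.4, 15.3, 10.2, 9.7, 13.8, 11.5, 7.9, 14.6)  # made-up sample

n <- length(x)
Fhat <- seq_len(n) / (n + 1)    # Weibull plotting position F = k / (n + 1)
RP <- 1 / (1 - Fhat)            # return period T = 1 / (1 - F)

# With a = 0 the largest observation gets T = n + 1.
print(data.frame(x = sort(x), F = Fhat, T = RP))
```

Because $1 - F = 1/(n+1)$ for the largest value when $a = 0$, the longest return period that can be assigned empirically is $n + 1$.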

Details

A brief introduction to probability plots (or quantile-quantile plots) is available at http://en.wikipedia.org/wiki/Q-Q_plot. For plotting positions see http://en.wikipedia.org/wiki/Plotting_position.

For the quantiles of the comparison distribution, the Weibull formula $k/(n + 1)$ is typically used (default here). Several different formulas have been used or proposed as symmetrical plotting positions. Such formulas have the form $$(k - a)/(n + 1 - 2a)$$ for some value of $a$ in the range from 0 to 1/2. The above expression $k/(n+1)$ is one example of these, for $a=0$. The Filliben plotting position has $a = 0.3175$, and the Cunnane plotting position, $a = 0.4$, should be nearly quantile-unbiased for a range of distributions. The Hazen plotting position, widely used by engineers, has $a = 0.5$. Blom's plotting position, $a = 3/8$, gives nearly unbiased quantiles for the normal distribution, while the Gringorten plotting position, $a = 0.44$, is optimized for the largest observations from a Gumbel distribution. For the generalized Pareto, the GEV and the related Type I (Gumbel) and Weibull distributions, $a = 0.35$ is suggested.
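
The general formula is easy to evaluate directly. The helper plotting_position below is hypothetical (it is not a function of the package) and simply implements $(k - a)/(n + 1 - 2a)$ so that the named variants can be compared side by side.

```r
# Hypothetical helper, not part of the package: (k - a) / (n + 1 - 2a)
plotting_position <- function(n, a = 0) {
  stopifnot(n >= 1, a >= 0, a <= 0.5)
  k <- seq_len(n)               # ranks 1, ..., n of the sorted sample
  (k - a) / (n + 1 - 2 * a)
}

n <- 10
a_values <- c(Weibull = 0, Blom = 3/8, Cunnane = 0.4,
              Gringorten = 0.44, Hazen = 0.5)

# One column per plotting position, one row per rank k.
pp <- sapply(a_values, function(a) plotting_position(n, a))
print(round(pp, 3))
```

All of these positions are symmetrical in the sense that the values for ranks $k$ and $n + 1 - k$ sum to 1.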

For large sample size, $n$, there is little difference between these various expressions.
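
A quick numerical check of this statement, as a base R sketch: the helpers pp and max_gap are hypothetical names introduced here, and only the two extreme choices $a = 0$ (Weibull) and $a = 0.5$ (Hazen) are compared.

```r
# Hypothetical helpers for comparing two plotting positions.
pp <- function(n, a) (seq_len(n) - a) / (n + 1 - 2 * a)

# Largest absolute difference between Weibull (a = 0) and Hazen (a = 0.5).
max_gap <- function(n) max(abs(pp(n, 0) - pp(n, 0.5)))

sizes <- c(10, 100, 1000)
gaps <- sapply(sizes, max_gap)
print(data.frame(n = sizes, max_gap = gaps))
```

The largest gap occurs at the extreme ranks and shrinks roughly like $1/(2n)$, so the choice of $a$ matters mostly for small samples and for the tails.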

Examples


library(nsRFA)  # package providing plotpos, normplot, rand.gumb, ...

x <- rnorm(30,10,2)
plotpos(x)
normplot(x)
normplot(x,xlab=expression(D[m]),ylab=expression(hat(F)),
         main="Normal plot",cex.main=1,font.main=1)
normplot(x,line=FALSE)

x <- rlnorm(30,log(100),log(10))
normplot(x)
lognormplot(x)

x <- rand.gumb(30,1000,100)
normplot(x)
gumbelplot(x)

x <- rnorm(30,10,2)
y <- rnorm(50,10,3)
z <- c(x,y)
codz <- c(rep(1,30),rep(2,50))
regionalplotpos(z,codz)
regionalnormplot(z,codz,xlab="z")
regionallognormplot(z,codz)
regionalgumbelplot(z,codz)

plotpos(x)
pointspos(y,pch=2,col=2)

x <- rnorm(50,10,2)
F <- seq(0.01,0.99,by=0.01)
qq <- qnorm(F,10,2)
plotpos(x)
pointspos(qq,type="l")

normplot(x,line=FALSE)
normpoints(x,type="l",lty=2,col=3)

lognormplot(x)
normpoints(x,type="l",lty=2,col=3)

gumbelplot(x)
gumbelpoints(x,type="l",lty=2,col=3)

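# comparison of distributions in probability plots:
# a gamma distribution is fitted to x by maximum likelihood and overlaid
x <- rnorm(50,10,2)
F <- seq(0.001,0.999,by=0.001)
# negative log-likelihood of the gamma distribution (optim minimizes it)
negloglik <- function(param) {-sum(dgamma(x, shape=param[1], 
                scale=param[2], log=TRUE))}
parameters <- optim(c(1,1),negloglik)$par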
qq <- qgamma(F,shape=parameters[1],scale=parameters[2])
plotpos(x)
pointspos(qq,type="l")

normplot(x,line=FALSE)
normpoints(qq,type="l")

lognormplot(x,line=FALSE)
normpoints(qq,type="l")
