loglogSurv: Survival function plots for checking the tail behaviour of the data

Description

The log-log Survival functions are design for checking the tails of a single response variable (no explanatory should be involved). There are three different function:

a) the function loglogSurv1() which plot the right tails of the empirical log-log Survival function against loglog(y), where y is the variable of interest. The coefficient of a linear fit to the plot can be used as an estimated for Type I tails (see Chapter 17 in Rigby et al. (2019) for definition of the different types of tails.)

b) the function loglogSurv2() which plot the right tails of the empirical log-log Survival function against log(y). The coefficient of a linear fit to the plot can be used as an estimated for Type II tails.

c) the function loglogSurv3() which plot the (left or right) tails of the empirical log-log Survival function against y. The coefficient of a linear fit to the plot can be used as an estimated for Type III tails.

The function loglogSurv() combines all the above functions.

The function logSurv() is design for exploring the heavy tails of a single response variable. It plots the empirical log-survival function of the right tail of the distribution or the empirical log-cdf function of the left tail against log(y) for a specified probability of the tail. Then fits a linear, a quadratic and an exponential curve to the points of the plot. For distributions defined on the positive real line a good linear fit would indicate a Pareto type tail, a good quadratic fit a log-normal type tail and good exponential fit a Weibull type tail. Note that this function is only appropriate to investigate rather heavy tails and it is not very good to discriminate between different type of tails, as the loglogSurv(). The function logSurv0() plots but do not fit the curves.

The function loglogplot() plot the empirical log-survival function of all data against log(y). The function ECDF() calculates the empirical commutative distribution function. It is similar to ecdf() but divides by n+1 rather n, the number of conservations.

Usage

loglogSurv(y, prob = 0.9, print = TRUE, title = NULL, lcol = gray(0.1), 
           ltype = 1, plot = TRUE, ...)
loglogSurv1(y, prob = 0.9, print = TRUE, title = NULL, lcol = gray(0.1), 
           ltype = 1, ...)
loglogSurv2(y, prob = 0.9, print = TRUE, title = NULL, lcol = gray(0.1), 
           ltype = 1, ...)
           
loglogSurv3(y, prob = 0.9, print = TRUE, title = NULL, lcol = gray(0.1), 
           ltype = 1, ...)
          
logSurv(y, prob = 0.9, tail = c("right", "left"), plot = TRUE, 
       lines = TRUE, print = TRUE, title = NULL, lcol = c(gray(0.1), 
       gray(0.2), gray(0.3)), ltype = c(1, 2, 3), ...)  
logSurv0(y, prob = 0.9, tail = c("right", "left"), plot = TRUE, 
           title = NULL, ...)
           
         
ECDF(y, weights=NULL)
loglogplot(y, nplus1 = TRUE, ...)

Arguments

a vector, the variable of interest

prob

what probability. The defaul is 0.90 which means 10% for "right" tail 90% for "left" tail

tail

which tall needs checking the right (default) of the left

plot

whether to plot with default equal TRUE

whether to print the coefficients with default equal TRUE

title

if a different title rather the default is needed

lcol

The line colour in the plot

lines

whether to plot the fitted lines

ltype

The line type in the plot

nplus1

whether to divide by n+1 or n when calculating the ecdf

weights

prior weights for ECDF()

...

for extra argument in the plot command

Value

The functions create plots.

Details

The functions loglogSurv1(), loglogSurv3() and loglogSurv3() take the upper part of an ordered variable, create its empirical survival function, and plot the log-log of this functions against log(log(y)), log(y) and y, respectively. Then they fit a line to the plot. The coefficients of the line can be interpreted as parameters determined the behaviour of the tail. The function loglogSurv() fits all three models and displays the best.

The function logSurv() takes the upper (or lower) part of an ordered variable and plots the log empirical survival function against log(y). Also display three curves i) linear ii) quadratic and iii) exponential to determine what kind of tail relationship exist. Plotting the log empirical survival function against log(y) often call in the literature the "log-log plot".

The function loglogplot() plots the whole log empirical survival function against log(y) (not just the tail). The function ECDF() calculate the step function of the empirical cumulative distribution function.

More details can be found in Chapter 17 of "Rigby et al. (2019) book an old version on which can be found in https://www.gamlss.com/)

References

Rigby, R. A. and Stasinopoulos D. M. (2005). Generalized additive models for location, scale and shape,(with discussion), Appl. Statist., 54, part 3, pp 507-554.

Rigby R.A., Stasinopoulos D. M., Heller G., and De Bastiani F., (2019) Distributions for Modelling Location, Scale and Shape: Using GAMLSS in R, Chapman and Hall/CRC. (In press)

Rigby, R. A., Stasinopoulos, D. M., Heller, G. Z., and De Bastiani, F. (2019) Distributions for modeling location, scale, and shape: Using GAMLSS in R, Chapman and Hall/CRC. An older version can be found in https://www.gamlss.com/.

Stasinopoulos D. M. Rigby R.A. (2007) Generalized additive models for location scale and shape (GAMLSS) in R. Journal of Statistical Software, Vol. 23, Issue 7, Dec 2007, https://www.jstatsoft.org/v23/i07/.

Stasinopoulos D. M., Rigby R.A., Heller G., Voudouris V., and De Bastiani F., (2017) Flexible Regression and Smoothing: Using GAMLSS in R, Chapman and Hall/CRC.

(see also https://www.gamlss.com/).

Examples

Run this code

# NOT RUN {
data(film90)
y <- film90$lborev1
op<-par(mfrow=c(3,1))
loglogSurv1(y)
loglogSurv2(y)
loglogSurv3(y)
par(op)
loglogSurv(y)

logSurv(y)

loglogplot(y)

plot(ECDF(y), main="ECDF")
# }

Run the code above in your browser using DataLab