Learn R Programming

gamlss (version 4.2-4)

histSmo: Density estimation using the Poisson trick

Description

This set of functions use the old Poisson trick of discretising the data and then fitting a Poisson error model to the resulting frequencies (Lindsey, 1997). Here the model fitted is a smooth cubic spline curve. The result is a density estimator for the data.

Usage

histSmo(y, lambda = NULL, df = NULL, order = 3, lower = NULL,  
       upper = NULL, type = c("freq", "prob"), 
       save = FALSE, plot = TRUE, breaks = NULL,  
       discrete = FALSE, ...)
histSmoC(y, df = 10, lower = NULL, upper = NULL, type = c("freq", "prob"), 
       save = FALSE, plot = TRUE, breaks = NULL,  
       discrete = FALSE, ...)
histSmoO(y, lambda = 1, order = 3, lower = NULL, upper = NULL, 
      type = c("freq", "prob"), save = FALSE, 
      plot = TRUE, breaks = NULL,  
      discrete = FALSE, ...)
histSmoP(y, lambda = NULL, df = NULL, order = 3, lower = NULL, 
      upper = NULL, type = c("freq", "prob"), save = FALSE, 
      plot = TRUE, breaks = NULL,  discrete = FALSE, 
      ...)

Arguments

y
the variable of interest
lambda
the smoothing parameter
df
the degrees of freedom
order
the order of the P-spline
lower
the lower limit of the y-variable
upper
the upper limit of the y-variable
type
the type of histogram
save
whether to save the results
plot
whether to plot the resulting density estimator
breaks
the number of break points to be used in the histogram and consequently the number of observations in the Poisson fit
discrete
whether to treat the fitting density as a discrete distribution or not
...
further arguments passed to or from other methods.

Value

  • Returns a histSmo S3 object. The object has the following components:
  • xthe middle points of the discretise data
  • countshow many observation are on the discretise intervals
  • densitythe density value for each discrete interval
  • histthe hist object used to discretise the data
  • cdfThe resulting cumulative distribution function useful for calculating probabilities from the estimate density
  • nvcdfThe inverse cumulative distribution function
  • modelThe fitted Poisson model only for histSmoP() and histSmoC()

Details

Here are the methods used here:

i) The function histSmoO() uses Penalised discrete splines (Eilers, 2003). This function is appropriate when the smoothing parameter is fixed.

ii) The function histSmoC() uses smooth cubic splines and fits a Poison error model to the frequencies using the cs() additive function of GAMLSS. This functon is appropriate if the effective degrees of freedom are fixed in the model.

iii) The function histSmoP() uses Penalised cubic splines (Eilers and Marx 1996). It is fitting a Poisson model to the frequencies using the pb() additive function of GAMLSS. This function is appropriate if automatic selection of the smoothing parameter is required.

iv) The function histSmo() combines all the above functions in the sense that if lambda is fixed it uses histSmoO(), if the df's are fixed it uses code{histSmoC()} and if none of these is specified it uses histSmoP().

References

Eilers, P. (2003). A perfect smoother. Analytical Chemistry, 75: 3631-3636.

Eilers, P. H. C. and Marx, B. D. (1996). Flexible smoothing with B-splines and penalties (with comments and rejoinder). Statist. Sci, 11, 89-121.

Lindsey, J.K. (1997) Applying Generalized Linear Models. New York: Springer-Verlag. ISBN 0-387-98218-3

Rigby, R. A. and Stasinopoulos D. M. (2005). Generalized additive models for location, scale and shape,(with discussion), Appl. Statist., 54, part 3, pp 507-554.

Stasinopoulos D. M. Rigby R.A. (2007) Generalized additive models for location scale and shape (GAMLSS) in R. Journal of Statistical Software, Vol. 23, Issue 7, Dec 2007, http://www.jstatsoft.org/v23/i07.

See Also

pb, cs

Examples

Run this code
# creating data from Pareto 2 distribution
Y <- rPARETO2(1000) 
# getting the density 
 histSmo(Y, lower=0)
# more  breaks a bit slower
 histSmo(Y, breaks=200, lower=0)
# quick fit using lambda
histSmoO(Y, lambda=1, breaks=200, lower=0)
# or 
histSmo(Y, lambda=1, breaks=200, lower=0)
# quick fit using df
histSmoC(Y, df=15, breaks=200, lower=0)
# or 
histSmo(Y, df=15, breaks=200, lower=0)
# saving results
m1<- histSmo(Y, lower=0, save=TRUE)
plot(m1)
plot(m1, "cdf")
plot(m1, "invcdf")
# now gererate from SHASH
YY <- rSHASH(1000)
m1<- histSmo(YY, save=TRUE)
# calculate Pr(YY>10)
1-m1$cdf(10)
# calculate Pr(-10<YY<10)
1-(1-m1$cdf(10))-m1$cdf(-10)
YYY <- rNBI(1000, mu=5, sigma=4)
histDist(YYY, discrete=TRUE, family=NBI())

Run the code above in your browser using DataLab