Mort1Dsmooth: Fit One-dimensional Poisson P-splines

Description

Returns an object of class Mort1Dsmooth which is a P-splines smooth of the input data of degree and order fixed by the user. Specifically tailored to mortality data.

Usage

Mort1Dsmooth(x, y, offset, w, overdispersion=FALSE, ndx = floor(length(x)/5), deg = 3, pord = 2, lambda = NULL, df = NULL, method = 1, coefstart = NULL, control = list())

Arguments

a vector with the values of the predictor variable. These must be at least 2 ndx + 1 of them.

a vector with a set of counts response variable values. y must be a vector of the same length as x.

offset

this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length either one or equal to the number of cases.

an optional vector of weights to be used in the fitting process. This should be NULL or a numeric vector of length equal to the number of cases.

overdispersion

logical on the accounting for overdisperion in the smoothing parameter selection criterion. See Details. Default: FALSE.

ndx

number of internal knots -1. Default: floor(length(x)/5).

deg

degree of the B-splines. Default: 3.

pord

order of differences. Default: 2.

lambda

smoothing parameter (optional).

a number which specifies the degrees of freedom (optional).

method

the method for controlling the amount of smoothing. method = 1 (default) adjusts the smoothing parameter so that the BIC is minimized. method = 2 adjusts lambda so that the AIC is minimized. method = 3 uses the value supplied for lambda. method = 4 adjusts lambda so that the degrees of freedom is equal to the supplied df.

coefstart

an optional vector of starting coefficients.

control

a list of control parameters. See Details.

Value

coefficients: vector of fitted (penalized) B-splines coefficients.
residuals: the deviance residuals.
fitted.values: vector of fitted counts.
linear.predictor: vector of fitted linear predictor.
logmortality: fitted mortality rates in log-scale.
lev: diagonal of the hat-matrix.
df: effective dimension.
deviance: Poisson Deviance.
aic: Akaike's Information Criterion.
bic: Bayesian Information Criterion.
psi2: overdispersion parameter.
lambda: the selected (given) smoothing parameter lambda.
call: the matched call.
n: number of observations.
tolerance: the used tolerance level.
B: the B-splines basis.
ndx: the number of internal knots -1.
deg: degree of the B-splines.
pord: order of difference.
x: values of the predictor variable.
y: set of counts response variable values.
offset: vector of the offset.
w: vector of weights used in the model.

Details

The method fits a P-spline model with equally-spaced B-splines along x. The response variables must be Poisson distributed counts, though overdisperion can be accounted. Offset can be provided, otherwise the default is that all weights are one.

The function is specifically tailored to smooth mortality data in one-dimensional setting. In such case the argument x would be either the ages or the years under study. Death counts will be the argument y. In a Poisson regression setting applied to actual death counts the offset will be the logarithm of the exposure population. See example below.

The function can obviously account for zero counts and definite offset. In a mortality context, the user can apply the function to data with zero deaths, but it has to take care that no exposures are equal to zero, i.e. offset equal to minus infinitive. In this last case, the argument w can help. The user would need to set weights equal to zero when exposures are equal to zero leading to interpolation of the data. See example below.

Regardless the presence of exposures equal to zero, the argument w can also be used for extrapolation and interpolation of the data. Nevertheless see the function predict.Mort1Dsmooth for a more comprehensive way to forecast mortality rates. The method produces results similar to function smooth.spline, but the smoothing function is a B-spline smooth with discrete penalization directly on the differences of the B-splines coefficients. The user can set the order of difference, the degree of the B-splines and number of them. Nevertheless, the smoothing parameter lambda is mainly used to tune the smoothness/model fidelity of the fitted values.

The range in which lambda is searched is given in control - RANGE. Though it can be modified, the default values are suitable for most of the application. There are print.Mort1Dsmooth, summary.Mort1Dsmooth, plot.Mort1Dsmooth, predict.Mort1Dsmooth and residuals.Mort1Dsmooth methods available for this function.

Four methods for optimizing the smoothing parameter are available. The BIC is set as default. Minimization of the AIC is also possible. BIC will give always smoother outcomes with respect to AIC, especially for large sample size. Alternatively the user can directly provide the smoothing parameter (method=3) or the degree of freedom to be used in the model (method=4). Note that Mort1Dsmooth uses approximated degree of freedom, therefore method=4 will produce fitted values with degree of freedom only similar to the one provided in df. The tolerance level can be set via control - TOL2.

Note that the 'ultimate' smoothing with very large lambda will approach to a polynomial of degree pord.

Starting coeffients for the B-spline basis can be provided by the user. This feature can be useful when a grid-search is manually performed by the user. The argument overdispersion can be set to TRUE when possible presence of over(under)dispersion needs to be considered in the selection of the smoothing parameter. Mortality data often present overdispersion also known, in demography, as heterogeneity. Duplicates in insurance data can lead to overdispersed data, too. Smoothing parameter selection may be affected by this phenomenon. When overdispersion=TRUE, the function uses a penalized quasi-likelihood method for including an overdisperion parameter (psi2) in the fitting procedure. With this approach expected values are assumed equal to the variance multiplied by the parameter psi2. See reference. Note that the inclusion of the overdisperion parameter within the estimation might lead to select higher lambda, leading to smoother outcomes. When overdispersion=FALSE (default value) or method=3 or method=4, psi2 is estimated after the smoothing parameter have been employed. Overdispersion parameter considerably larger (smaller) than 1 may be a sign of overdispersion (underdispersion). The control argument is a list that can supply any of the following components:

MON: Logical. If TRUE tracing information on the progress of the fitting is produced. Default: FALSE.

TOL1: The absolute convergence tolerance for each completed scoring algorithm. Default: 1e-06.

TOL2: Difference between two adjacent smoothing parameters in the (pseudo) grid search, log-scale. Useful only when method is equal to 1, 2 or 4. Default: 0.5. RANGE: Range of smoothing parameters in which the grid-search is applied, commonly taken in log-scale. Default: [10^-4 ; 10^6].

MAX.IT: The maximum number of iterations for each completed scoring algorithm. Default: 50. The arguments MON, TOL1 and MAX.IT are kept during all the grid search when method is equal to 1, 2 or 4. Function cleversearch from package svcm is employed to speed the grid search. See Mort1Dsmooth_optimize for details.

References

Eilers P. H. C. and B. D. Marx (1996). Flexible Smoothing with B-splines and Penalties. Statistical Science. 11, 89-121.

Camarda, C. G. (2012). MortalitySmooth: An R Package for Smoothing Poisson Counts with P-Splines. Journal of Statistical Software. 50, 1-24. http://www.jstatsoft.org/v50/i01/.

Examples

Run this code

## selected data
years <- 1950:2006
death <- selectHMDdata("Japan", "Deaths", "Females",
                       ages = 80, years = years)
exposure <- selectHMDdata("Japan", "Exposures", "Females",
                          ages = 80, years = years)
## various fits
## default using Bayesian Information Criterion
fitBIC <- Mort1Dsmooth(x=years, y=death,
                       offset=log(exposure))
fitBIC
summary(fitBIC)
## subjective choice of the smoothing parameter lambda
fitLAM <- Mort1Dsmooth(x=years, y=death,
                       offset=log(exposure),
                       method=3, lambda=10000)
## plot
plot(years, log(death/exposure),
main="Mortality rates, log-scale.
      Japanese females, age 80, 1950:2006")
lines(years, fitBIC$logmortality, col=2, lwd=2)
lines(years, fitLAM$logmortality, col=3, lwd=2)
legend("topright", c("Actual", "BIC", "lambda=10000"),
       col=1:3, lwd=c(1,2,2), lty=c(-1,1,1),
       pch=c(1,-1,-1))

## see vignettes for examples on
## - Extra-Poisson variation
## - interpolation
## - extrapolation

Run the code above in your browser using DataLab