codhump
fits a constrained non-parametric estimate of the cause- and age-specific contributions to the young adult mortality hump.
codhump(data, typ, x.range = NULL, maxit = 200, x.hump = 25,
start.correct = FALSE, lambda = 10^5)
data: list produced with HCD2MH or similarly structured
typ: list containing the cause-of-death typology to use (see details)
x.range: age range to consider for the analysis
maxit: maximum number of iterations of the optimization algorithm
x.hump: assumed end of the hump used to estimate the starting values (see details)
start.correct: logical value to automatically correct incoherent starting values
lambda: smoothing parameter for the senescence component
Returns an object of class codhump that includes
Data frame containing for each age the overall mortality rates, the cause-deleted rates and the rates of causes that do not contribute to the hump.
Fit of the sse
model on the all-cause mortality.
Fit of the sse
model for each set of cause-deleted mortality rates before constraining.
Fit of the sse
model for each set of cause-deleted mortality rates after constraining.
Age- and cause-specific contributions to the hump.
Percentage of negative contributions after each iteration.
Maximum relative change among all coefficients of the components of the sse
model after each iteration.
List of parameters provided to fit the model.
The estimation uses simultaneous constrained splines to estimate a Sum of Smooth Exponentials (SSE) model on cause-deleted forces of mortality. Briefly, the SSE model describes the observed force of mortality over age mu as the sum of three vectors mu1, mu2, mu3 over age. In other words, it assumes that deaths are realizations from Poisson distributions with mean composed of three parts: infant, early-adulthood and old-age mortality. For more information on the SSE model, see Camarda et al. (2016) and sse.fit. For the purpose of the study of the young adult mortality hump, the SSE model is here reduced to two components capturing the hump and the senescence parts of the force of mortality.
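The two-component decomposition described above can be illustrated with a toy example. This is only a sketch: the parametric shapes, the exposures, and all variable names below are assumptions made for illustration, whereas the SSE model estimates each component as a smooth, non-parametric function.

```r
# Toy illustration only: these parametric shapes are assumptions,
# not the smooth components estimated by the SSE model.
x <- 10:90                                         # ages
mu_sen  <- 5e-5 * exp(0.09 * (x - 10))             # senescence: exponential increase
mu_hump <- 8e-4 * exp(-((x - 22)^2) / (2 * 6^2))   # hump: bell-shaped excess near age 22
mu <- mu_hump + mu_sen                             # force of mortality = sum of components

# Deaths are treated as Poisson counts with mean exposure * mu
set.seed(1)
exposure <- rep(1e5, length(x))                    # hypothetical person-years
deaths <- rpois(length(x), lambda = exposure * mu)

# Share of the force of mortality attributable to the hump at each age
hump_share <- mu_hump / mu
```

In this toy schedule the hump dominates only over a limited span of young adult ages, which is the part of the curve the decomposition targets.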
In order to decompose the hump by cause of death, this model uses a constrained approach on cause-deleted forces of mortality that can be summarized in four steps.
1. Manually identify the causes of death that contribute to the young adult hump component.
2. Estimate an SSE model on the overall mortality in order to separate the senescent and young adult hump components.
3. Construct cause-deleted datasets by separately removing the deaths from each cause identified in step 1.
4. Simultaneously estimate SSE models for each of these cause-deleted datasets, interpreting the diminution of each component as the contribution of this cause to that component, and constraining the sum of all these contributions to be equal to the components estimated in step 2.
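As a rough sketch of steps 1 and 3, the cause-deleted schedules can be built by subtracting one cause at a time from the all-cause deaths. The data, cause names, and variable names below are hypothetical, and the sketch does not perform the SSE fits of steps 2 and 4.

```r
# Toy data (hypothetical): deaths by age for three causes, plus exposures
x <- 10:40
exposure <- rep(1e5, length(x))
deaths <- data.frame(
  accidents = round(40 * exp(-((x - 21)^2) / 50)),
  suicide   = round(15 * exp(-((x - 24)^2) / 80)),
  other     = round(5 * exp(0.08 * (x - 10)))
)

# Step 1: causes assumed to contribute to the hump (chosen manually)
hump_causes <- c("accidents", "suicide")

# All-cause rates, which are the input of step 2
mx_all <- rowSums(deaths) / exposure

# Step 3: one cause-deleted schedule per hump cause (one column each)
mx_deleted <- sapply(hump_causes, function(k) {
  (rowSums(deaths) - deaths[[k]]) / exposure
})

# Drop from all-cause to cause-deleted rates, before any model is fitted
raw_contrib <- mx_all - mx_deleted
```

The raw differences in `raw_contrib` are simply the cause-specific rates. Step 4 goes further by splitting each of them between the hump and senescence components under the sum constraint.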
The SSE model on which this algorithm is based adapts more readily to specific mortality schedules than parametric models such as the Heligman-Pollard model. It is thus designed to converge to meaningful results in the majority of cases. It nevertheless sometimes needs fine tuning to reach coherent results. Pay special attention to the following parameters to maximize the chance of obtaining a meaningful result.
The typ
argument defines the typology of causes of death that are assumed to contribute to the young adult mortality hump.
Each element of the list is a vector containing one or several numerical values corresponding to the columns of the mxc
data frame from the data
object.
If an element of typ
contains only one cause, this cause will be considered on its own. If several causes are mentioned in the same element of the typ
object, these causes will be grouped and considered together in the analysis.
The names of the elements of the typ
object are recycled as the names of the causes of death in the typology.
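A minimal sketch of such a list follows. The column indices are borrowed from the example at the end of this page and are only meaningful for that dataset; other datasets will need other indices.

```r
# Indices refer to columns of data$mxc; the values here are taken from the
# example on this page and will differ for other datasets.
typ <- list(
  tac = 89,          # a single cause, considered on its own
  poi = c(93, 94)    # two causes grouped into one category
)

names(typ)     # labels reused as the names of the causes in the typology
lengths(typ)   # number of columns merged into each category
```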
The choice of the causes of death included in the typ
argument has a profound influence on the results, and should therefore be made with caution.
As each case calls for a specific selection of causes of death, there is no general rule as to which causes should be assumed to contribute to the hump. However, the model assumes that the list accounts for all of the hump.
A good way to test this assumption is to plot the force of mortality after removing all of the deaths from the causes included in the list and check that this leftover category does not display a hump.
Failing to include enough causes will probably lead to an underestimation of the hump, and will make it harder for the algorithm to converge, since the all-cause hump will not be totally accounted for by the cause-specific contributions to the hump.
On the other hand, specifying too many causes as contributing to the hump may make it difficult to estimate those cause-specific contributions that are not supported by sufficient empirical evidence.
Typically, the typ
object will include somewhere between 2 and 6 causes (or groups of causes) of death depending on the context and the number of causes available in the dataset.
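The check on the leftover category suggested above could look like the following sketch. The data are hypothetical, `has_hump` is an illustrative helper and not part of the package, and the numerical test is deliberately crude: with real, noisy rates a visual inspection of the plot remains the safer option.

```r
x <- 10:60
exposure <- rep(1e5, length(x))
# Hypothetical death counts: two hump causes and a residual category
deaths <- data.frame(
  accidents = 40 * exp(-((x - 21)^2) / 50),
  suicide   = 15 * exp(-((x - 24)^2) / 80),
  residual  = 5 * exp(0.08 * (x - 10))
)
typ <- list(acc = 1, sui = 2)    # columns assumed to carry the hump

# Remove every cause listed in typ and compute the leftover rates
in_typ <- unlist(typ)
leftover  <- rowSums(deaths[, -in_typ, drop = FALSE]) / exposure
all_cause <- rowSums(deaths) / exposure

# Crude check (illustrative helper): a hump implies that log-rates decrease somewhere
has_hump <- function(mx) any(diff(log(mx)) < 0)
has_hump(all_cause)
has_hump(leftover)

# Visual check of the leftover category on the log scale
if (interactive()) plot(x, log(leftover), type = "l")
```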
The x.range
argument defines the age range on which the analysis takes place. By default, the bottom boundary corresponds to the age with the lowest observed death rate, which is usually situated around 10 years of age.
The top of the range should include most of the adult years, but should also avoid the so-called mortality plateau often observed after age 100. It is set to 90 by default, but the user may change either boundary if necessary.
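The default lower boundary can be mimicked as follows. This is a sketch on a made-up mortality schedule; `ages_kept` is a hypothetical variable, and the format that codhump expects for x.range is not assumed here.

```r
x <- 0:100
# Toy schedule (assumption): infant decline + young adult hump + senescence
mx <- 5e-3 * exp(-0.9 * x) +
  8e-4 * exp(-((x - 22)^2) / 72) +
  5e-5 * exp(0.09 * x)

lower <- x[which.min(mx)]   # age with the lowest death rate
upper <- 90                 # default upper boundary described above
ages_kept <- lower:upper
```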
The maxit
argument defines the maximum number of iterations used for step 4 of the algorithm. This step uses a Penalized Composite Link Model (PCLM) along with an iteratively re-weighted least squares (IRWLS) optimization procedure.
The maxit
argument therefore determines not only the maximum number of iterations used in the IRWLS loop, but also the speed of convergence, since the re-weighting defines the updated solution as $$\mathrm{new} = \mathrm{old} \times (1 - it/maxit) + \mathrm{new} \times (it/maxit)$$ where it is the current iteration number.
The maxit
argument is set to 200 by default, but it can be increased when the algorithm fails to converge, even if it stopped before reaching maxit
iterations.
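The re-weighting formula above can be sketched in a few lines. The function `relax` and its arguments are hypothetical names used for illustration; they are not taken from the package source.

```r
maxit <- 200

# Hypothetical helper: blend the previous and the candidate solutions
# at iteration `it`, following the formula given above
relax <- function(old, new, it, maxit) {
  w <- it / maxit              # weight given to the candidate solution
  old * (1 - w) + new * w
}

old <- c(1, 2)
new <- c(3, 6)
relax(old, new, it = 1, maxit = maxit)     # stays very close to `old`
relax(old, new, it = 100, maxit = maxit)   # halfway between `old` and `new`
relax(old, new, it = 200, maxit = maxit)   # equal to `new`
```

Because the weight on the candidate solution is it/maxit, a larger maxit makes every early step more cautious. This is consistent with the advice to raise maxit even when the loop stopped before reaching the limit.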
The x.hump
argument is used to compute the starting values of the independent SSE models. More specifically, it is used to determine the a priori age range to be considered for the young adult mortality hump.
In most cases, especially when the hump is obvious, the model converges even with relatively bad starting values. However, in case of a very narrow hump, it is advised to use a small value of x.hump
(20 to 25, but not lower), and in case of a wide hump it may be useful to consider larger values for x.hump
(30 and over).
Inadequate values of x.hump
may result in incoherent starting values for the cause-deleted SSE models and a lack of convergence in step 4 of the algorithm (see start.correct
).
The lambda
parameter controls the amount of smoothing imposed on the senescence component of the SSE model. Typically, a large value is advisable since this is the part of the force of mortality that the model aims at removing in order to reveal the hump.
However, in some cases, especially when using abridged datasets, it may be useful to consider smaller values of lambda
such as 10 or 100. A bad choice of lambda
may result in poor starting values for the SSE model and a lack of convergence in step 4 of the algorithm.
The start.correct
argument is conceived as a safeguard against misspecified starting values of the SSE model. Specifically, while estimating the SSE model on the cause-deleted forces of mortality, if the hump component peaks after the middle of the x.range
interval, the starting values are replaced with the all-cause components.
This parameter is designed as an attempt to salvage a bad choice in other arguments, especially typ
, maxit
, x.hump
and lambda
, but remains an emergency safeguard. In case of a lack of convergence, it is advised to change the values of the other parameters instead of relying on the start.correct
argument to guarantee convergence.
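The safeguard rule described above can be sketched as follows. This is an illustration only: the helper `peaks_too_late`, the toy hump shapes, and the reading of "middle of the x.range interval" as its midpoint are assumptions, not the package's implementation.

```r
x <- 10:90                                  # ages covered by x.range

# Hypothetical starting values for the hump component
hump_all <- exp(-((x - 22)^2) / 72)         # all-cause fit, peaks at age 22
hump_bad <- exp(-((x - 70)^2) / 72)         # cause-deleted fit, peaks at age 70

# Illustrative helper: does the hump peak after the midpoint of the age range?
peaks_too_late <- function(hump, x) x[which.max(hump)] > mean(range(x))

# Fall back to the all-cause component when the starting values look incoherent
start_hump <- if (peaks_too_late(hump_bad, x)) hump_all else hump_bad
```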
Camarda, C. G., Eilers, P. H. C., & Gampe, J. (2016). Sums of smooth exponentials to decompose complex series of counts. Statistical Modelling.
Remund, A., Riffe, T., & Camarda, C. (2018). A cause-of-death decomposition of young adult excess mortality. Demography.
# NOT RUN {
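library(MortHump)  # assumed: package providing codhump() and USA2000m
data("USA2000m")
# Each element lists the mxc column(s) that form one cause category
typ <- list(
  tac = 89,
  sui = 96,
  hom = 97,
  poi = c(93, 94),       # grouped causes
  oac = c(95, 98, 100)   # grouped causes
)
fit <- codhump(data = USA2000m, typ = typ, start.correct = TRUE)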
# }