cm: Credibility Models

Description

Fit the following credibility models: B<U+00FC>hlmann, B<U+00FC>hlmann-Straub, hierarchical, regression (Hachemeister) or linear Bayes.

Usage

cm(formula, data, ratios, weights, subset,
   regformula = NULL, regdata, adj.intercept = FALSE,
   method = c("Buhlmann-Gisler", "Ohlsson", "iterative"),
   likelihood, ...,
   tol = sqrt(.Machine$double.eps), maxit = 100, echo = FALSE)
# S3 method for cm
print(x, …)
# S3 method for cm
predict(object, levels = NULL, newdata, …)
# S3 method for cm
summary(object, levels = NULL, newdata, …)
# S3 method for summary.cm
print(x, …)

Arguments

formula

character string "bayes" or an object of class "formula": a symbolic description of the model to be fit. The details of model specification are given below.

data

a matrix or a data frame containing the portfolio structure, the ratios or claim amounts and their associated weights, if any.

ratios

expression indicating the columns of data containing the ratios or claim amounts.

weights

expression indicating the columns of data containing the weights associated with ratios.

subset

an optional logical expression indicating a subset of observations to be used in the modeling process. All observations are included by default.

regformula

an object of class "formula": symbolic description of the regression component (see lm for details). No left hand side is needed in the formula; if present it is ignored. If NULL, no regression is done on the data.

regdata

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the regression model.

adj.intercept

if TRUE, the intercept of the regression model is located at the barycenter of the regressor instead of the origin.

method

estimation method for the variance components of the model; see details below.

likelihood

a character string giving the name of the likelihood function in one of the supported linear Bayes cases; see details below.

tol

tolerance level for the stopping criteria for iterative estimation method.

maxit

maximum number of iterations in iterative estimation method.

echo

logical; whether to echo the iterative procedure or not.

x, object

an object of class "cm".

levels

character vector indicating the levels to predict or to include in the summary; if NULL all levels are included.

newdata

data frame containing the variables used to predict credibility regression models.

…

parameters of the prior distribution for cm; additional attributes to attach to the result for the predict and summary methods; further arguments to format for the print.summary method; unused for the print method.

Value

Function cm computes the structure parameters estimators of the model specified in formula. The value returned is an object of class cm.

An object of class "cm" is a list with at least the following components:

means

a list containing, for each level, the vector of linearly sufficient statistics.

weights

a list containing, for each level, the vector of total weights.

unbiased

a vector containing the unbiased variance components estimators, or NULL.

iterative

a vector containing the iterative variance components estimators, or NULL.

cred

for multi-level hierarchical models: a list containing, the vector of credibility factors for each level. For one-level models: an array or vector of credibility factors.

nodes

a list containing, for each level, the vector of the number of nodes in the level.

classification

the columns of data containing the portfolio classification structure.

ordering

a list containing, for each level, the affiliation of a node to the node of the level above.

Regression fits have in addition the following components:

adj.models

a list containing, for each node, the credibility adjusted regression model as obtained with lm.fit or lm.wfit.

transition

if adj.intercept is TRUE, a transition matrix from the basis of the orthogonal design matrix to the basis of the original design matrix.

terms

the terms object used.

The method of predict for objects of class "cm" computes the credibility premiums for the nodes of every level included in argument levels (all by default). Result is a list the same length as levels or the number of levels in formula, or an atomic vector for one-level models.

Hierarchical models

The credibility premium at one level is a convex combination between the linearly sufficient statistic of a node and the credibility premium of the level above. (For the first level, the complement of credibility is given to the collective premium.) The linearly sufficient statistic of a node is the credibility weighted average of the data of the node, except at the last level, where natural weights are used. The credibility factor of node $i$ is equal to $$\frac{w_i}{w_i + a/b},$$ where $w_i$ is the weight of the node used in the linearly sufficient statistic, $a$ is the average within node variance and $b$ is the average between node variance.

Regression models

The credibility premium of node $i$ is equal to $$y^\prime b_i^a,$$ where $y$ is a matrix created from newdata and $b_i^a$ is the vector of credibility adjusted regression coefficients of node $i$. The latter is given by $$b_i^a = Z_i b_i + (I - Z_I) m,$$ where $b_i$ is the vector of regression coefficients based on data of node $i$ only, $m$ is the vector of collective regression coefficients, $Z_i$ is the credibility matrix and $I$ is the identity matrix. The credibility matrix of node $i$ is equal to $$A^{-1} (A + s^2 S_i),$$ where $S_i$ is the unscaled regression covariance matrix of the node, $s^2$ is the average within node variance and $A$ is the within node covariance matrix.

If the intercept is positioned at the barycenter of time, matrices $S_i$ and $A$ (and hence $Z_i$) are diagonal. This amounts to use B<U+00FC>hlmann-Straub models for each regression coefficient.

Argument newdata provides the “future” value of the regressors for prediction purposes. It should be given as specified in predict.lm.

Variance components estimation

For hierarchical models, two sets of estimators of the variance components (other than the within node variance) are available: unbiased estimators and iterative estimators.

Unbiased estimators are based on sums of squares of the form $$B_i = \sum_j w_{ij} (X_{ij} - \bar{X}_i)^2 - (J - 1) a$$ and constants of the form $$c_i = w_i - \sum_j \frac{w_{ij}^2}{w_i},$$ where $X_{ij}$ is the linearly sufficient statistic of level $(ij)$; $\bar{X_{i}}$ is the weighted average of the latter using weights $w_{ij}$; $w_i = \sum_j w_{ij}$; $J$ is the effective number of nodes at level $(ij)$; $a$ is the within variance of this level. Weights $w_{ij}$ are the natural weights at the lowest level, the sum of the natural weights the next level and the sum of the credibility factors for all upper levels.

The B<U+00FC>hlmann-Gisler estimators (method = "Buhlmann-Gisler") are given by $$b = \frac{1}{I} \sum_i \max \left( \frac{B_i}{c_i}, 0 \right),$$ that is the average of the per node variance estimators truncated at 0.

The Ohlsson estimators (method = "Ohlsson") are given by $$b = \frac{\sum_i B_i}{\sum_i c_i},$$ that is the weighted average of the per node variance estimators without any truncation. Note that negative estimates will be truncated to zero for credibility factor calculations.

In the B<U+00FC>hlmann-Straub model, these estimators are equivalent.

Iterative estimators method = "iterative" are pseudo-estimators of the form $$b = \frac{1}{d} \sum_i w_i (X_i - \bar{X})^2,$$ where $X_i$ is the linearly sufficient statistic of one level, $\bar{X}$ is the linearly sufficient statistic of the level above and $d$ is the effective number of nodes at one level minus the effective number of nodes of the level above. The Ohlsson estimators are used as starting values.

For regression models, with the intercept at time origin, only iterative estimators are available. If method is different from "iterative", a warning is issued. With the intercept at the barycenter of time, the choice of estimators is the same as in the B<U+00FC>hlmann-Straub model.

Linear Bayes

When formula is "bayes", the function computes pure Bayesian premiums for the following combinations of distributions where they are linear credibility premiums:

$X|\Theta = \theta \sim \mathrm{Poisson}(\theta)$ and $\Theta \sim \mathrm{Gamma}(\alpha, \lambda)$;
$X|\Theta = \theta \sim \mathrm{Exponential}(\theta)$ and $\Theta \sim \mathrm{Gamma}(\alpha, \lambda)$;
$X|\Theta = \theta \sim \mathrm{Gamma}(\tau, \theta)$ and $\Theta \sim \mathrm{Gamma}(\alpha, \lambda)$;
$X|\Theta = \theta \sim \mathrm{Normal}(\theta, \sigma_2^2)$ and $\Theta \sim \mathrm{Normal}(\mu, \sigma_1^2)$;
$X|\Theta = \theta \sim \mathrm{Bernoulli}(\theta)$ and $\Theta \sim \mathrm{Beta}(a, b)$;
$X|\Theta = \theta \sim \mathrm{Binomial}(\nu, \theta)$ and $\Theta \sim \mathrm{Beta}(a, b)$;
$X|\Theta = \theta \sim \mathrm{Geometric}(\theta)$ and $\Theta \sim \mathrm{Beta}(a, b)$.
$X|\Theta = \theta \sim \mathrm{Negative~Binomial}(r, \theta)$ and $\Theta \sim \mathrm{Beta}(a, b)$.

The following combination is also supported: $X|\Theta = \theta \sim \mathrm{Single~Parameter~Pareto}(\theta)$ and $\Theta \sim \mathrm{Gamma}(\alpha, \lambda)$. In this case, the Bayesian estimator not of the risk premium, but rather of parameter $\theta$ is linear with a “credibility” factor that is not restricted to $(0, 1)$.

Argument likelihood identifies the distribution of $X|\Theta = \theta$ as one of "poisson", "exponential", "gamma", "normal", "bernoulli", "binomial", "geometric", "negative binomial" or "pareto".

The parameters of the distributions of $X|\Theta = \theta$ (when needed) and $\Theta$ are set in … using the argument names (and default values) of dgamma, dnorm, dbeta, dbinom, dnbinom or dpareto1, as appropriate. For the Gamma/Gamma case, use shape.lik for the shape parameter $\tau$ of the Gamma likelihood. For the Normal/Normal case, use sd.lik for the standard error $\sigma_2$ of the Normal likelihood.

Data for the linear Bayes case may be a matrix or data frame as usual; an atomic vector to fit the model to a single contract; missing or NULL to fit the prior model. Arguments ratios, weights and subset are ignored.

Details

cm is the unified front end for credibility models fitting. The function supports hierarchical models with any number of levels (with B<U+00FC>hlmann and B<U+00FC>hlmann-Straub models as special cases) and the regression model of Hachemeister. Usage of cm is similar to lm for these cases. cm can also fit linear Bayes models, in which case usage is much simplified; see the section on linear Bayes below.

When not "bayes", the formula argument symbolically describes the structure of the portfolio in the form $~ terms$. Each term is an interaction between risk factors contributing to the total variance of the portfolio data. Terms are separated by + operators and interactions within each term by :. For a portfolio divided first into sectors, then units and finally contracts, formula would be ~ sector + sector:unit + sector:unit:contract, where sector, unit and contract are column names in data. In general, the formula should be of the form ~ a + a:b + a:b:c + a:b:c:d + ....

If argument regformula is not NULL, the regression model of Hachemeister is fit to the data. The response is usually time. By default, the intercept of the model is located at time origin. If argument adj.intercept is TRUE, the intercept is moved to the (collective) barycenter of time, by orthogonalization of the design matrix. Note that the regression coefficients may be difficult to interpret in this case.

Arguments ratios, weights and subset are used like arguments select, select and subset, respectively, of function subset.

Data does not have to be sorted by level. Nodes with no data (complete lines of NA except for the portfolio structure) are allowed, with the restriction mentioned above.

References

B<U+00FC>hlmann, H. and Gisler, A. (2005), A Course in Credibility Theory and its Applications, Springer.

Belhadj, H., Goulet, V. and Ouellet, T. (2009), On parameter estimation in hierarchical credibility, Astin Bulletin 39.

Goulet, V. (1998), Principles and application of credibility theory, Journal of Actuarial Practice 6, ISSN 1064-6647.

Goovaerts, M. J. and Hoogstad, W. J. (1987), Credibility Theory, Surveys of Actuarial Studies, No. 4, Nationale-Nederlanden N.V.

Examples

Run this code

# NOT RUN {
data(hachemeister)

## Buhlmann-Straub model
fit <- cm(~state, hachemeister,
          ratios = ratio.1:ratio.12, weights = weight.1:weight.12)
fit				# print method
predict(fit)			# credibility premiums
summary(fit)			# more details

## Two-level hierarchical model. Notice that data does not have
## to be sorted by level
X <- data.frame(unit = c("A", "B", "A", "B", "B"), hachemeister)
fit <- cm(~unit + unit:state, X, ratio.1:ratio.12, weight.1:weight.12)
predict(fit)
predict(fit, levels = "unit")	# unit credibility premiums only
summary(fit)
summary(fit, levels = "unit")	# unit summaries only

## Regression model with intercept at time origin
fit <- cm(~state, hachemeister,
          regformula = ~time, regdata = data.frame(time = 12:1),
          ratios = ratio.1:ratio.12, weights = weight.1:weight.12)
fit
predict(fit, newdata = data.frame(time = 0))
summary(fit, newdata = data.frame(time = 0))

## Same regression model, with intercept at barycenter of time
fit <- cm(~state, hachemeister, adj.intercept = TRUE,
          regformula = ~time, regdata = data.frame(time = 12:1),
          ratios = ratio.1:ratio.12, weights = weight.1:weight.12)
fit
predict(fit, newdata = data.frame(time = 0))
summary(fit, newdata = data.frame(time = 0))

## Poisson/Gamma pure Bayesian model
fit <- cm("bayes", data = c(5, 3, 0, 1, 1),
          likelihood = "poisson", shape = 3, rate = 3)
fit
predict(fit)
summary(fit)

## Normal/Normal pure Bayesian model
cm("bayes", data = c(5, 3, 0, 1, 1),
   likelihood = "normal", sd.lik = 2,
   mean = 2, sd = 1)
# }

Run the code above in your browser using DataLab