The `Importance` function considers variable importance (or predictor importance) to be the effect that the variable has on replicates \(\textbf{y}^{rep}\) (or \(\textbf{Y}^{rep}\)) when the variable is removed from the model by setting it equal to zero. Here, variable importance is considered in terms of the comparison of posterior predictive checks. This may be considered to be a form of sensitivity analysis, and can be useful for model revision, variable selection, and model interpretation.

Currently, this function only tests the variable importance of design matrix \(\textbf{X}\).

```
Importance(object, Model, Data, Categorical=FALSE, Discrep, d=0, CPUs=1,
Type="PSOCK")
```

object

An object of class `demonoid`, `iterquad`, `laplace`, `pmc`, or `vb` is required.

Model

The model specification function is required.

Data

A data set in a list is required. The dependent variable is required to be named either `y` or `Y`. The `Importance` function will sequentially remove each column vector in `X`, so `X` is required to be in data set `Data`.

Categorical

Logical. If `TRUE`, then `y` and `yhat` are considered to be categorical (such as y=0 or y=1), rather than continuous. This defaults to `FALSE`.

Discrep

This optional argument allows a discrepancy statistic to be included. For more information on discrepancy statistics, see `summary.demonoid.ppc`.

d

This is an optional integer to be used with the `Discrep` argument above, and it defaults to `d=0`. For more information on discrepancy, see `summary.demonoid.ppc`.

CPUs

This argument accepts an integer that specifies the number of central processing units (CPUs) of the multicore computer or computer cluster. This argument defaults to `CPUs=1`, in which case parallel processing does not occur.

Type

This argument specifies the type of parallel processing to perform, accepting either `Type="PSOCK"` or `Type="MPI"`.
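As a rough illustration of the `Data` argument described above, the sketch below builds a small list that holds a design matrix `X` and a dependent variable `y`. The simulated values and the names `N`, `J`, `beta`, and `MyData` are assumptions made for illustration; a real `Model` function will usually need further list elements that are not shown here.

```r
set.seed(1)
N <- 50                                   # hypothetical number of records
J <- 3                                    # hypothetical number of columns in X
X <- cbind(1, matrix(rnorm(N * (J - 1)), N, J - 1))  # intercept plus 2 predictors
beta <- c(1, 0.5, -0.25)                  # hypothetical coefficients, simulation only
y <- as.vector(X %*% beta + rnorm(N, 0, 0.1))

# The dependent variable must be named y or Y, and X must be in the list
MyData <- list(X = X, y = y)
str(MyData)
```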

`Importance` returns an object of class `importance`, which is a matrix with one row for the full model plus one row for each column in design matrix \(\textbf{X}\), and 4 columns, which are BPIC, Concordance (or Mean.Lift if categorical), Discrep, and L-criterion. The first row is the full model, and each remaining row represents a model with one predictor in \(\textbf{X}\) removed, together with the resulting posterior predictive checks. For non-categorical dependent variables, an attribute is returned with the object, and the attribute is a vector of `S.L`, the calibration number of the L-criterion.
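The layout of the returned matrix can be pictured with an empty placeholder of the same shape. This is only a sketch: the row labels are hypothetical, the column labels follow the description above for a non-categorical dependent variable, and `J` is an assumed number of columns in `X`.

```r
J <- 3  # hypothetical number of columns in X

# One row for the full model plus one row per removed column, and 4 columns
out <- matrix(NA_real_, nrow = J + 1, ncol = 4,
  dimnames = list(
    c("Full", paste0("Without X", seq_len(J))),          # hypothetical row labels
    c("BPIC", "Concordance", "Discrep", "L-criterion")))
dim(out)
out
```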

Variable importance is defined here as the impact of each variable (predictor, or column vector) in design matrix \(\textbf{X}\) on \(\textbf{y}^{rep}\) (or \(\textbf{Y}^{rep}\)), when the variable is removed.

First, the full model is predicted with the `predict.demonoid`, `predict.iterquad`, `predict.laplace`, `predict.pmc`, or `predict.vb` function, and summarized with the `summary.demonoid.ppc`, `summary.iterquad.ppc`, `summary.laplace.ppc`, `summary.pmc.ppc`, or `summary.vb.ppc` function, respectively. The results are stored in the first row of the output. Each successive row in the output corresponds to the application of `predict` and `summary` functions, but with each variable in design matrix \(\textbf{X}\) being set to zero and effectively removed. The results show the impact of sequentially removing each predictor.
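The idea of removing a predictor by zeroing its column can be sketched in base R. This is not the code used by `Importance`: it assumes fixed point estimates (`beta`) in place of posterior samples, and a sum of squared errors (`sse`, a hypothetical helper) in place of the posterior predictive checks.

```r
set.seed(1)
N <- 100
X <- cbind(1, rnorm(N), rnorm(N))          # intercept and two predictors
beta <- c(2, 1.5, 0.05)                    # hypothetical point estimates
y <- as.vector(X %*% beta + rnorm(N, 0, 0.5))

# Sum of squared errors, standing in for a posterior predictive check
sse <- function(X) sum((y - as.vector(X %*% beta))^2)

fit <- numeric(ncol(X) + 1)
names(fit) <- c("Full", paste0("Without X", seq_len(ncol(X))))
fit[1] <- sse(X)                           # first entry: full model
for (j in seq_len(ncol(X))) {
  X0 <- X
  X0[, j] <- 0                             # "remove" column j by setting it to zero
  fit[j + 1] <- sse(X0)
}
round(fit, 1)
```

Entries that are much larger than the first indicate columns whose removal degrades the fit the most.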

The criterion for variable importance may differ from model to model. As a default, BPIC is recommended. The Bayesian Predictive Information Criterion (BPIC) was introduced by Ando (2007). BPIC is a variation of the Deviance Information Criterion (DIC) that has been modified for predictive distributions. For more information on DIC (Spiegelhalter et al., 2002), see the accompanying vignette entitled "Bayesian Inference". \(BPIC = Dbar + 2pD\).

With BPIC, variable importance has a positive relationship, such that larger values indicate a more important variable, because removing that variable resulted in a worse fit to the data. The best model has the lowest BPIC.
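The formula \(BPIC = Dbar + 2pD\) can be worked through with simulated deviance samples. The sketch assumes that `pD` is estimated as half the variance of the deviance, which is an assumption made here rather than something stated on this page, and the vector `Dev` is hypothetical.

```r
set.seed(1)
Dev <- rnorm(1000, mean = 250, sd = 3)   # hypothetical deviance samples

Dbar <- mean(Dev)                        # expected deviance
pD <- var(Dev) / 2                       # assumed estimator of model complexity
DIC <- Dbar + pD
BPIC <- Dbar + 2 * pD                    # complexity is penalized twice as heavily

c(Dbar = Dbar, pD = pD, DIC = DIC, BPIC = BPIC)
```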

In a model in which the dependent variable is not categorical, it is also recommended to consider the L-criterion (Laud and Ibrahim, 1995), provided that sample size is small enough that it does not result in `Inf`. For more information on the L-criterion, see the accompanying vignette entitled "Bayesian Inference".

With the L-criterion, variable importance has a positive relationship, such that larger values indicate a more important variable, because removing that variable resulted in a worse fit to the data. Laud and Ibrahim (1995) recommended considering the model with the lowest L-criterion, say \(L_1\), and the model with the closest L-criterion, say \(L_2\), and creating a comparison score \(\phi = (L_2-L_1)/S_L\), where `S.L` is from the \(L_1\) model. If the comparison score \(\phi\) is less than 2, then \(L_2\) is within 2 standard deviations of \(L_1\); a value of 2 is the recommended cut-off for model choice.
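The comparison score can be computed directly from a vector of L-criterion values and their calibration numbers. The numbers and model labels below are hypothetical and serve only to show the arithmetic.

```r
# Hypothetical L-criterion values and calibration numbers, one per model
L   <- c(Full = 120.4, M1 = 135.9, M2 = 121.1, M3 = 128.0)
S.L <- c(Full = 0.9,   M1 = 1.1,   M2 = 0.9,   M3 = 1.0)

best <- which.min(L)                        # model with the lowest L-criterion (L_1)
L1 <- L[best]
others <- L[-best]
L2 <- others[which.min(abs(others - L1))]   # model with the closest L-criterion (L_2)

phi <- unname((L2 - L1) / S.L[best])        # S.L is taken from the L_1 model
phi
phi < 2                                     # TRUE: L_2 is within 2 standard deviations
```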

The `Importance` function may suggest that a model fits the data better with a variable removed. In that case, the user may choose to leave the variable in the model (perhaps the model is misspecified without the variable), investigate and possibly re-specify the relationship between the independent and dependent variable(s), or remove the variable and update the model again.

In contrast to variable importance, the `PosteriorChecks` function calculates parameter importance, which is the probability that each parameter's marginal posterior distribution is greater than zero, where an important parameter does not include zero in its probability interval (see `p.interval`). Parameter importance and variable importance may disagree, and both should be studied.
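Parameter importance, as described above, can be approximated from posterior draws in base R. The sketch uses simulated draws and a quantile-based 95% interval as a stand-in; it does not call `PosteriorChecks` or `p.interval`, and the parameter names are hypothetical.

```r
set.seed(1)
# Hypothetical posterior draws for two parameters
theta <- cbind(beta1 = rnorm(5000, 1.0, 0.3),
               beta2 = rnorm(5000, 0.1, 0.3))

p.gt.0 <- colMeans(theta > 0)                             # P(theta > 0)
interval <- t(apply(theta, 2, quantile, probs = c(0.025, 0.975)))
excludes.zero <- interval[, 1] > 0 | interval[, 2] < 0    # interval does not contain 0

cbind(p.gt.0, interval, excludes.zero)
```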

The `Importance` function tends to indicate that a model fits the data better when variables are removed that have parameters with marginal posterior distributions that include 0 in the 95% probability interval (variables associated with lower parameter importance).

Often, in complicated models, it is difficult to assess variable importance by examining the marginal posterior distribution of the associated parameter(s). Consider polynomial regression, in which each variable may have multiple parameters.

The information provided by the `Importance` function may be used for model revision, or reporting the relative importance of variables.

The `plot.importance` function is available to plot the output of the `Importance` function according to BPIC, predictive concordance (Gelfand, 1996), the selected discrepancy statistic (Gelman et al., 1996), or the L-criterion.

Parallel processing may be performed when the user specifies `CPUs` to be greater than one, implying that the specified number of CPUs exists and is available. Parallelization may be performed on a multicore computer or a computer cluster. Either a Simple Network of Workstations (SNOW) or the Message Passing Interface (MPI) is used. With small data sets and few samples, parallel processing may be slower, due to computer network communication. With larger data sets and more samples, the user should experience a faster run-time.
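A socket cluster of the kind selected by `Type="PSOCK"` can be sketched with the base `parallel` package. This is an illustration of the general pattern rather than the internals of `Importance`; the worker count, the small matrix `D`, and the per-column task are all assumptions.

```r
library(parallel)

CPUs <- 2                                  # hypothetical number of workers
cl <- makeCluster(CPUs, type = "PSOCK")    # socket cluster on the local machine
D <- cbind(1, 1:5, (1:5)^2)                # small hypothetical design matrix

# Each task zeroes one column and returns the column sums that remain
res <- parLapply(cl, seq_len(ncol(D)), function(j, D) {
  D[, j] <- 0
  colSums(D)
}, D = D)
stopCluster(cl)                            # always release the workers
res
```

With a task this small, the cost of starting workers and passing data will likely outweigh any gain, which mirrors the caveat above about small data sets.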

Ando, T. (2007). "Bayesian Predictive Information Criterion for
the Evaluation of Hierarchical Bayesian and Empirical Bayes Models".
*Biometrika*, 94(2), p. 443--458.

Gelfand, A. (1996). "Model Determination Using Sampling Based Methods". In Gilks, W., Richardson, S., Spiegelhalter, D., Chapter 9 in *Markov Chain Monte Carlo in Practice*. Chapman and Hall: Boca Raton, FL.

Laud, P.W. and Ibrahim, J.G. (1995). "Predictive Model
Selection". *Journal of the Royal Statistical Society*, B 57,
p. 247--262.

Spiegelhalter, D.J., Best, N.G., Carlin, B.P., and van der Linde, A.
(2002). "Bayesian Measures of Model Complexity and Fit (with
Discussion)". *Journal of the Royal Statistical Society*, B 64,
p. 583--639.

`is.importance`, `IterativeQuadrature`, `LaplaceApproximation`, `LaplacesDemon`, `PMC`, `plot.importance`, `PosteriorChecks`, `p.interval`, `predict.demonoid`, `predict.iterquad`, `predict.laplace`, `predict.pmc`, `predict.vb`, `summary.demonoid.ppc`, `summary.iterquad.ppc`, `summary.laplace.ppc`, `summary.pmc.ppc`, `summary.vb.ppc`, and `VariationalBayes`.

```
# Not run:
# First, update the model with the LaplacesDemon function, such as
# the example with linear regression, creating an object called Fit.
# Then:
# Importance(Fit, Model, MyData, Discrep="Chi-Square", CPUs=1)
```
