fic.lm: Focused information criteria for linear models

Description

Focused information criteria for linear models fitted with lm. Typically used to compare models with different covariates (more generally, different linear terms).

Usage

# S3 method for lm
fic(wide, inds, inds0 = NULL, gamma0 = 0, focus = NULL,
  focus_deriv = NULL, wt = NULL, sub = "auto", B = 0, loss = loss_mse,
  ...)

Arguments

wide

Fitted model object containing the wide model.

inds

Matrix or vector of indicators for which parameters are included in the submodel or submodels to be assessed.

A matrix should be supplied if there are multiple submodels. It should have one row per submodel and one column per parameter in the wide model. Each entry is 1 where the parameter is included in the submodel and 0 where it is excluded. Every row should contain 1 in the positions defining the narrow model, as specified in inds0.
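As an illustration of this layout, the following sketch builds an indicator matrix by hand for the wide model used in the Examples section. It uses only base R; the submodel names (narrow, am_only, am_wt, wide) are hypothetical labels chosen for this example.

```r
# Wide model from the Examples section: intercept plus five covariates.
wide.lm <- lm(mpg ~ am + wt + qsec + disp + hp, data = mtcars)
par_names <- names(coef(wide.lm))
npar <- length(par_names)

# One row per submodel, one column per wide-model parameter.
# 1 = parameter included, 0 = parameter excluded.
inds <- rbind(
  narrow  = c(1, 0, 0, 0, 0, 0),   # intercept only
  am_only = c(1, 1, 0, 0, 0, 0),   # intercept + am
  am_wt   = c(1, 1, 1, 0, 0, 0),   # intercept + am + wt
  wide    = rep(1, npar)           # all parameters
)
colnames(inds) <- par_names

# Narrow model, in the same format as a single row of inds.
inds0 <- inds["narrow", ]

# Every submodel must include the parameters of the narrow model.
stopifnot(ncol(inds) == npar, all(inds[, inds0 == 1] == 1))
```

A matrix like this could then be passed as inds, with inds0 supplied separately or left to default to the first row.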

inds0

Vector of indicators specifying the narrow model, in the same format as inds. If this is omitted, the narrow model is assumed to be defined by the first row of inds (if inds is a matrix), or by inds itself (if inds is a vector).

gamma0

Vector of special values taken by the parameters \(\gamma\) which define the narrow model.

This defaults to 0, as in covariate selection, where "excluded" coefficients are fixed to 0.

This should either be a scalar, assumed to be the same for all parameters fixed in the narrow model, or a vector of length equal to the number of parameters from the wide model which are fixed in the narrow model, that is, the number of entries of inds0 which are zero.

focus

An R function with:

first argument named par, denoting a vector of parameters, of the same length as the parameter vector of the wide model;
other arguments defining alternative focuses. These are supplied through the ... argument to fic. In the built-in examples, there is an argument named X, denoting alternative covariate values. The required format is documented below.

The function should return the focus quantity of interest. If additional arguments are supplied which are vectors or matrices, e.g. X, then these are assumed to represent multiple focuses, and focus should return a vector giving the focus for par and each row of X. Otherwise focus should return a scalar giving the focus value at par.

Not required if focus_deriv is specified.

Alternatively, focus can be a character string naming a built-in focus function supplied by the fic package. Currently these include:

"prob_logistic", the probability of the outcome in a logistic regression model

"mean_normal", the mean outcome in a normal linear regression model

See focus_fns for the functions underlying these built-in focuses.
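A user-written focus function might look like the sketch below, which computes the expected outcome of a linear model at the covariate values in X. The name focus_mean_mpg is hypothetical, and accepting X as either a vector or a matrix is an assumption made for safety, since how X is passed internally is not described here.

```r
wide.lm <- lm(mpg ~ am + wt + qsec + disp + hp, data = mtcars)

# Hypothetical focus function: expected mpg at the covariate values in X.
# par: parameter vector of the wide model (intercept first).
# X:   one row per focus, one column per parameter; a plain vector is
#      treated as a single focus.
focus_mean_mpg <- function(par, X) {
  if (is.null(dim(X))) X <- matrix(X, nrow = 1)
  as.numeric(X %*% par)
}

# Two focuses: automatic and manual cars at mean values of the other covariates.
cmeans <- colMeans(mtcars[, c("wt", "qsec", "disp", "hp")])
X <- rbind(
  auto   = c(1, am = 0, cmeans),
  manual = c(1, am = 1, cmeans)
)

est <- focus_mean_mpg(coef(wide.lm), X)
est
```

Evaluated at the wide model's coefficients, this returns one value per row of X, which is the shape described above for multiple focuses.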

focus_deriv

Vector of partial derivatives of the focus function with respect to the parameters in the wide model. This is not usually needed, as it can generally be computed automatically and accurately from the function supplied in focus, using numerical differentiation.
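To show what such a derivative vector contains, the sketch below approximates it by central finite differences for a linear focus, where the exact answer is simply the covariate vector. Both focus_fn and num_grad are hypothetical helpers written for this illustration; they are not the routine the package uses internally, and the covariate values in x are arbitrary.

```r
# Hypothetical focus: mean response at a single covariate vector x.
focus_fn <- function(par, x) sum(x * par)

# Central finite differences, perturbing one parameter at a time.
num_grad <- function(f, par, ..., eps = 1e-6) {
  vapply(seq_along(par), function(j) {
    step <- replace(numeric(length(par)), j, eps)
    (f(par + step, ...) - f(par - step, ...)) / (2 * eps)
  }, numeric(1))
}

wide.lm <- lm(mpg ~ am + wt + qsec + disp + hp, data = mtcars)

# Arbitrary illustrative values: intercept, am, wt, qsec, disp, hp.
x <- c(1, 0, 3.2, 17.8, 230, 147)

grad <- num_grad(focus_fn, coef(wide.lm), x = x)

# For a linear focus the derivative with respect to par equals x.
all.equal(grad, x, tolerance = 1e-6)
```

When the derivative is known in closed form, as it is here, supplying it directly avoids the numerical step altogether.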

wt

Vector of weights to apply to different covariate values in X. This should have length equal to the number of alternative values for the covariates, that is, the number of alternative focuses of interest. The covariate-specific focused model comparison statistics are then supplemented by averaged statistics for a population defined by this distribution of covariate values. If this argument is omitted, the values are assumed to have equal weight when computing the average. The weights are not normalised, though the interpretation is unclear if the weights don't sum to one.
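One way to choose weights that sum to one is to use the observed share of each covariate pattern in the data. The sketch below does this for two focuses defined by transmission type in mtcars. The name wt_focus is hypothetical, chosen to avoid confusion with the wt (weight) column of that dataset.

```r
# Share of automatic (am = 0) and manual (am = 1) cars in mtcars.
counts <- table(mtcars$am)

# One weight per focus, in the same order as the rows of X.
wt_focus <- as.numeric(counts / sum(counts))
names(wt_focus) <- c("auto", "manual")
wt_focus

# Weights that sum to one keep the averaged statistics interpretable.
stopifnot(isTRUE(all.equal(sum(wt_focus), 1)))
```

A vector like this would be passed as wt alongside a two-row X whose rows are in the same order.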

sub

If "auto" (the default) then the submodels are fitted automatically within this function. If NULL they are not fitted, and focus estimates are not returned with the results.

The model parameters include the intercept, followed by the coefficients of any covariates. The standard deviation is excluded.

Only covariate selection problems are supported in this function. To compare between models with a fixed and unknown standard deviation, hand-written maximum likelihood estimation routines would be needed, along the lines described in the "skew-normal models" vignette.

The focus can depend on the standard deviation. The focus function should then have an argument sigma.

See the vignette "Using the fic R package for focused model comparison: linear regression" for some examples.

B

If B is 0 (the default) the standard analytic formulae for the focused model comparison statistics are used with mean square error loss. If B>0, then a parametric bootstrap method is used with B bootstrap samples, and the loss specified in the loss argument. More details of this approach are given in the package vignette "Focused model comparison with bootstrapping and alternative loss functions".

loss

A function returning an estimated loss for a submodel estimate under the sampling distribution of the wide model. Only applicable when using bootstrapping. This should have two arguments sub and wide. sub should be a scalar giving the focus estimate from a submodel. wide should be a vector with a sample of focus estimates from the wide model, e.g. generated by a bootstrap method. By default this is a function calculating the root mean square error of the submodel estimate. An example is given in the vignette "Focused model comparison with bootstrapping and alternative loss functions".
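A function with this signature might look like the sketch below, which uses mean absolute error instead of root mean square error. The name loss_abs is hypothetical, and returning a single number is an assumption: the return format that fic expects is not stated here, so it should be checked against loss_mse and the vignette before use. The simulated wide_draws merely stand in for bootstrap focus estimates.

```r
# Hypothetical alternative loss: mean absolute error of a submodel estimate.
# sub:  scalar focus estimate from a submodel.
# wide: vector of focus estimates sampled under the wide model.
loss_abs <- function(sub, wide) {
  mean(abs(sub - wide))
}

# Stand-in for bootstrap focus estimates from the wide model.
set.seed(1)
wide_draws <- rnorm(1000, mean = 20, sd = 1)

loss_abs(sub = 20.5, wide = wide_draws)
```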

...

Other arguments to the focus function can be supplied here.

The built-in focus functions prob_logistic and mean_normal take an argument X giving covariate values defining the focus. This can either be a matrix or a vector, or a list or data frame that can be coerced into a matrix.

If just one focus is needed, then X can be a vector of length equal to the number of parameters in the wide model.

To compute focused model comparison statistics for multiple focuses defined by the same focus function evaluated at multiple covariate values, X should be a matrix, with number of columns equal to the number of parameters in the wide model, and number of rows equal to the number of alternative focuses.

For a typical regression model, the first parameter will denote an intercept, so the first value of X should be 1, and the remaining values should correspond to covariates whose coefficients form parameters of the wide model. See the examples in the vignette.
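Rather than typing the intercept column by hand, X can be built from a data frame of covariate values using base R's model.matrix, which adds the intercept and orders the columns to match the fitted model. This is a sketch under the assumption that the resulting matrix is acceptable as X; the covariate values in newdata are arbitrary.

```r
wide.lm <- lm(mpg ~ am + wt + qsec + disp + hp, data = mtcars)

# Arbitrary illustrative covariate values, one row per focus.
newdata <- data.frame(am = c(0, 1), wt = 3, qsec = 18, disp = 200, hp = 150)
rownames(newdata) <- c("auto", "manual")

# Drop the response from the model terms, then expand newdata into a
# design matrix with the same columns as the wide model.
X <- model.matrix(delete.response(terms(wide.lm)), data = newdata)

# One column per wide-model parameter, with the intercept column first.
stopifnot(ncol(X) == length(coef(wide.lm)), all(X[, 1] == 1))
X
```

This approach is most useful when the wide model includes factors or interactions, where writing out the columns by hand is error-prone.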

Arguments to the focus function other than X can also be supplied as a matrix, vector, list or data frame in the same way. The exception is an argument supplied as a vector, which is assumed to refer to multiple focuses, one per element. For example, suppose the focus function defines the quantile of a distribution and takes an argument focus_p. Then calling fic(..., focus_p=c(0.1, 0.9)) indicates two alternative focuses, defined by the 0.1 and 0.9 quantiles.

Examples



## Covariate selection in Motor Trend cars data
## See the "fic" package vignette on linear models for more details

wide.lm <- lm(mpg ~ am + wt + qsec + disp + hp, data=mtcars)

## Select between all submodels 
ncovs_wide <- length(coef(wide.lm)) - 1
inds0 <- c(1, rep(0, ncovs_wide))
inds <- all_inds(wide.lm, inds0)

## Two focuses: mean MPG for automatic and manual transmission,
## given mean values of the other covariates 
cmeans <- colMeans(model.frame(wide.lm)[,c("wt","qsec","disp","hp")])
X <- rbind(
  "auto"   = c(intercept=1, am=0, cmeans),
  "manual" = c(intercept=1, am=1, cmeans)
)
ficres <- fic(wide.lm, inds=inds, focus=mean_normal, X=X)
summary(ficres)
ggplot_fic(ficres)
