fic.default: Focused information criteria: main user interface

Description

Focused information criteria for general models. These methods estimate the bias and variance of estimates of a quantity of interest (the "focus") when smaller submodels are used in place of a "wide" model that is assumed to generate the data but may not give precise enough estimates.

Usage

# S3 method for default
fic(wide, inds, inds0 = NULL, gamma0 = 0, focus = NULL,
  focus_deriv = NULL, wt = NULL, sub = NULL, fns = NULL, FIC = FALSE,
  B = 0, loss = loss_mse, tidy = TRUE, ...)
fic(wide, ...)

Value

The returned data frame or array contains the following components, describing characteristics of the defined submodel. For full, formal definitions, see the package vignette and Chapter 6 of Claeskens and Hjort (2008).

rmse: The root mean square error of the estimate of the focus quantity. Defined as the square root of (squared unadjusted bias plus variance). This is an asymptotically unbiased estimator, but may occasionally be indeterminate if the estimate of the squared bias plus variance is negative.
rmse.adj: The root mean square error, based on a bias estimator which is adjusted to avoid negative squared bias. Defined on page 157 of Claeskens and Hjort as the square root of the sum of the variance and the squared adjusted bias.
bias: The estimated bias of the focus quantity, adjusted to avoid negative squared bias. This is defined as the square root of the quantity $sqb3(S)$ from page 152 of Claeskens and Hjort, multiplied by the sign of the unadjusted bias.
se: The estimated standard error (root variance) of the focus quantity. Defined on page 157.
FIC: The focused information criterion (equation 6.1 from Claeskens and Hjort), if FIC=TRUE was supplied.
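
As a quick arithmetic check of how the bias and se components combine into a root mean square error, here is a minimal sketch. The numbers are made up for illustration and are not output from fic.

```r
# Hypothetical values for one submodel and one focus (not real fic output)
bias <- -0.03   # adjusted bias estimate
se   <- 0.04    # standard error of the focus estimate

# Root mean square error: square root of (squared bias + variance)
rmse_adj <- sqrt(bias^2 + se^2)
rmse_adj
```

The sign of the bias does not affect the result, since only its square enters the sum.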

The object returned by fic also has the following attributes, which can be extracted with the attr function.

iwide: Index of the wide model in the vector of submodels, or NULL if the wide model is not included.
inarr: Index of the narrow model in the vector of submodels, or NULL if the narrow model is not included.
sub: List of fitted submodel objects.
parnames: Vector of names of parameters in the wide model.
inds: Submodel indicators, as supplied in the inds argument.

Arguments

wide

Fitted model object containing the wide model.

inds

Matrix or vector of indicators for which parameters are included in the submodel or submodels to be assessed.

A matrix should be supplied if there are multiple submodels. This should have number of rows equal to the number of submodels, and number of columns equal to the total number of parameters in the wide model. It contains 1s in the positions where the parameter is included in the submodel, and 0s in positions where the parameter is excluded. This should always be 1 in the positions defining the narrow model, as specified in inds0.
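
One way to build such a matrix by hand is to enumerate every combination of the parameters that may be dropped. The sketch below uses base R only; the parameter names and the choice of narrow model are hypothetical.

```r
# Hypothetical wide model with 4 parameters; the first two are always included
parnames <- c("(Intercept)", "x1", "x2", "x3")
inds0 <- c(1, 1, 0, 0)                 # narrow model

# All 0/1 combinations for the parameters that are free to be dropped
free <- which(inds0 == 0)
combs <- as.matrix(expand.grid(rep(list(0:1), length(free))))

# One row per submodel, one column per wide-model parameter
inds <- matrix(1, nrow = nrow(combs), ncol = length(inds0),
               dimnames = list(NULL, parnames))
inds[, free] <- combs
inds
```

With this construction the first row is the narrow model and the last row is the wide model, and every row has 1s in the positions of the narrow model.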

inds0

Vector of indicators specifying the narrow model, in the same format as inds. If this is omitted, the narrow model is assumed to be defined by the first row of inds (if inds is a matrix), or by inds itself (if inds is a vector).

gamma0

Vector of special values taken by the parameters $gamma$ which define the narrow model.

This defaults to 0, as in covariate selection, where "excluded" coefficients are fixed to 0.

This should either be a scalar, assumed to be the same for all parameters fixed in the narrow model, or a vector of length equal to the number of parameters from the wide model which are fixed in the narrow model, that is, the number of entries of inds0 which are zero.
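
The length rule can be checked directly. In this sketch the narrow model and the non-zero special value are hypothetical, chosen only to show the two accepted forms.

```r
# Hypothetical narrow model: three parameters of the wide model are fixed
inds0 <- c(1, 1, 0, 0, 0)
n_fixed <- sum(inds0 == 0)

# Scalar form: the same special value for every fixed parameter
gamma0_scalar <- 0

# Vector form: one value per fixed parameter, here with the last one fixed at 1
gamma0_vector <- c(0, 0, 1)

stopifnot(length(gamma0_vector) == n_fixed)
```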

focus

An R function with:

a first argument named par, denoting a vector of parameters of the same length as in the wide model;
other arguments defining alternative focuses. These are supplied through the ... argument to fic. In the built-in examples, there is an argument named X, denoting alternative covariate values. The required format is documented below.

The function should return the focus quantity of interest. If additional arguments are supplied which are vectors or matrices, e.g. X, then these are assumed to represent multiple focuses, and focus should return a vector giving the focus for par and each row of X. Otherwise focus should return a scalar giving the focus value at par.

Not required if focus_deriv is specified.
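
As an illustration of the required shape, here is a sketch of a user-defined focus: the log odds of the outcome in a logistic regression. The function name focus_logodds and the parameter values are hypothetical; only the argument names par and X follow the convention described above.

```r
# Focus: log odds of the outcome, evaluated at each row of X
focus_logodds <- function(par, X) {
  # Treat a plain vector as a single row, so one focus gives a scalar
  X <- matrix(X, ncol = length(par))
  as.vector(X %*% par)
}

par <- c(-1, 0.5, 2)                       # hypothetical parameter values
X <- rbind(a = c(1, 0, 0), b = c(1, 2, 0.5))

focus_logodds(par, X)            # one value per row of X
focus_logodds(par, c(1, 0, 0))   # a single focus
```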

Alternatively, focus can be a character string naming a built-in focus function supplied by the fic package. Currently these include:

"prob_logistic", the probability of the outcome in a logistic regression model

"mean_normal", the mean outcome in a normal linear regression model

See focus_fns for the functions underlying these built-in focuses.

focus_deriv

Vector of partial derivatives of the focus function with respect to the parameters in the wide model. This is not usually needed, as it can generally be computed automatically and accurately from the function supplied in focus, using numerical differentiation.
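
To see what such a derivative vector looks like, the sketch below compares a central finite-difference approximation with the analytic derivative for a single logistic-probability focus. This is an illustration in base R, not the method fic uses internally; num_grad and the parameter values are hypothetical.

```r
# Single focus: probability of the outcome at covariate values X
focus <- function(par, X) plogis(sum(X * par))

# Central finite differences, one partial derivative per parameter
num_grad <- function(f, par, ..., h = 1e-6) {
  sapply(seq_along(par), function(j) {
    step <- replace(numeric(length(par)), j, h)
    (f(par + step, ...) - f(par - step, ...)) / (2 * h)
  })
}

par <- c(-1, 0.5)   # hypothetical wide-model estimates
X <- c(1, 2)        # intercept and one covariate value

num <- num_grad(focus, par, X = X)

# Analytic derivative of plogis(X'par) with respect to par: p * (1 - p) * X
p <- focus(par, X)
ana <- p * (1 - p) * X
```

Here num and ana agree to within numerical error, which is why supplying focus_deriv by hand is rarely necessary.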

wt

Vector of weights to apply to different covariate values in X. This should have length equal to the number of alternative values for the covariates, that is, the number of alternative focuses of interest. The covariate-specific focused model comparison statistics are then supplemented by averaged statistics for a population defined by this distribution of covariate values. If this argument is omitted, the values are assumed to have equal weight when computing the average. The weights are not normalised, though the interpretation is unclear if the weights don't sum to one.
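
Because the weights are not normalised automatically, it can help to scale them to sum to one before passing them as wt. A minimal sketch, with hypothetical group sizes:

```r
# Hypothetical numbers of individuals represented by each row of X
counts <- c(Smokers = 74, "Non-smokers" = 115)

# Scale to proportions so that the weights sum to one
wt <- counts / sum(counts)
wt
```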

sub

List of fitted model objects corresponding to each submodel to be assessed.

For some classes of models with built in methods for fic, e.g. fic.glm, the submodels are fitted automatically by default, so this argument does not need to be supplied.

Otherwise, this argument can be omitted, but it is required if you want the estimate of the focus function under each submodel to be included in the results, which is usually the case.

fns

Named list of functions to extract the quantities from the fitted model object that are required to calculate the focused model comparison statistics. By default this is

list(coef=coef, nobs=nobs, vcov=vcov)

Suppose the fitted model object is called mod. This default list assumes that

coef(mod) returns the vector of parameter estimates,
vcov(mod) returns the covariance matrix for the parameter estimates,
nobs(mod) returns the number of observations used in the model fit. This is only needed if the "classic" FIC is requested; it is not needed to compute the mean square error of the focus.

If one or more of these functions does not work for mod, then the defaults can be changed. For example, suppose the functions coef(), nobs() and vcov() are not understood (or return something different) for your class of model objects, but the parameters are stored in mod$estimates, the number of observations is in mod$data$nobs, and the covariance matrix is in mod$cov, then the fns argument should be set to

list( coef = function(x){x$estimates}, nobs = function(x){x$data$nobs}, vcov = function(x){x$cov} )

If fewer than three components are specified in fns, then the missing components are assumed to take their default values.
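
Before passing custom extractors to fic, it is worth checking that each one returns an object of the expected shape. The sketch below does this with a mock object; mod and its contents are hypothetical stand-ins for a real fitted model.

```r
# Hypothetical fitted-model object storing its results in non-standard places
mod <- list(
  estimates = c(alpha = 1.2, beta = -0.4),
  cov       = diag(c(0.04, 0.01)),
  data      = list(nobs = 150)
)

fns <- list(
  coef = function(x) x$estimates,
  nobs = function(x) x$data$nobs,
  vcov = function(x) x$cov
)

# The covariance matrix should be square, with one row per parameter
npar <- length(fns$coef(mod))
stopifnot(all(dim(fns$vcov(mod)) == c(npar, npar)))
stopifnot(length(fns$nobs(mod)) == 1)
```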

FIC

If TRUE, then the Focused Information Criterion is returned with the results alongside the mean squared error and its components. This is done for built-in model classes, but optional for user-defined model classes, since it requires knowledge of the sample size n as well as the estimates and covariance matrix under the wide model.

B

If B is 0 (the default) the standard analytic formulae for the focused model comparison statistics are used with mean square error loss. If B>0, then a parametric bootstrap method is used with B bootstrap samples, and the loss specified in the loss argument. More details of this approach are given in the package vignette "Focused model comparison with bootstrapping and alternative loss functions".

loss

A function returning an estimated loss for a submodel estimate under the sampling distribution of the wide model. Only applicable when using bootstrapping. This should have two arguments sub and wide. sub should be a scalar giving the focus estimate from a submodel. wide should be a vector with a sample of focus estimates from the wide model, e.g. generated by a bootstrap method. By default this is a function calculating the root mean square error of the submodel estimate. An example is given in the vignette "Focused model comparison with bootstrapping and alternative loss functions".
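
For illustration, an alternative loss with the same two arguments might look like the sketch below, which uses mean absolute error. The name loss_abs is hypothetical, and returning a single number is an assumption: the exact return structure that fic expects should be checked against loss_mse and the vignette.

```r
# Mean absolute error of a submodel focus estimate `sub`, relative to
# a sample `wide` of focus estimates from the wide model
loss_abs <- function(sub, wide) {
  mean(abs(sub - wide))
}

# Stand-in for a bootstrap sample of wide-model focus estimates
set.seed(1)
wide_sample <- rnorm(1000, mean = 0.30, sd = 0.05)

loss_abs(sub = 0.28, wide = wide_sample)
```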

tidy

If TRUE the results are returned as a data frame with variables to indicate the submodels, focuses and corresponding result statistics. If FALSE, the results are returned as a three-dimensional array, with dimensions indexed by the submodels, result statistics and focuses respectively.

...

Other arguments to the focus function can be supplied here.

The built-in focus functions prob_logistic and mean_normal take an argument X giving covariate values defining the focus. This can either be a matrix or a vector, or a list or data frame that can be coerced into a matrix.

If just one focus is needed, then X can be a vector of length equal to the number of parameters in the wide model.

To compute focused model comparison statistics for multiple focuses defined by the same focus function evaluated at multiple covariate values, X should be a matrix, with number of columns equal to the number of parameters in the wide model, and number of rows equal to the number of alternative focuses.

For a typical regression model, the first parameter will denote an intercept, so the first value of X should be 1, and the remaining values should correspond to covariates whose coefficients form parameters of the wide model. See the examples in the vignette.
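
One convenient way to build X with the intercept column in place is model.matrix from base R. The sketch below assumes a hypothetical wide model with an intercept and two covariates, age and smoke; the covariate values are made up.

```r
# Covariate values defining two focuses (hypothetical)
newdata <- data.frame(age = c(25, 40), smoke = c(1, 0),
                      row.names = c("young smoker", "older non-smoker"))

# model.matrix adds the intercept column of 1s automatically
X <- model.matrix(~ age + smoke, data = newdata)
X
```

The column order of X must match the order of the parameters in the wide model, so the formula used here should mirror the one used to fit it.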

Arguments to the focus function other than X can also be supplied as a matrix, vector, list or data frame in the same way. The exception is that when such an argument is supplied as a vector, it is assumed to refer to multiple focuses. For example, suppose the focus function defines the quantile of a distribution, and takes an argument focus_p. Then calling fic(...,focus_p=c(0.1, 0.9)) indicates two alternative focuses defined by the 0.1 and 0.9 quantiles.

References

Claeskens, G., & Hjort, N. L. (2008). Model selection and model averaging (Vol. 330). Cambridge: Cambridge University Press.

Claeskens, G., & Hjort, N. L. (2003). The focused information criterion. Journal of the American Statistical Association, 98(464), 900-916.

Examples


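# Load the fic package; its birthwt data are assumed to include the
# derived variables lwtkg, smokeage and smokeui used below
library(fic)

# Wide model: logistic regression for low birth weight
wide.glm <- glm(low ~ lwtkg + age + smoke + ht + ui + smokeage + smokeui,
                data=birthwt, family=binomial)

# One row per submodel, one column per parameter of the wide model
inds <- rbind(
              narrow = c(1,1,0,0,0,0,0,0),
              mod1 = c(1,1,1,1,0,0,0,0),
              wide = c(1,1,1,1,1,1,1,1)
)

# Covariate values defining the two focuses (first entry is the intercept)
vals.smoke <-    c(1, 58.24, 22.95, 1, 0, 0, 22.95, 0)
vals.nonsmoke <- c(1, 59.50, 23.43, 0, 0, 0, 0, 0)
X <- rbind("Smokers"=vals.smoke, "Non-smokers"=vals.nonsmoke)

# Built-in focus: probability of the outcome
fic(wide=wide.glm, inds=inds, focus="prob_logistic", X=X)

# Equivalent user-defined focus function
focus <- function(par, X) plogis(X %*% par)
fic(wide=wide.glm, inds=inds, focus=focus, X=X)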