multi: Multiple `coxph` models

Description

Multiple coxph models

Usage

multi(x, ...)
## S3 method for class 'coxph':
multi(x, ..., maxCoef = 5L, crit = c("aic", "aicc", "bic"),
  how = c("all", "evolve"), confSetSize = 100L, maxiter = 100L,
  bunch = 1000L, mutRate = 0.1, sexRate = 0.2, immRate = 0.3,
  deltaM = 1, deltaB = 1, conseq = 10L, report = TRUE)

Arguments

x

An object of class coxph

...

Not implemented

maxCoef

Maximum no. of coefficients

crit

Information criterion IC

how

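Method used to fit models. If how="all" (the default), all subsets of the given model will be fitted. If how="evolve", candidate models are generated by the evolutionary algorithm described in Details.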

confSetSize

Size of the returned confidence set, i.e. the number of rows (models) it contains. Columns represent the parameters/coefficients in the models.

maxiter

Maximum no. of iterations to use (for cox fitter). Needs to be integer and should not normally need to be > 100.

bunch

When using how="evolve": no. of models to screen per generation

mutRate

Mutation rate for new models (both asexual and sexual selection). Should be in range $0-1$.

sexRate

Sexual reproduction rate. Should be in range $0-1$.

immRate

Immigration rate. Should be in range $0-1$. Also sexRate + immRate should not be $> 1$.

deltaM

Target for change in mean IC determining convergence when how="evolve". The last mean IC (from the best confSetSize models screened) is compared with that from the most recently fitted bunch.

deltaB

Change in best IC determining convergence of evolution. This typically converges faster than deltaM.

conseq

Consecutive generations allowed which are 'divergent' by both of the above criteria. Algorithm will stop after this no. is reached.

report

If report=TRUE (the default), print report to screen during fitting. Gives current change in best and mean IC as well as object size of fitted models.

Value

A data.table with one row per model. This is of class multi.coxph, which has its own plot method. Columns show the coefficients from the fitted model. Values of $0$ indicate that the coefficient was not included. The data.table is sorted by IC and also gives a column for relative evidence weights. These are generated from: $$w_i = \exp \left( -\frac{IC_i - IC_{best}}{2} \right)$$ where $IC_i$ is the information criterion for the given model, and $IC_{best}$ is that for the best model yet fitted. They are then scaled to sum to $1$.
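The evidence weights can be reproduced by hand. The following is a minimal sketch in base R, assuming the usual form $w_i = \exp(-(IC_i - IC_{best})/2)$ followed by scaling; the IC values and model names are made up for illustration and are not output from multi.

```r
# Hypothetical IC values for four candidate models (illustrative only)
ic <- c(m1 = 100.0, m2 = 101.5, m3 = 104.0, m4 = 110.0)

# Difference between each model's IC and the best (lowest) IC
delta <- ic - min(ic)

# Unscaled relative evidence: exp(-(IC_i - IC_best) / 2)
w <- exp(-delta / 2)

# Scale so that the weights sum to 1
w <- w / sum(w)

round(w, 3)
```

The best model always has an unscaled weight of 1, so after scaling its weight is the largest in the set.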

Details

This is based loosely on package:glmulti (although it is admittedly less efficient). A more detailed discussion of the issues involved in multiple model fitting is presented in the reference paper describing that package's implementation.

It is designed for cases where there is a large number of candidate models for a given dataset (currently only right-censored survival data).

First, the model.matrix for the given formula is constructed. For those unfamiliar with model.matrix, a predictor given as a factor is expanded to its design matrix, so that e.g. for 4 original levels there will be 3 binary ($0/1$) columns. Currently all levels of a factor are considered independently when fitting models. Thus there is one column for each coefficient in the original model.

The original formula can include the following terms: offset, weight and strata. Other special terms such as cluster are not currently supported. The formula may contain interaction terms and other transformations.

If how="all", all possible combinations of these coefficients are fitted (or up to maxCoef predictors if this is less).

If how="evolve" the algorithm proceeds as follows:

Fit bunch random models and sort by IC.
Generate another bunch of candidate models based on these. immRate gives the proportion that will be completely random new models. sexRate gives the proportion that will be the products of existing models. These are a random combination of the first elements from model 1 and the last elements from model 2. The sum of immRate and sexRate should thus be $\leq 1$.
Other models (asexual) will be selected from the existing pool of fitted models with a likelihood inversely proportional to their IC (i.e. lower IC - more likely). Both these and those generated by sexual reproduction have a chance of mutation (elements changing from $1$ to $0$ or vice versa) given by mutRate.
Fit new models (not already fitted).
Proceed until model fitting is 'divergent' conseq times, then stop. Divergent is here taken to mean that the targets for both deltaM and deltaB have not been met. deltaM typically converges more slowly. Thus a large value of deltaM will require new bunches of models to be significantly better than the best (size = confSetSize) existing candidates. Negative values of deltaM (not typically recommended) are more permissive; i.e. new models can be somewhat worse than those existing.
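One generation of the steps above can be sketched in base R as follows. This is an illustration only, not the package's implementation. Models are represented as rows of a 0/1 inclusion matrix. The pool, the IC values, the helper names (cross, mutate, key) and the use of 1/IC as the selection weight are all assumptions made for the example. No Cox models are fitted here.

```r
set.seed(1)

nCoef   <- 6L    # hypothetical number of columns in the model matrix
bunch   <- 10L   # models per generation (kept small for illustration)
immRate <- 0.3
sexRate <- 0.2
mutRate <- 0.1
stopifnot(immRate + sexRate <= 1)

# Existing pool: one row per model, 1 = coefficient included
pool <- matrix(rbinom(bunch * nCoef, 1, 0.5), nrow = bunch)
ic   <- runif(bunch, 100, 120)   # made-up (positive) IC values for the pool

nImm  <- round(bunch * immRate)
nSex  <- round(bunch * sexRate)
nAsex <- bunch - nImm - nSex

# Immigrants: completely random new models
imm <- matrix(rbinom(nImm * nCoef, 1, 0.5), nrow = nImm)

# Selection probability inversely proportional to IC (lower IC, more likely).
# Assumption: IC values are positive; the package's exact weighting may differ.
p <- (1 / ic) / sum(1 / ic)

# Sexual reproduction: first elements of parent 1, last elements of parent 2
cross <- function() {
  parents <- sample(nrow(pool), 2L, prob = p)
  cut <- sample(nCoef - 1L, 1L)   # crossover point
  c(pool[parents[1L], seq_len(cut)],
    pool[parents[2L], (cut + 1L):nCoef])
}
sex <- t(replicate(nSex, cross()))

# Asexual reproduction: copies of existing models, same selection probabilities
asex <- pool[sample(nrow(pool), nAsex, replace = TRUE, prob = p), , drop = FALSE]

# Mutation (sexual and asexual offspring only): flip 0 <-> 1 with prob. mutRate
mutate <- function(m) {
  flip <- matrix(runif(length(m)) < mutRate, nrow = nrow(m))
  m[flip] <- 1L - m[flip]
  m
}
offspring <- rbind(imm, mutate(rbind(sex, asex)))

# Keep only candidates not already in the pool; these would be fitted next
key   <- function(m) apply(m, 1L, paste, collapse = "")
isNew <- !(key(offspring) %in% key(pool)) & !duplicated(key(offspring))
toFit <- offspring[isNew, , drop = FALSE]
```

In a full run, the rows of toFit would be fitted, their IC values added to the pool, and the changes in best and mean IC compared with deltaB and deltaM.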

The models are returned in a data.table, with one row per model giving the fitted coefficients, the IC for the model and the relative evidence weights.
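The factor expansion and the size of the search when how="all" can be illustrated with base R. This is a sketch under stated assumptions: the data frame d is hypothetical, the intercept column is dropped because a Cox model has none, and the empty model is not counted (whether multi counts it is not stated here).

```r
# Hypothetical data: one numeric predictor and one 4-level factor
d <- data.frame(x = c(0.5, 1.2, 3.1, 2.2, 0.9, 1.7, 2.8, 0.3),
                g = factor(rep(c("a", "b", "c", "d"), 2)))

# The factor expands to 3 binary columns (level "a" is the reference)
mm <- model.matrix(~ x + g, data = d)[, -1, drop = FALSE]  # drop the intercept
colnames(mm)

# Number of candidate models using between 1 and maxCoef of these columns
nCoef   <- ncol(mm)
maxCoef <- 2L
sizes   <- seq_len(min(maxCoef, nCoef))
nModels <- sum(choose(nCoef, sizes))

# Enumerate the candidates as rows of a 0/1 inclusion matrix
subsets <- do.call(rbind, lapply(sizes, function(k) {
  t(combn(nCoef, k, FUN = function(idx) as.integer(seq_len(nCoef) %in% idx)))
}))
colnames(subsets) <- colnames(mm)
subsets
```

Because the number of subsets grows quickly with the number of columns, maxCoef and how="evolve" are the two ways of keeping the search manageable.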

References

Calcagno V, de Mazancourt C, 2010. glmulti: An R Package for Easy Automated Model Selection with (Generalized) Linear Models. Journal of Statistical Software. 34(12):1-29. Available at JSS: http://www.jstatsoft.org/v34/i12/paper

Examples


set.seed(1)
df1 <- genSurvDf(b=1, c=5, f=0, model=FALSE)
multi(coxph(Surv(t1, e) ~ ., data=df1), crit="aic")
### longer example
dt1 <- genSurvDt(b=1, c=30, f=0, model=FALSE)
multi(coxph(Surv(t1, e) ~ ., data=dt1),
      maxCoef=8, crit="bic", how="evolve", deltaM=1, deltaB=1, conseq=10)
