Trace AIC and BIC vs. Penalty
For an ordinary unpenalized fit from
ols and for a vector or list of penalties,
fits a series of logistic or linear models using penalized maximum likelihood
estimation, and saves the effective degrees of freedom, Akaike Information
Criterion ($AIC$), Schwarz Bayesian Information Criterion ($BIC$), and
Hurvich and Tsai's corrected $AIC$ ($AIC_c$). Optionally
nlminb function to solve for the optimum penalty factor or
combination of factors penalizing different kinds of terms in the model.
effective.df function prints the original and effective
degrees of freedom for a penalized fit or for an unpenalized fit and
the best penalization determined from a previous invocation of
method="grid" (the default).
The effective d.f. is computed separately for each class of terms in
the model (e.g., interaction, nonlinear).
plot method exists to plot the results, and a
the first two types of penalty factors are plotted, showing only the $AIC$.
pentrace(fit, penalty, penalty.matrix, method=c('grid','optimize'), which=c('aic.c','aic','bic'), target.df, fitter, pr=FALSE, tol=1e-7, keep.coef=FALSE, complex.more=TRUE, verbose=FALSE, maxit=12, subset)
## S3 method for class 'pentrace': print(x, \dots)
## S3 method for class 'pentrace': plot(x, method=c('points','image'), which=c('effective.df','aic','aic.c','bic'), pch=2, add=FALSE, ylim, ...)
- a result from
x=TRUE, y=TRUEand without using
penalty.matrix(or optionally using penalization in the case of
- can be a vector or a list. If it is a vector, all types of terms in
the model will be penalized by the same amount, specified by elements in
penalty, with a penalty of zero automatically added.
penaltycan also be a list in the
- an object returned by
objectcan be omitted if the
- The default is
method="grid"to print various indexes for all combinations of penalty parameters given by the user. Specify
nlminbto solve for the combination of
- the objective to maximize for either
method. Default is
"aic.c"(corrected AIC). For
whichis a vector of names of criteria to show; default is to plot all 4 types, with effective d.f. in
- applies only to
target.dfmakes sense mainly when a single type of penalty factor is specified.
- a fitting function. Default is
lm.pfitis always used for
- set to
TRUEto print intermediate results
- tolerance for declaring a matrix singular (see
- set to
TRUEto store matrix of regression coefficients for all the fits (corresponding to increasing values of
penalty) in object
Coefficientsin the returned list. Rows correspond to penalties, columns to regressi
- By default if
penaltyis a list, combinations of penalties for which complex terms are penalized less than less complex terms will be dropped after
expand.gridis invoked. Set
complex.more=FALSEto allow more comple
- set to
TRUEto print number of intercepts and sum of effective degrees of freedom
- maximum number of iterations to allow in a model fit (default=12).
This is passed to the appropriate fitter function with the correct
argument name. Increase
maxitif you had to when fitting the original unpenalized model.
- a logical or integer vector specifying rows of the design and response
matrices to subset in fitting models. This is most useful for
pentraceto see if the best penalty can be estimated with little error so that variation due t
- a result from
- used for
- set to
TRUEto add to an existing plot. In that case, the effective d.f. plot is not re-drawn, but the AIC/BIC plot is added to.
- 2-vector of y-axis limits for plots other than effective d.f.
- other arguments passed to
- a list of class
penalty, df, objective, fit, var.adj, diag, results.all, and optionally
Coefficients. The first 6 elements correspond to the fit that had the best objective as named in the
whichargument, from the sequence of fits tried. Here
fitis the fit object from
fitterwhich was a penalized fit,
diagis the diagonal of the matrix used to compute the effective d.f., and
var.adjis Gray (1992) Equation 2.9, which is an improved covariance matrix for the penalized beta.
results.allis a data frame whose first few variables are the components of
penaltyand whose other columns are
df, aic, bic, aic.c.
results.allthus contains a summary of results for all fits attempted. When
method="optimize", only two components are returned:
objective, and the object does not have a class.
- logistic regression model
- penalized MLE
- ridge regression
Gray RJ: Flexible methods for analyzing survival data using splines, with applications to breast cancer prognosis. JASA 87:942--951, 1992.
Hurvich CM, Tsai, CL: Regression and time series model selection in small samples. Biometrika 76:297--307, 1989.
n <- 1000 # define sample size set.seed(17) # so can reproduce the results age <- rnorm(n, 50, 10) blood.pressure <- rnorm(n, 120, 15) cholesterol <- rnorm(n, 200, 25) sex <- factor(sample(c('female','male'), n,TRUE)) # Specify population model for log odds that Y=1 L <- .4*(sex=='male') + .045*(age-50) + (log(cholesterol - 10)-5.2)*(-2*(sex=='female') + 2*(sex=='male')) # Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)] y <- ifelse(runif(n) < plogis(L), 1, 0) f <- lrm(y ~ blood.pressure + sex * (age + rcs(cholesterol,4)), x=TRUE, y=TRUE) p <- pentrace(f, seq(.2,1,by=.05)) plot(p) p$diag # may learn something about fractional effective d.f. # for each original parameter pentrace(f, list(simple=c(0,.2,.4), nonlinear=c(0,.2,.4,.8,1))) # Bootstrap pentrace 5 times, making a plot of corrected AIC plot with 5 reps n <- nrow(f$x) plot(pentrace(f, seq(.2,1,by=.05)), which='aic.c', col=1, ylim=c(30,120)) #original in black for(j in 1:5) plot(pentrace(f, seq(.2,1,by=.05), subset=sample(n,n,TRUE)), which='aic.c', col=j+1, add=TRUE) # Find penalty giving optimum corrected AIC. Initial guess is 1.0 # Not implemented yet # pentrace(f, 1, method='optimize') # Find penalty reducing total regression d.f. effectively to 5 # pentrace(f, 1, target.df=5) # Re-fit with penalty giving best aic.c without differential penalization f <- update(f, penalty=p$penalty) effective.df(f)