pentrace: Trace AIC and BIC vs. Penalty

Description

For an ordinary unpenalized fit from lrm or ols and for a vector or list of penalties, fits a series of logistic or linear models using penalized maximum likelihood estimation, and saves the effective degrees of freedom, Akaike Information Criterion ($AIC$), Schwarz Bayesian Information Criterion ($BIC$), and Hurvich and Tsai's corrected $AIC$ ($AIC_c$). Optionally pentrace can use the nlminb function to solve for the optimum penalty factor or combination of factors penalizing different kinds of terms in the model. The effective.df function prints the original and effective degrees of freedom for a penalized fit or for an unpenalized fit and the best penalization determined from a previous invocation of pentrace if method="grid" (the default). The effective d.f. is computed separately for each class of terms in the model (e.g., interaction, nonlinear). A plot method exists to plot the results, and a print method exists to print the most pertinent components. Both $AIC$ and $BIC$ may be plotted if there is only one penalty factor type specified in penalty. Otherwise, the first two types of penalty factors are plotted, showing only the $AIC$.

Usage

pentrace(fit, penalty, penalty.matrix, 
         method=c('grid','optimize'),
         which=c('aic.c','aic','bic'), target.df,
         fitter, pr=FALSE, tol=1e-7,
         keep.coef=FALSE, complex.more=TRUE, verbose=FALSE, maxit=12, subset)
effective.df(fit, object)
## S3 method for class 'pentrace':
print(x, \dots)
## S3 method for class 'pentrace':
plot(x, method=c('points','image'), 
     which=c('effective.df','aic','aic.c','bic'), pch=2, add=FALSE, 
     ylim, ...)

Arguments

fit

a result from lrm or ols with x=TRUE, y=TRUE and without using penalty or penalty.matrix (or optionally using penalization in the case of effective.df)

penalty

can be a vector or a list. If it is a vector, all types of terms in the model will be penalized by the same amount, specified by elements in penalty, with a penalty of zero automatically added. penalty can also be a list in the

object

an object returned by pentrace. For effective.df, object can be omitted if the fit was penalized.

penalty.matrix

see lrm

method

The default is method="grid" to print various indexes for all combinations of penalty parameters given by the user. Specify method="optimize" to have pentrace use nlminb to solve for the combination of

which

the objective to maximize for either method. Default is "aic.c" (corrected AIC). For plot.pentrace, which is a vector of names of criteria to show; default is to plot all 4 types, with effective d.f. in

target.df

applies only to method="optimize". See method. target.df makes sense mainly when a single type of penalty factor is specified.

fitter

a fitting function. Default is lrm.fit (lm.pfit is always used for ols).

set to TRUE to print intermediate results

tol

tolerance for declaring a matrix singular (see lrm.fit, solvet)

keep.coef

set to TRUE to store matrix of regression coefficients for all the fits (corresponding to increasing values of penalty) in object Coefficients in the returned list. Rows correspond to penalties, columns to regressi

complex.more

By default if penalty is a list, combinations of penalties for which complex terms are penalized less than less complex terms will be dropped after expand.grid is invoked. Set complex.more=FALSE to allow more comple

verbose

set to TRUE to print number of intercepts and sum of effective degrees of freedom

maxit

maximum number of iterations to allow in a model fit (default=12). This is passed to the appropriate fitter function with the correct argument name. Increase maxit if you had to when fitting the original unpenalized model.

subset

a logical or integer vector specifying rows of the design and response matrices to subset in fitting models. This is most useful for bootstrapping pentrace to see if the best penalty can be estimated with little error so that variation due t

a result from pentrace

pch

used for method="points"

add

set to TRUE to add to an existing plot. In that case, the effective d.f. plot is not re-drawn, but the AIC/BIC plot is added to.

ylim

2-vector of y-axis limits for plots other than effective d.f.

...

other arguments passed to plot, lines, or image

Value

a list of class "pentrace" with elements penalty, df, objective, fit, var.adj, diag, results.all, and optionally Coefficients. The first 6 elements correspond to the fit that had the best objective as named in the which argument, from the sequence of fits tried. Here fit is the fit object from fitter which was a penalized fit, diag is the diagonal of the matrix used to compute the effective d.f., and var.adj is Gray (1992) Equation 2.9, which is an improved covariance matrix for the penalized beta. results.all is a data frame whose first few variables are the components of penalty and whose other columns are df, aic, bic, aic.c. results.all thus contains a summary of results for all fits attempted. When method="optimize", only two components are returned: penalty and objective, and the object does not have a class.

concept

logistic regression model
penalized MLE
ridge regression
shrinkage

References

Gray RJ: Flexible methods for analyzing survival data using splines, with applications to breast cancer prognosis. JASA 87:942--951, 1992.

Hurvich CM, Tsai, CL: Regression and time series model selection in small samples. Biometrika 76:297--307, 1989.

Examples

Run this code

n <- 1000    # define sample size
set.seed(17) # so can reproduce the results
age            <- rnorm(n, 50, 10)
blood.pressure <- rnorm(n, 120, 15)
cholesterol    <- rnorm(n, 200, 25)
sex            <- factor(sample(c('female','male'), n,TRUE))
# Specify population model for log odds that Y=1
L <- .4*(sex=='male') + .045*(age-50) +
  (log(cholesterol - 10)-5.2)*(-2*(sex=='female') + 2*(sex=='male'))
# Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)]
y <- ifelse(runif(n) < plogis(L), 1, 0)


f <- lrm(y ~ blood.pressure + sex * (age + rcs(cholesterol,4)),
         x=TRUE, y=TRUE)
p <- pentrace(f, seq(.2,1,by=.05))
plot(p)
p$diag      # may learn something about fractional effective d.f. 
            # for each original parameter
pentrace(f, list(simple=c(0,.2,.4), nonlinear=c(0,.2,.4,.8,1)))


# Bootstrap pentrace 5 times, making a plot of corrected AIC plot with 5 reps
n <- nrow(f$x)
plot(pentrace(f, seq(.2,1,by=.05)), which='aic.c', 
     col=1, ylim=c(30,120)) #original in black
for(j in 1:5)
  plot(pentrace(f, seq(.2,1,by=.05), subset=sample(n,n,TRUE)), 
       which='aic.c', col=j+1, add=TRUE)


# Find penalty giving optimum corrected AIC.  Initial guess is 1.0
# Not implemented yet
# pentrace(f, 1, method='optimize')


# Find penalty reducing total regression d.f. effectively to 5
# pentrace(f, 1, target.df=5)


# Re-fit with penalty giving best aic.c without differential penalization
f <- update(f, penalty=p$penalty)
effective.df(f)

Run the code above in your browser using DataLab