gIndex
Calculate Total and Partial g-indexes for an rms Fit
gIndex
computes the total \(g\)-index for a model based on
the vector of linear predictors, and the partial \(g\)-index for
each predictor in a model. The latter is computed by summing all the
terms involving each variable, weighted by their regression
coefficients, then computing Gini's mean difference on this sum. For
example, a regression model having age and sex and age*sex on the
right hand side, with corresponding regression coefficients \(b_{1},
b_{2}, b_{3}\) will have the \(g\)-index for age
computed from Gini's mean
difference on the product of age \(\times (b_{1} + b_{3}w)\) where
\(w\) is an indicator set to one for observations with sex not equal
to the reference value. When there are nonlinear terms associated
with a predictor, these terms will also be combined.
A print
method is defined, and there is a plot
method for displaying
\(g\)-indexes using a dot chart.
These functions use Hmisc::GiniMd
.
- Keywords
- robust, univar, predictive accuracy
Usage
gIndex(object, partials=TRUE, type=c('ccterms', 'cterms', 'terms'),
lplabel=if(length(object$scale) && is.character(object$scale))
object$scale[1] else 'X*Beta',
fun, funlabel=if(missing(fun)) character(0) else
deparse(substitute(fun)),
postfun=if(length(object$scale)==2) exp else NULL,
postlabel=if(length(postfun))
ifelse(missing(postfun),
if((length(object$scale) > 1) &&
is.character(object$scale)) object$scale[2] else
'Anti-log',
deparse(substitute(postfun))) else character(0),
…)# S3 method for gIndex
print(x, digits=4, abbrev=FALSE,
vnames=c("names","labels"), …)
# S3 method for gIndex
plot(x, what=c('pre', 'post'),
xlab=NULL, pch=16, rm.totals=FALSE,
sort=c('descending', 'ascending', 'none'), …)
Arguments
- object
result of an
rms
fitting function- partials
set to
FALSE
to suppress computation of partial \(g\)s- type
defaults to
'ccterms'
which causes partial discrimination indexes to be computed after maximally combining all related main effects and interactions. The is usually the only way that makes sense when considering partial linear predictors. Specifytype='cterms'
to only combine a main effect with interactions containing it, not also with other main effects connected through interactions. Usetype='terms'
to separate interactions into their own effects.- lplabel
a replacement for default values such as
"X*Beta"
or"log odds"
/- fun
an optional function to transform the linear predictors before computing the total (only) \(g\). When this is present, a new component
gtrans
is added to the attributes of the object resulting fromgIndex
.- funlabel
a character string label for
fun
, otherwise taken from the function name itself- postfun
a function to transform \(g\) such as
exp
(anti-log), which is the default for certain models such as the logistic and Cox models- postlabel
a label for
postfun
- …
For
gIndex
, passed topredict.rms
. Ignored forprint
. Passed todotchart2
forplot
.- x
an object created by
gIndex
(forprint
orplot
)- digits
causes rounding to the
digits
decimal place- abbrev
set to
TRUE
to abbreviate labels ifvname="labels"
- vnames
set to
"labels"
to print predictor labels instead of names- what
set to
"post"
to plot the transformed \(g\)-index if there is one (e.g., ratio scale)- xlab
\(x\)-axis label; constructed by default
- pch
plotting character for point
- rm.totals
set to
TRUE
to remove the total \(g\)-index when plotting- sort
specifies how to sort predictors by \(g\)-index; default is in descending order going down the dot chart
Details
For stratification factors in a Cox proportional hazards model, there is no contribution of variation towards computing a partial \(g\) except from terms that interact with the stratification variable.
Value
gIndex
returns a matrix of class "gIndex"
with auxiliary
information stored as attributes, such as variable labels.
GiniMd
returns a scalar.
References
David HA (1968): Gini's mean difference rediscovered. Biometrika 55:573--575.
See Also
Examples
# NOT RUN {
set.seed(1)
n <- 40
x <- 1:n
w <- factor(sample(c('a','b'), n, TRUE))
u <- factor(sample(c('A','B'), n, TRUE))
y <- .01*x + .2*(w=='b') + .3*(u=='B') + .2*(w=='b' & u=='B') + rnorm(n)/5
dd <- datadist(x,w,u); options(datadist='dd')
f <- ols(y ~ x*w*u, x=TRUE, y=TRUE)
f
anova(f)
z <- list()
for(type in c('terms','cterms','ccterms'))
{
zc <- predict(f, type=type)
cat('type:', type, '\n')
print(zc)
z[[type]] <- zc
}
zc <- z$cterms
GiniMd(zc[, 1])
GiniMd(zc[, 2])
GiniMd(zc[, 3])
GiniMd(f$linear.predictors)
g <- gIndex(f)
g
g['Total',]
gIndex(f, partials=FALSE)
gIndex(f, type='cterms')
gIndex(f, type='terms')
y <- y > .8
f <- lrm(y ~ x * w * u, x=TRUE, y=TRUE)
gIndex(f, fun=plogis, funlabel='Prob[y=1]')
# Manual calculation of combined main effect + interaction effort of
# sex in a 2x2 design with treatments A B, sexes F M,
# model -.1 + .3*(treat=='B') + .5*(sex=='M') + .4*(treat=='B' & sex=='M')
set.seed(1)
X <- expand.grid(treat=c('A','B'), sex=c('F', 'M'))
a <- 3; b <- 7; c <- 13; d <- 5
X <- rbind(X[rep(1, a),], X[rep(2, b),], X[rep(3, c),], X[rep(4, d),])
y <- with(X, -.1 + .3*(treat=='B') + .5*(sex=='M') + .4*(treat=='B' & sex=='M'))
f <- ols(y ~ treat*sex, data=X, x=TRUE)
gIndex(f, type='cterms')
k <- coef(f)
b1 <- k[2]; b2 <- k[3]; b3 <- k[4]
n <- nrow(X)
( (a+b)*c*abs(b2) + (a+b)*d*abs(b2+b3) + c*d*abs(b3))/(n*(n-1)/2 )
# Manual calculation for combined age effect in a model with sex,
# age, and age*sex interaction
a <- 13; b <- 7
sex <- c(rep('female',a), rep('male',b))
agef <- round(runif(a, 20, 30))
agem <- round(runif(b, 20, 40))
age <- c(agef, agem)
y <- (sex=='male') + age/10 - (sex=='male')*age/20
f <- ols(y ~ sex*age, x=TRUE)
f
gIndex(f, type='cterms')
k <- coef(f)
b1 <- k[2]; b2 <- k[3]; b3 <- k[4]
n <- a + b
sp <- function(w, z=w) sum(outer(w, z, function(u, v) abs(u-v)))
( abs(b2)*sp(agef) + abs(b2+b3)*sp(agem) + 2*sp(b2*agef, (b2+b3)*agem) ) / (n*(n-1))
( abs(b2)*GiniMd(agef)*a*(a-1) + abs(b2+b3)*GiniMd(agem)*b*(b-1) +
2*sp(b2*agef, (b2+b3)*agem) ) / (n*(n-1))
# }
# NOT RUN {
# Compare partial and total g-indexes over many random fits
plot(NA, NA, xlim=c(0,3), ylim=c(0,3), xlab='Global',
ylab='x1 (black) x2 (red) x3 (green) x4 (blue)')
abline(a=0, b=1, col=gray(.9))
big <- integer(3)
n <- 50 # try with n=7 - see lots of exceptions esp. for interacting var
for(i in 1:100) {
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
x4 <- runif(n)
y <- x1 + x2 + x3 + x4 + 2*runif(n)
f <- ols(y ~ x1*x2+x3+x4, x=TRUE)
# f <- ols(y ~ x1+x2+x3+x4, x=TRUE) # also try this
w <- gIndex(f)[,1]
gt <- w['Total']
points(gt, w['x1, x2'])
points(gt, w['x3'], col='green')
points(gt, w['x4'], col='blue')
big[1] <- big[1] + (w['x1, x2'] > gt)
big[2] <- big[2] + (w['x3'] > gt)
big[3] <- big[3] + (w['x4'] > gt)
}
print(big)
# }
# NOT RUN {
options(datadist=NULL)
# }