gIndex: Calculate Total and Partial g-indexes for an rms Fit

Description

gIndex computes the total $g$-index for a model based on the vector of linear predictors, and the partial $g$-index for each predictor in a model. The latter is computed by summing all the terms involving each variable, weighted by their regression coefficients, then computing Gini's mean difference on this sum. For example, a regression model having age, sex, and age*sex on the right-hand side, with corresponding regression coefficients $b_{1}, b_{2}, b_{3}$, will have the $g$-index for age computed from Gini's mean difference on the product age $\times (b_{1} + b_{3}w)$, where $w$ is an indicator set to one for observations with sex not equal to the reference value. When there are nonlinear terms associated with a predictor, these terms are combined in the same way.
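The age example can be sketched in base R without rms. This is only an illustration under stated assumptions: gini_md is a hypothetical brute-force helper (not part of rms), the data are simulated, and lm stands in for an rms fitting function. It is not how gIndex is implemented internally.

```r
# Hypothetical helper: Gini's mean difference by brute force,
# i.e. the mean absolute difference over all pairs of distinct elements.
gini_md <- function(v) {
  n <- length(v)
  sum(abs(outer(v, v, "-"))) / (n * (n - 1))
}

# Simulated data (assumption, for illustration only)
set.seed(2)
n   <- 200
age <- rnorm(n, 50, 10)
sex <- factor(sample(c("female", "male"), n, TRUE))
y   <- 0.03 * age + 0.5 * (sex == "male") +
       0.02 * age * (sex == "male") + rnorm(n)

fit <- lm(y ~ age * sex)
b   <- coef(fit)                    # (Intercept), age, sexmale, age:sexmale
w   <- as.numeric(sex == "male")    # indicator for the non-reference level

# Sum of all terms involving age, weighted by their coefficients:
# age * (b1 + b3 * w)
age_terms <- age * (b[["age"]] + b[["age:sexmale"]] * w)

# Partial g for age under this sketch
g_age <- gini_md(age_terms)
g_age
```

Because Gini's mean difference depends only on pairwise differences, adding a constant to age_terms (for example, centering) leaves g_age unchanged.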

A print method is defined, and there is a plot method for displaying $g$-indexes using a dot chart.

A basic function GiniMd computes Gini's mean difference on a numeric vector. This index is defined as the mean absolute difference between any two distinct elements of a vector. For a Bernoulli (binary) variable with proportion of ones equal to $p$ and sample size $n$, Gini's mean difference is $2\frac{n}{n-1}p(1-p)$. For a trinomial variable (e.g., predicted values for a 3-level categorical predictor using two dummy variables) having (predicted) values $A, B, C$ with corresponding proportions $a, b, c$, Gini's mean difference is $2\frac{n}{n-1}[ab|A-B|+ac|A-C|+bc|B-C|]$.
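The closed forms above follow from counting pairs. The derivation below is a sketch; the symbol $g$ for Gini's mean difference of a vector $x_1, \ldots, x_n$ is notation introduced here for illustration.

```latex
% Definition: mean absolute difference over ordered pairs of distinct elements
g = \frac{1}{n(n-1)} \sum_{i \neq j} |x_i - x_j|

% Binary case: np ones and n(1-p) zeros. Only mixed pairs contribute,
% each with |x_i - x_j| = 1, and there are 2 (np)(n(1-p)) ordered mixed pairs:
g = \frac{2\, n p \cdot n(1-p)}{n(n-1)} = 2\,\frac{n}{n-1}\, p(1-p)

% Trinomial case: na, nb, nc observations at values A, B, C.
% Ordered pairs taking values (A, B) number 2 (na)(nb), and so on:
g = \frac{2 n^2 \left[ ab|A-B| + ac|A-C| + bc|B-C| \right]}{n(n-1)}
  = 2\,\frac{n}{n-1} \left[ ab|A-B| + ac|A-C| + bc|B-C| \right]
```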

Usage

gIndex(object, partials = TRUE,
       lplabel = if (length(object$scale)) object$scale[1] else "X*Beta",
       fun,
       funlabel = if (missing(fun)) character(0) else deparse(substitute(fun)),
       postfun = if (length(object$scale) == 2) exp else NULL,
       postlabel = if (length(postfun))
         ifelse(missing(postfun),
                if (length(object$scale) > 1) object$scale[2] else "Anti-log",
                deparse(substitute(postfun))) else character(0), ...)

## S3 method for class 'gIndex':
print(x, digits = 4, abbrev = FALSE,
      vnames = c("names", "labels"), ...)

## S3 method for class 'gIndex':
plot(x, what = c('pre', 'post'),
     xlab = NULL, pch = 16, rm.totals = FALSE,
     sort = c('descending', 'ascending', 'none'), ...)

GiniMd(x, na.rm = FALSE)

Arguments

object

result of an rms fitting function

partials

set to FALSE to suppress computation of partial $g$s

lplabel

a replacement for default values such as "X*Beta" or "log odds".

fun

an optional function to transform the linear predictors before computing the total (only) $g$. When this is present, a new component gtrans is added to the attributes of the object resulting from gIndex.

funlabel

a character string label for fun, otherwise taken from the function name itself

postfun

a function to transform $g$ such as exp (anti-log), which is the default for certain models such as the logistic and Cox models

postlabel

a label for postfun

...

For gIndex, passed to predict.rms. Ignored for print. Passed to dotchart2 for plot.

x

an object created by gIndex (for print or plot) or a numeric vector (for GiniMd)

digits

causes rounding to digits decimal places

abbrev

set to TRUE to abbreviate labels if vnames="labels"

vnames

set to "labels" to print predictor labels instead of names

what

set to "post" to plot the transformed $g$-index if there is one (e.g., ratio scale)

xlab

$x$-axis label; constructed by default

pch

plotting character for point

rm.totals

set to TRUE to remove the total $g$-index when plotting

sort

specifies how to sort predictors by $g$-index; default is in descending order going down the dot chart

na.rm

set to TRUE if you suspect there may be NAs in x; these will then be removed. Otherwise an error will result.

Value

gIndex returns a matrix of class "gIndex" with auxiliary information stored as attributes, such as variable labels. GiniMd returns a scalar.

Details

For stratification factors in a Cox proportional hazards model, there is no contribution of variation towards computing a partial $g$ except from terms that interact with the stratification variable.

References

David HA (1968): Gini's mean difference rediscovered. Biometrika 55:573–575.

Examples

library(rms)   # provides ols, lrm, datadist, gIndex, GiniMd
set.seed(1)
n <- 100
x <- 1:n
w <- factor(sample(c('a','b'), n, TRUE))
u <- factor(sample(c('A','B'), n, TRUE))
y <- .01*x + .2*(w=='b') + .3*(u=='B') + .2*(w=='b' & u=='B') + rnorm(n)/5
dd <- datadist(x,w,u); options(datadist='dd')
f <- ols(y ~ x*w*u, x=TRUE, y=TRUE)
f
anova(f)

# Combined terms: all terms involving each predictor, summed
zc <- predict(f, type='cterms')

# Test GiniMd against a brute-force solution
gmd <- function(x)
  {
    n <- length(x)
    sum(outer(x, x, function(a, b) abs(a - b)))/n/(n-1)
  }
gmd(zc[, 1])
GiniMd(zc[, 1])
GiniMd(zc[, 2])
GiniMd(zc[, 3])
GiniMd(f$linear.predictors)
g <- gIndex(f)
g
g['Total',]
gIndex(f, partials=FALSE)

# Binary variable: compare GiniMd with the closed form 2*n/(n-1)*p*(1-p)
z <- c(rep(0,17), rep(1,6))
n <- length(z)
GiniMd(z)
2*mean(z)*(1-mean(z))*n/(n-1)

# Trinomial variable: a, b, c are counts here, so divide by n*(n-1)
a <- 12; b <- 13; c <- 7; n <- a + b + c
A <- -.123; B <- -.707; C <- 0.523
xx <- c(rep(A, a), rep(B, b), rep(C, c))
GiniMd(xx)
2*(a*b*abs(A-B) + a*c*abs(A-C) + b*c*abs(B-C))/n/(n-1)

# Binary logistic model: g on the probability scale via fun=plogis
y <- y > .8
f <- lrm(y ~ x * w * u, x=TRUE, y=TRUE)
gIndex(f, fun=plogis, funlabel='Prob[y=1]')
options(datadist=NULL)
