Analysis of Variance (Wald and F Statistics)
The anova function automatically tests most meaningful hypotheses
in a design. For example, suppose that age and cholesterol are
predictors, and that a general interaction is modeled using a restricted
spline surface. anova prints Wald statistics ($F$ statistics
for an ols fit) for testing linearity of age, linearity of
cholesterol, age effect (age + age by cholesterol interaction),
cholesterol effect (cholesterol + age by cholesterol interaction),
linearity of the age by cholesterol interaction (i.e., adequacy of the
simple age * cholesterol 1 d.f. product), linearity of the interaction
in age alone, and linearity of the interaction in cholesterol
alone. Joint tests of all interaction terms in the model and all
nonlinear terms in the model are also performed. For any multiple
d.f. effects for continuous variables that were not modeled through
rcs, pol, lsp, etc., tests of linearity will be
omitted. This applies to matrix predictors produced by, e.g., poly or ns.
print.anova.rms is the printing method.
plot.anova.rms draws dot charts depicting the importance
of variables in the model, as measured by Wald $\chi^2$,
$\chi^2$ minus d.f., AIC, $P$-values, partial
$R^2$, $R^2$ for the whole model after deleting the effects in
question, or proportion of overall model $R^2$ that is due to each
predictor. latex.anova.rms is the
latex method. It
substitutes Greek/math symbols in column headings, uses boldface for
TOTAL lines, and constructs a caption. Then it passes the result
to latex.default for conversion to LaTeX.
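The combined tests described above (for example, age main effect plus the age by cholesterol interaction) are pooled Wald tests. The following is a minimal sketch of that calculation using base R's glm, coef, and vcov rather than rms itself; the helper wald_test, the simulated data, and the coefficient values are hypothetical and only illustrate the quadratic form $b' V^{-1} b$ on length(b) degrees of freedom.

```r
# Sketch of a pooled Wald chi-square test with base R only (not rms internals).
set.seed(1)
n <- 500
# Hypothetical predictors, centered to keep the example well conditioned
age         <- rnorm(n, 0, 10)
cholesterol <- rnorm(n, 0, 25)
# Hypothetical population log odds with a simple product interaction
lp <- 0.04 * age + 0.01 * cholesterol + 0.002 * age * cholesterol
y  <- rbinom(n, 1, plogis(lp))

fit <- glm(y ~ age * cholesterol, family = binomial)

# wald_test() is a hypothetical helper: joint test that the named
# coefficients are all zero, W = b' V^{-1} b on length(b) d.f.
wald_test <- function(fit, terms) {
  b     <- coef(fit)[terms]
  V     <- vcov(fit)[terms, terms, drop = FALSE]
  chisq <- as.numeric(t(b) %*% solve(V, b))
  df    <- length(b)
  c(chisq = chisq, df = df, P = pchisq(chisq, df, lower.tail = FALSE))
}

# Age effect in the combined sense: main effect plus its interaction (2 d.f.)
age_effect <- wald_test(fit, c("age", "age:cholesterol"))
# Interaction alone (1 d.f.)
interaction_only <- wald_test(fit, "age:cholesterol")

print(rbind(age_effect, interaction_only))
```

Because the combined test pools the interaction with the main effect, its statistic can never be smaller than that of the interaction-only test on the same fit.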
## S3 method for class 'rms'
anova(object, ..., main.effect=FALSE, tol=1e-9, test=c('F','Chisq'), ss=TRUE)

## S3 method for class 'anova.rms'
print(x, which=c('none','subscripts','names','dots'), ...)

## S3 method for class 'anova.rms'
plot(x, what=c("chisqminusdf","chisq","aic","P","partial R2","remaining R2","proportion R2"), xlab=NULL, pch=16, rm.totals=TRUE, rm.ia=FALSE, rm.other=NULL, newnames, sort=c("descending","ascending","none"), pl=TRUE, ...)

## S3 method for class 'anova.rms'
latex(object, title, psmall=TRUE, dec.chisq=2, dec.F=2, dec.ss=NA, dec.ms=NA, dec.P=4, ...)
- object: an rms fit object. object must allow vcov to return the variance-covariance matrix. For latex, object is the result of anova.
- ...: if omitted, all variables are tested, yielding tests for individual factors and for pooled effects. Specify a subset of the variables to obtain tests for only those factors, with a pooled Wald test for the combined effects of all factors listed. Names may be abbreviated.
- main.effect: set to TRUE to print the (usually meaningless) main effect tests even when the factor is involved in an interaction. The default is FALSE, to print only the effect of the main effect combined with all interactions involving that factor.
- tol: singularity criterion for use in matrix inversion
- test: for an ols fit, set test="Chisq" to use Wald $\chi^2$ tests rather than F-tests.
- ss: for an ols fit, set ss=FALSE to suppress printing partial sums of squares, mean squares, and the Error SS and MS.
- x: for print, plot, and text, x is the result of anova.
- which: if which is not 'none' (the default), print.anova.rms adds to the rightmost column of the output the list of parameters tested by the hypothesis in the current row. The other choices are 'subscripts', 'names', and 'dots'.
- what: what type of statistic to plot. The default is the Wald $\chi^2$ statistic for each factor (adding in the effect of higher-ordered factors containing that factor) minus its degrees of freedom. The last three choices for what apply only to ols models.
- xlab: x-axis label; the default is constructed according to what. plotmath symbols are used for R, by default.
- pch: character for plotting dots in dot charts. Default is 16 (solid dot).
- rm.totals: set to FALSE to keep total $\chi^2$s (overall, nonlinear, interaction totals) in the chart.
- rm.ia: set to TRUE to omit any effect that has "*" in its name
- rm.other: a list of other predictor names to omit from the chart
- newnames: a list of substitute predictor names to use, after omitting any.
- sort: default is to sort bars in descending order of the summary statistic
- pl: set to FALSE to suppress plotting. This is useful when you only wish to analyze the vector of statistics returned.
- title: title to pass to latex; the default is the name of the fit object passed to anova, prefixed with "anova.". For Windows, the default is "ano" followed by the first 5 letters of the name of the fit object.
- psmall: the default is psmall=TRUE, which causes P<0.00005 to print as <0.0001. Set to FALSE to print as 0.0000.
- dec.chisq: number of places to the right of the decimal place for typesetting $\chi^2$ values (default is 2). Use zero for integer, NA for floating point.
- dec.F: digits to the right for $F$ statistics (default is 2)
- dec.ss: digits to the right for sums of squares (default is NA, indicating floating point)
- dec.ms: digits to the right for mean squares (default is NA)
- dec.P: digits to the right for $P$-values (default is 4)
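The test argument above switches a least squares fit between $F$ and $\chi^2$ tests. The sketch below uses base R's lm instead of ols and assumes that both versions are built from the same Wald quadratic form, so that the $\chi^2$ statistic equals the numerator d.f. times $F$; the simulated data and variable names are hypothetical.

```r
# Sketch: F versus chi-square versions of a joint Wald test in a linear model.
# Uses lm() from base R as a stand-in for ols(); data are simulated.
set.seed(2)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0.3 * x1 * x2 + rnorm(n)

fit <- lm(y ~ x1 * x2)

# Combined effect of x1: main effect plus interaction with x2 (2 d.f.)
terms <- c("x1", "x1:x2")
b     <- coef(fit)[terms]
V     <- vcov(fit)[terms, terms, drop = FALSE]
df    <- length(b)

wald  <- as.numeric(t(b) %*% solve(V, b))  # statistic on the chi-square scale
Fstat <- wald / df                         # same statistic on the F scale

p_F     <- pf(Fstat, df, df.residual(fit), lower.tail = FALSE)
p_chisq <- pchisq(wald, df, lower.tail = FALSE)

# Cross-check: the Wald F equals the F from comparing nested least squares fits
reduced  <- lm(y ~ x2)
F_nested <- anova(reduced, fit)$F[2]

print(c(F = Fstat, F_nested = F_nested, P_F = p_F, P_chisq = p_chisq))
```

The $F$ version accounts for estimating the residual variance through its denominator d.f.; the $\chi^2$ version treats that variance as known.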
If the statistics being plotted with
plot.anova.rms are few in
number and one of them is negative or zero, plot.anova.rms
will quit because of an error in the underlying dot chart routine.
anova.rms returns a matrix of class
anova.rms containing factors as rows and $\chi^2$, d.f., and $P$-values as columns (or d.f., partial $SS, MS, F, P$).
plot.anova.rms invisibly returns the vector of quantities plotted. This vector has a names attribute describing the terms for which the statistics in the vector are calculated.
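The named vector returned by plot.anova.rms can be post-processed like any numeric vector. As a rough illustration of the default measure ($\chi^2$ minus d.f.), the sketch below builds such a vector by hand from made-up $\chi^2$ and d.f. values and then sorts and ranks it; the numbers and effect names are hypothetical and do not come from any fitted model.

```r
# Hypothetical Wald chi-squares and d.f. for three effects (illustrative values)
chisq <- c(x1 = 12.4, x2 = 3.1, sex = 5.0)
df    <- c(x1 = 3,    x2 = 2,   sex = 1)

# Chi-square minus d.f.: subtracting d.f. levels the playing field between
# effects that spend different numbers of parameters
adj <- chisq - df

# Descending order, as in the default sort of the dot chart
ord <- sort(adj, decreasing = TRUE)

# Ranks with 1 assigned to the largest adjusted chi-square; computed on the
# unsorted vector so that each rank stays attached to its effect name
rk <- rank(-adj)

print(ord)
print(rk)
```

Ranking the unsorted vector matters when the ranks are recomputed across bootstrap replications: sorting first would give every replication the ranks 1, 2, 3 in the same positions.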
latex creates a
file with a name of the form
"title.tex" (see the
title argument above).
library(rms)                 # provides lrm, ols, anova; also attaches Hmisc

n <- 1000                    # define sample size
set.seed(17)                 # so can reproduce the results
treat <- factor(sample(c('a','b','c'), n,TRUE))
num.diseases <- sample(0:4, n,TRUE)
age <- rnorm(n, 50, 10)
cholesterol <- rnorm(n, 200, 25)
weight <- rnorm(n, 150, 20)
sex <- factor(sample(c('female','male'), n,TRUE))
label(age) <- 'Age'          # label is in Hmisc
label(num.diseases) <- 'Number of Comorbid Diseases'
label(cholesterol) <- 'Total Cholesterol'
label(weight) <- 'Weight, lbs.'
label(sex) <- 'Sex'
units(cholesterol) <- 'mg/dl'   # uses units.default in Hmisc

# Specify population model for log odds that Y=1
L <- .1*(num.diseases-2) + .045*(age-50) +
  (log(cholesterol - 10)-5.2)*(-2*(treat=='a') +
  3.5*(treat=='b')+2*(treat=='c'))
# Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)]
y <- ifelse(runif(n) < plogis(L), 1, 0)

fit <- lrm(y ~ treat + scored(num.diseases) + rcs(age) +
           log(cholesterol+10) + treat:log(cholesterol+10))
anova(fit)                       # Test all factors
anova(fit, treat, cholesterol)   # Test these 2 by themselves
                                 # to get their pooled effects
g <- lrm(y ~ treat*rcs(age))
dd <- datadist(treat, num.diseases, age, cholesterol)
options(datadist='dd')
p <- Predict(g, age=., treat="b")
s <- anova(g)
# Usually omit fontfamily to default to 'Courier'
# It's specified here to make R pass its package-building checks
plot(p, addpanel=pantext(s, 28, 1.9, fontfamily='Helvetica'))

plot(s)                          # new plot - dot chart of chisq-d.f.
# latex(s)                       # nice printout - creates anova.g.tex
options(datadist=NULL)

# Simulate data from a given model, and display exactly which
# hypotheses are being tested
set.seed(123)
age <- rnorm(500, 50, 15)
treat <- factor(sample(c('a','b','c'), 500,TRUE))
bp  <- rnorm(500, 120, 10)
y   <- ifelse(treat=='a', (age-50)*.05, abs(age-50)*.08) +
       3*(treat=='c') +
       pmax(bp, 100)*.09 + rnorm(500)
f   <- ols(y ~ treat*lsp(age,50) + rcs(bp,4))
print(names(coef(f)), quote=FALSE)
specs(f)
anova(f)
an <- anova(f)
options(digits=3)
print(an, 'subscripts')
print(an, 'dots')

an <- anova(f, test='Chisq', ss=FALSE)
plot(0:1)                        # make some plot
# create function to write table; usually omit fontfamily
tab <- pantext(an, 1.2, .6, lattice=FALSE, fontfamily='Helvetica')
tab()                            # execute it; could do tab(cex=.65)
plot(an)                         # new plot - dot chart of chisq-d.f.
# latex(an)                      # nice printout - creates anova.f.tex

# Suppose that a researcher wants to make a big deal about a variable
# because it has the highest adjusted chi-square. We use the
# bootstrap to derive 0.95 confidence intervals for the ranks of all
# the effects in the model. We use the plot method for anova, with
# pl=FALSE to suppress actual plotting of chi-square - d.f. for each
# bootstrap repetition. We rank the negative of the adjusted
# chi-squares so that a rank of 1 is assigned to the highest.
# It is important to tell plot.anova.rms not to sort the results,
# or every bootstrap replication would have ranks of 1,2,3 for the stats.

mydata <- data.frame(x1=runif(200), x2=runif(200),
                     sex=factor(sample(c('female','male'),200,TRUE)))
set.seed(9)                      # so can reproduce example
mydata$y <- ifelse(runif(200)<=plogis(mydata$x1-.5 + .5*(mydata$x2-.5) +
                   .5*(mydata$sex=='male')),1,0)

require(boot)
b <- boot(mydata, function(data, i, ...)
            rank(-plot(anova(
              lrm(y ~ rcs(x1,4)+pol(x2,2)+sex,data,subset=i)),
              sort='none', pl=FALSE)),
          R=25)                  # should really do R=500 but will take a while
Rank <- b$t0
lim  <- t(apply(b$t, 2, quantile, probs=c(.025,.975)))

# Use the Hmisc Dotplot function to display ranks and their confidence
# intervals. Sort the categories by descending adj. chi-square, for ranks
original.chisq <- plot(anova(lrm(y ~ rcs(x1,4)+pol(x2,2)+sex,data=mydata)),
                       sort='none', pl=FALSE)
predictor <- as.factor(names(original.chisq))
predictor <- reorder.factor(predictor, -original.chisq)

Dotplot(predictor ~ Cbind(Rank, lim), pch=3, xlab='Rank',
        main=if(.R.) expression(paste(
          'Ranks and 0.95 Confidence Limits for ',chi^2,' - d.f.')) else
          'Ranks and 0.95 Confidence Limits for Chi-square - d.f.')