This is a series of functions (asis, pol, lsp,
rcs, catg, scored, strat, matrx,
gTrans, and
%ia%) that set up special attributes (such as
knots and nonlinear term indicators) that are carried through to fits
(using, for example, lrm, cph, ols,
psm). anova.rms, summary.rms, Predict,
survplot, fastbw, validate, specs,
which.influence, nomogram and latex.rms use these
attributes to automate certain analyses (e.g., automatic tests of linearity
for each predictor are done by anova.rms). Many of the functions
are called implicitly. Some S functions such as ns derive data-dependent
transformations that are not always "remembered" when predicted values are
later computed, so the predictions may be incorrect. The functions listed
here solve that problem when used in the rms context.
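A minimal sketch of what "remembering" the transformation means in practice, assuming the rms package is installed. The data are simulated and the object names (d, fit, new_narrow, new_wide) are made up for illustration. The knots chosen from the fitting data are reused at prediction time, so predictions at given x values do not depend on which other rows appear in the new data.

```r
library(rms)

set.seed(1)
d <- data.frame(x = runif(200, 0, 10))          # simulated predictor
d$y <- sin(d$x / 2) + rnorm(200, sd = 0.3)      # simulated response

# Knot locations are derived from d$x at fit time and stored with the fit
fit <- ols(y ~ rcs(x, 4), data = d)

# Two new data sets sharing their first three x values
new_narrow <- data.frame(x = c(4, 5, 6))
new_wide   <- data.frame(x = c(4, 5, 6, 0, 10))

p_narrow <- predict(fit, newdata = new_narrow)
p_wide   <- predict(fit, newdata = new_wide)

# The shared x values get the same predictions, because the stored knots
# are used rather than knots recomputed from the new data
all.equal(unname(p_narrow), unname(p_wide[1:3]))
```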
asis is the identity transformation, pol is an ordinary
(non-orthogonal) polynomial, lsp is a linear spline, and rcs is a
linear tail-restricted cubic spline function (natural spline), for which
the rcspline.eval function generates the design matrix. The
presence of system option rcspc causes rcspline.eval to be
invoked with pc=TRUE, and the presence of system option fractied
causes this value to be passed to rcspline.eval as the fractied
argument. catg is for a categorical variable,
scored is for an ordered categorical variable, strat is
for a stratification factor in a Cox model, matrx is for a matrix
predictor, and %ia% represents restricted interactions in which
products involving nonlinear effects on both variables are not included
in the model. asis, catg, scored, and matrx are seldom invoked
explicitly by the user (usually only to specify label or
name).
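As a hedged illustration of how these pieces combine in a model formula, the sketch below fits simulated data with two spline terms, a restricted interaction via %ia%, and a factor that is handled implicitly as a categorical predictor. The variable names (age, bp, sex) and the data-generating model are made up. The fit is compared with one using the full product of the two splines to show that %ia% spends fewer parameters.

```r
library(rms)

set.seed(2)
n <- 300
d <- data.frame(
  age = rnorm(n, 50, 10),
  bp  = rnorm(n, 120, 15),
  sex = factor(sample(c("female", "male"), n, replace = TRUE))
)
d$y <- 0.002 * (d$age - 50)^2 + 0.01 * d$bp + (d$sex == "male") + rnorm(n)

# Restricted interaction: products that are nonlinear in both age and bp
# are left out of the model
fit_ia <- ols(y ~ rcs(age, 4) + rcs(bp, 3) + rcs(age, 4) %ia% rcs(bp, 3) + sex,
              data = d)

# Full interaction for comparison: all products of the spline terms
fit_full <- ols(y ~ rcs(age, 4) * rcs(bp, 3) + sex, data = d)

# The restricted fit has fewer coefficients than the full one
length(coef(fit_ia))
length(coef(fit_full))

# anova.rms uses the stored attributes to test nonlinearity and interaction
print(anova(fit_ia))
```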
gTrans is a general multiple-parameter transformation function.
It can be used to specify new polynomial bases, smooth relationships
with a discontinuity at one or more values of x, and grouped
categorical variables. For example, for a categorical variable with 5
levels you may want to combine two of the levels so that only 3 degrees of
freedom are spent in all, while still seeing plots of predicted values in
which the two combined categories are kept separate but have equal effect
estimates. The first
argument to gTrans is a regular numeric, character, or factor
variable. The next argument is a function that transforms a vector into
a matrix. If the basis functions are to include a linear term, it is up
to the user to include the original x as one of the columns.
Column names are assigned automatically, but any column names specified
by the user will override the default name. If you want to signal which
terms correspond to linear and which correspond to nonlinear effects for
the purpose of running anova.rms, add an integer vector attribute
nonlinear to the resulting matrix. This vector specifies the
column numbers corresponding to nonlinear effects. The default is to assume a column
is a linear effect. The parms attribute stored with a
gTrans result is a character vector version of the function, so as
to not waste space carrying along any environment information. If you
will be using the latex method for typesetting the fitted model,
you must also include a tex attribute in the produced matrix.
This must be a function of a single character string argument (that will
ultimately contain the name of the predictor in LaTeX notation) and must
produce a vector of LaTeX character strings. See
https://hbiostat.org/R/examples/gTrans/gTrans.html for several examples of the
use of gTrans including the use of nonlinear and
tex.
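The following is a small sketch of a user-defined basis for gTrans, assuming a version of rms that provides gTrans. The function name quad and the simulated data are hypothetical. The basis includes the original x as its first column, and the nonlinear attribute marks the second column as a nonlinear term for anova.rms. The tex attribute needed by the latex method is not shown here.

```r
library(rms)

set.seed(3)
d <- data.frame(x = runif(200))
d$y <- d$x + 2 * d$x^2 + rnorm(200, sd = 0.2)

# Hypothetical basis function: turns a vector into a two-column matrix
quad <- function(x) {
  z <- cbind(x, xsq = x^2)       # the linear term is included explicitly
  attr(z, "nonlinear") <- 2L     # column 2 is flagged as a nonlinear effect
  z
}

fit <- ols(y ~ gTrans(x, quad), data = d)

# Intercept plus one coefficient per basis column
coef(fit)
```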
A makepredictcall method is defined so that usage of the
transformation functions outside of rms fitting functions will
work for getting predicted values. Thanks to Terry Therneau for the code.
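A brief sketch of this use outside rms fitting functions, assuming rms is loaded so that rcs and its makepredictcall method are available. The data are simulated. The same spline is fitted with lm and with ols, and predictions at new x values are compared.

```r
library(rms)

set.seed(4)
d <- data.frame(x = runif(150, 0, 10))
d$y <- log1p(d$x) + rnorm(150, sd = 0.2)

# rcs used inside a non-rms fitting function
fit_lm <- lm(y ~ rcs(x, 4), data = d)

# The same model fitted with an rms fitting function
fit_ols <- ols(y ~ rcs(x, 4), data = d)

new <- data.frame(x = c(2, 5, 8))
p_lm  <- predict(fit_lm,  newdata = new)
p_ols <- predict(fit_ols, newdata = new)

# Both fits should evaluate the spline with the knots from the original data
all.equal(unname(p_lm), unname(p_ols))
```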
In the list below, functions asis through gTrans can have
arguments x, parms, label, name except that parms does not
apply to asis, matrx, strat.
asis(...)
matrx(...)
pol(...)
lsp(...)
rcs(...)
catg(...)
scored(...)
strat(...)
gTrans(...)
x1 %ia% x2
# S3 method for rms
makepredictcall(var, call)
The arguments ... above contain the following.
x: a predictor variable (or a function of one). If
you specify e.g. pol(pmin(age,10),3), a cubic polynomial
will be fitted in pmin(age,10) (pmin is the S vector
element-by-element minimum function). The predictor will be labeled
age in the output, and plots will have age in its
original units on the axes. If you use a function such as
pmin, the predictor is taken as the first argument, and
other arguments must be defined in the frame in effect when
predicted values, etc., are computed.
parms: parameters of transformation (e.g. number or
location of knots). For pol the argument is the order of
the polynomial, e.g. 2 for quadratic (the usual
default). For lsp it is a vector of knot locations
(lsp will not estimate knot locations). For rcs it
is the number of knots (if scalar), or vector of knot locations
(if >2 elements). The default number is the nknots
system option if parms is not given. If the number of
knots is given, locations are computed for that number of knots.
If system option rcspc is TRUE the parms
vector has an attribute defining the principal components
transformation parameters. For catg, parms is the
category labels (not needed if variable is an S category or factor
variable). If omitted, catg will use unique(x), or
levels(x) if x is a category or a
factor. For scored, parms is a vector of
unique values of variable (uses unique(x) by default).
This is not needed if x is an S ordered variable.
For strat, parms is the category labels (not needed
if variable is an S category variable). If omitted, will use
unique(x), or levels(x) if x is a
category or factor. parms is not used for
matrx.
label: label of predictor for plotting (default =
"label" attribute or variable name)
name: name to use for predictor in model. Default is the name of the argument to the function.
x1, x2: two continuous variables for which to form a non-doubly-nonlinear interaction
var: a model term passed from a (usually non-rms) function
call: call object for a model term
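To make the meaning of parms concrete for the continuous transformations described above, here is a hedged sketch on simulated data. The object names and the particular knot locations are arbitrary choices for illustration, not recommendations.

```r
library(rms)

set.seed(5)
d <- data.frame(x = runif(200, 0, 10))
d$y <- sqrt(d$x) + rnorm(200, sd = 0.2)

fits <- list(
  pol3     = ols(y ~ pol(x, 3), data = d),               # parms = polynomial order
  lsp2     = ols(y ~ lsp(x, c(3, 6)), data = d),         # parms = knot locations
  rcs5     = ols(y ~ rcs(x, 5), data = d),               # parms = number of knots
  rcsknots = ols(y ~ rcs(x, c(1, 3, 5, 7, 9)), data = d) # parms = knot locations
)

# Number of regression coefficients (including the intercept) in each fit
n_coef <- sapply(fits, function(f) length(coef(f)))
n_coef
```

A restricted cubic spline with k knots uses k - 1 columns, a linear spline uses one column per knot plus the linear term, and a polynomial of order p uses p columns.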
Frank Harrell
Department of Biostatistics, Vanderbilt University
fh@fharrell.com
rcspline.eval,
rcspline.restate, rms,
cph, lrm, ols,
datadist, makepredictcall