bj
BuckleyJames Multiple Regression Model
bj
fits the BuckleyJames distributionfree least squares multiple
regression model to a possibly rightcensored response variable.
This model reduces to ordinary least squares if
there is no censoring. By default, model fitting is done after
taking logs of the response variable.
bj
uses the rms
class
for automatic anova
, fastbw
, validate
, Function
, nomogram
,
summary
, plot
, bootcov
, and other functions. The bootcov
function may be worth using with bj
fits, as the properties of the
BuckleyJames covariance matrix estimator are not fully known for
strange censoring patterns.
The residuals.bj
function exists mainly to compute
residuals and to censor them (i.e., return them as
Surv
objects) just as the original
failure time variable was censored. These residuals are useful for
checking to see if the model also satisfies certain distributional assumptions.
To get these residuals, the fit must have specified y=TRUE
.
The bjplot
function is a special plotting function for objects
created by bj
with x=TRUE, y=TRUE
in effect. It produces three
scatterplots for every covariate in the model: the first plots the
original situation, where censored data are distingushed from
noncensored data by a different plotting symbol. In the second plot,
called a renovated plot, vertical lines show how censored data were
changed by the procedure, and the third is equal to the second, but
without vertical lines. Imputed data are again distinguished from the
noncensored by a different symbol.
The validate
method for bj
validates the Somers' Dxy
rank
correlation between predicted and observed responses, accounting for censoring.
The primary fitting function for bj
is bj.fit
, which does not
allow missing data and expects a full design matrix as input.
Usage
bj(formula=formula(data), data, subset, na.action=na.delete, link="log", control, method='fit', x=FALSE, y=FALSE, time.inc)
"print"(x, digits=4, long=FALSE, coefs=TRUE, latex=FALSE,
title="BuckleyJames Censored Data Regression", ...)
"residuals"(object, type=c("censored","censored.normalized"),...)
bjplot(fit, which=1:dim(X)[[2]])
"validate"(fit, method="boot", B=40, bw=FALSE,rule="aic",type="residual",sls=.05,aics=0, force=NULL, estimates=TRUE, pr=FALSE,
tol=1e7, rel.tolerance=1e3, maxiter=15, ...)
bj.fit(x, y, control)
Arguments
 formula

an S statistical model formula. Interactions up to third order are
supported. The left hand side must be a
Surv
object.  data,subset,na.action
 the usual statistical model fitting arguments
 fit

a fit created by
bj
, required for all functions exceptbj
.  x

a design matrix with or without a first column of ones, to pass
to
bj.fit
. All models will have an intercept. Forprint.bj
is a result ofbj
. Forbj
, setx=TRUE
to include the design matrix in the fit object.  y

a
Surv
object to pass tobj.fit
as the twocolumn response variable. Only right censoring is allowed, and there need not be any censoring. Forbj
, sety
toTRUE
to include the twocolumn response matrix, with the event/censoring indicator in the second column. The first column will be transformed according tolink
, and depending onna.action
, rows with missing data in the predictors or the response will be deleted.  link

set to, for example,
"log"
(the default) to model the log of the response, or"identity"
to model the untransformed response.  control

a list containing any or all of the following components:
iter.max
(maximum number of iterations allowed, default is 20),eps
(convergence criterion: concergence is assumed when the ratio of sum of squared errors from one iteration to the next is between 1eps
and 1+eps
),trace
(set toTRUE
to monitor iterations),tol
(matrix singularity criterion, default is 1e7), and 'max.cycle' (in case of nonconvergence the program looks for a cycle that repeats itself, default is 30).  method

set to
"model.frame"
or"model.matrix"
to return one of those objects rather than the model fit.  time.inc

setting for default time spacing.
Default is 30 if time variable has
units="Day"
, 1 otherwise, unless maximum followup time $< 1$. Then max time/10 is used astime.inc
. Iftime.inc
is not given and max time/defaulttime.inc
is $> 25$,time.inc
is increased.  digits
 number of significant digits to print if not 4.
 long

set to
TRUE
to print the correlation matrix for parameter estimates  coefs
 specify
coefs=FALSE
to suppress printing the table of model coefficients, standard errors, etc. Specifycoefs=n
to print only the firstn
regression coefficients in the model.  latex
 a logical value indicating whether information should be formatted as plain text or as LaTeX markup
 title
 a character string title to be passed to
prModFit
 object
 the result of
bj
 type

type of residual desired. Default is censored unnormalized residuals,
defined as link(Y)  linear.predictors, where the
link function was usually the log function. You can specify
type="censored.normalized"
to divide the residuals by the estimate ofsigma
.  which

vector of integers or character strings naming elements of the design
matrix (the names of the original predictors if they entered the model
linearly) for which to have
bjplot
make plots of only the variables listed inwhich
(names or numbers).  B,bw,rule,sls,aics,force,estimates,pr,tol,rel.tolerance,maxiter
 see
predab.resample
 ...

ignored for
print
; passed through topredab.resample
forvalidate
Details
The program implements the algorithm as described in the original article by Buckley & James. Also, we have used the original Buckley & James prescription for computing variance/covariance estimator. This is based on noncensored observations only and does not have any theoretical justification, but has been shown in simulation studies to behave well. Our experience confirms this view. Convergence is rather slow with this method, so you may want to increase the number of iterations. Our experience shows that often, in particular with high censoring, 100 iterations is not too many. Sometimes the method will not converge, but will instead enter a loop of repeating values (this is due to the discrete nature of Kaplan and Meier estimator and usually happens with small sample sizes). The program will look for such a loop and return the average betas. It will also issue a warning message and give the size of the cycle (usually less than 6).
Value
bj
returns a fit object with similar information to what survreg
,
psm
, cph
would store as
well as what rms
stores and units
and time.inc
.
residuals.bj
returns a Surv
object. One of the components of the
fit
object produced by bj
(and bj.fit
) is a vector called
stats
which contains the following names elements:
"Obs", "Events", "d.f.","error d.f.","sigma","g"
. Here
sigma
is the estimate of the residual standard deviation.
g
is the $g$index. If the link function is "log"
,
the $g$index on the antilog scale is also returned as gr
.
References
Buckley JJ, James IR. Linear regression with censored data. Biometrika 1979; 66:42936.
Miller RG, Halpern J. Regression with censored data. Biometrika 1982; 69: 52131.
James IR, Smith PJ. Consistency results for linear regression with censored data. Ann Statist 1984; 12: 590600.
Lai TL, Ying Z. Large sample theory of a modified BuckleyJames estimator for regression analysis with censored data. Ann Statist 1991; 19: 1370402.
Hillis SL. Residual plots for the censored data linear regression model. Stat in Med 1995; 14: 20232036.
Jin Z, Lin DY, Ying Z. On leastsquares regression with censored data. Biometrika 2006; 93:147161.
See Also
rms
, psm
, survreg
,
cph
, Surv
,
na.delete
,
na.detail.response
, datadist
,
rcorr.cens
, GiniMd
,
prModFit
, dxy.cens
Examples
set.seed(1)
ftime < 10*rexp(200)
stroke < ifelse(ftime > 10, 0, 1)
ftime < pmin(ftime, 10)
units(ftime) < "Month"
age < rnorm(200, 70, 10)
hospital < factor(sample(c('a','b'),200,TRUE))
dd < datadist(age, hospital)
options(datadist="dd")
f < bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, x=TRUE, y=TRUE)
# add link="identity" to use a censored normal regression model instead
# of a lognormal one
anova(f)
fastbw(f)
validate(f, B=15)
plot(Predict(f, age, hospital))
# needs datadist since no explicit age,hosp.
coef(f) # look at regression coefficients
coef(psm(Surv(ftime, stroke) ~ rcs(age,5) + hospital, dist='lognormal'))
# compare with coefficients from likelihoodbased
# lognormal regression model
# use dist='gau' not under R
r < resid(f, 'censored.normalized')
survplot(npsurv(r ~ 1), conf='none')
# plot KaplanMeier estimate of
# survival function of standardized residuals
survplot(npsurv(r ~ cut2(age, g=2)), conf='none')
# may desire both strata to be n(0,1)
options(datadist=NULL)