crq: Functions to fit censored quantile regression models

Description

Fits a conditional quantile regression model for censored data. There are three distinct methods: the first is the fixed censoring method of Powell (1986) as implemented by Fitzenberger (1996), the second is the random censoring method of Portnoy (2003). The third method is based on Peng and Huang (2008).

Usage

crq(formula, taus, data, subset, weights, na.action, method = c("Powell", "Portnoy", "PengHuang"), contrasts = NULL, ...)
crq.fit.pow(x, y, yc, tau=0.5, weights=NULL, start, left=TRUE, maxit = 500)
crq.fit.pen(x, y, cen, weights=NULL, grid) 
crq.fit.por(x, y, cen, weights=NULL, grid) 
Curv(y, yc, type=c("left","right"))
## S3 method for class 'crq':
print(x, ...)
## S3 method for class 'crq':
print(x, ...)
## S3 method for class 'crq':
predict(object, newdata,  ...)
## S3 method for class 'crqs':
predict(object, newdata, type = NULL, ...)
## S3 method for class 'crq':
coef(object,taus = 1:4/5,...)

Arguments

formula

A formula object, with the response on the left of the `~' operator, and the terms on the right. The response must be a Surv object as returned by either the Curv or Surv function. For the Powell m

The event time.

newdata

An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used.

grid

A vector of taus on which the quantile process should be evaluated. This should be monotonic, and take values in (0,1). For the "Portnoy" method, grid = "pivot" computes the full solution for all distinct taus.

An object of class crq or crq.

object

An object of class crq or crq.

The censoring times for the "Powell" method.

type

Censoring type for the "Powell" method, used in Curv, by default "left". If you don't like "left", maybe you will like "right". Note that for fixed censoring assumed in the "Powell" method, censoring times yc must be provide

cen

The censoring indicator for the "Portnoy" and "PengHuang" methods.

maxit

Maximum number of iterations allowed for the "Powell" methods.

start

The starting value for the coefs for the "Powell" method. Because the Fitzenberger algorithm stops when it achieves a local minimum of the Powell objective function, the starting value acts as an a priori "preferred point". This is advantageous in so

left

A logical indicator for left censoring for the "Powell" method.

taus

The quantile(s) at which the model is to be estimated.

tau

The quantile at which the model is to be estimated.

data

A data.frame in which to interpret the variables named in the `formula', in the `subset', and the `weights' argument.

subset

an optional vector specifying a subset of observations to be used in the fitting process.

weights

vector of observation weights; if supplied, the algorithm fits to minimize the sum of the weights multiplied into the absolute residuals. The length of weights vector must be the same as the number of observations. The weigh

na.action

a function to filter missing data. This is applied to the model.frame after any subset argument has been used. The default (with 'na.fail') is to create an error if any missing values are found. A possible alternative is

method

The method used for fitting. There are currently two options: method "Powell" computes the Powell estimator using the algorithm of Fitzenberger (1996), method "Portnoy" computes the Portnoy (2003) estimator. The method is "PengHuang" uses the method

contrasts

a list giving contrasts for some or all of the factors default = 'NULL' appearing in the model formula. The elements of the list should have the same name as the variable and should be either a contrast matrix (s

...

additional arguments for the fitting routine, for method "Powell" it may be useful to pass starting values of the regression parameter via the argument "start", while for methods "Portnoy" or "PengHuang" one may wish to specify an alternative to the de

Value

An object of class crq.

Details

The Fitzenberger algorithm uses a variant of the Barrodale and Roberts simplex method. Exploiting the fact that the solution must be characterized by an exact fit to p points when there are p parameters to be estimated, at any trial basic solution it computes the directional derivatives in the 2p distinct directions and choses the direction that (locally) gives steepest descent. It then performs a one-dimensional line search to choose the new basic observation and continues until it reaches a local mimumum. By default it starts at the naive rq solution ignoring the censoring; this has the (slight) advantage that the estimator is consequently equivariant to canonical transformations of the data. Since the objective function is no longer convex there can be no guarantee that this produces a global minimum estimate. In small problems exhaustive search over solutions defined by p-element subsets of the n observations can be used, but this quickly becomes impractical for large p and n. This global version of the Powell estimator can be invoked by specifying start = "global". Users interested in this option would be well advised to compute choose(n,p) for their problems before trying it. The method operates by pivoting through this many distinct solutions and choosing the one that gives the minimal Powell objective. The algorithm used for the Portnoy method is described in considerable detail in Portnoy (2003). Both the Portnoy and Peng-Huang estimators may be unable to compute estimates of the conditional quantile parameters in the upper tail of distribution. Like the Kaplan-Meier estimator, when censoring is heavy in the upper tail the estimated distribution is defective and quantiles are only estimable on a sub-interval of (0,1). The Peng and Huang estimator can be viewed as a generalization of the Nelson Aalen estimator of the cumulative hazard function, and can be formulated as a variant of the conventional quantile regression dual problem. See Koenker (2008) for further details. This paper is available from the package with vignette("crq").

References

Fitzenberger, B. (1996): ``A Guide to Censored Quantile Regressions,'' in Handbook of Statistics, ed. by C.~Rao, and G.~Maddala. North-Holland: New York.

Fitzenberger, B. and P. Winker (2007): ``Improving the Computation of Censored Quantile Regression Estimators,'' CSDA, 52, 88-108.

Koenker, R. (2008): ``Censored Quantile Regression Redux,'' J. Statistical Software, 27, http://www.jstatsoft.org/v27/i06.

Peng, L and Y Huang, (2008) Survival Analysis with Quantile Regression Models, J. Am. Stat. Assoc., 103, 637-649.

Portnoy, S. (2003) ``Censored Quantile Regression,'' JASA, 98,1001-1012.

Powell, J. (1986) ``Censored Regression Quantiles,'' J. Econometrics, 32, 143--155.

Examples

Run this code

# An artificial Powell example
set.seed(2345)
x <- sqrt(rnorm(100)^2)
y <-  -0.5 + x +(.25 + .25*x)*rnorm(100)
plot(x,y, type="n")
s <- (y > 0)
points(x[s],y[s],cex=.9,pch=16)
points(x[!s],y[!s],cex=.9,pch=1)
yLatent <- y
y <- pmax(0,y)
yc <- rep(0,100)
for(tau in (1:4)/5){
        f <- crq(Curv(y,yc) ~ x, tau = tau, method = "Pow")
        xs <- sort(x)
        lines(xs,pmax(0,cbind(1,xs)%*%f$coef),col="red")
        abline(rq(y ~ x, tau = tau), col="blue")
        abline(rq(yLatent ~ x, tau = tau), col="green")
        }
legend(.15,2.5,c("Naive QR","Censored QR","Omniscient QR"),
        lty=rep(1,3),col=c("blue","red","green"))
data(uis)
#estimate the Peng and Huang model using log(TIME) AFT specification
fit <- crq(Surv(log(TIME), CENSOR) ~  ND1 + ND2 + IV3 +
               TREAT + FRAC + RACE + AGE * SITE, method = "Por", data = uis)
Sfit <- summary(fit,1:19/20)
PHit <- coxph(Surv(TIME, CENSOR) ~  ND1 + ND2 + IV3 +
               TREAT + FRAC + RACE + AGE * SITE, data = uis)
plot(Sfit, CoxPHit = PHit)
formula <-  ~  ND1 + ND2 + IV3 + TREAT + FRAC + RACE + AGE * SITE -1
X <- data.frame(model.matrix(formula,data=uis))
newd <- as.list(apply(X,2,median))
pred <- predict(fit, newdata=newd, stepfun = TRUE)
plot(pred,do.points=FALSE,xlab = expression(tau), ylab = expression(Q(tau)),main= "Quantiles at Median Covariate Values")
plot(rearrange(pred),add=TRUE,do.points=FALSE,col.vert ="red", col.hor="red")
legend(.15,7,c("Raw","Rearranged"),lty = rep(1,2),col=c("black","red"))

Run the code above in your browser using DataLab