Qtools (version 1.0)

tsrq: Quantile Regression Transformation Models

Description

These functions are used to fit quantile regression transformation models.

Usage

tsrq(formula, tsf = "mcjI", symm = TRUE, dbounded = FALSE, lambda = NULL,
	conditional = FALSE, tau = 0.5, data, subset, weights, na.action,
	method = "fn", ...)
tsrq2(formula, dbounded = FALSE, lambda = NULL, delta = NULL, conditional = FALSE,
	tau = 0.5, data, subset, weights, na.action, method = "fn", ...)
rcrq(formula, tsf = "mcjI", symm = TRUE, dbounded = FALSE, lambda = NULL,
	tau = 0.5, data, subset, weights, na.action, method = "fn", ...)
nlrq2(formula, par = NULL, dbounded = FALSE, tau = 0.5, data,
	subset, weights, na.action, ...)

Arguments

Value

tsrq, tsrq2, rcrq and nlrq2 return an object of class rqt. This is a list that typically contains the following components:

the first nt = length(tau) elements of the list store the results from fitting linear quantile models on the transformed scale of the response.
call: the matched call.
method: the fitting algorithm for rq or optim.
y: the response (untransformed scale).
theta: if dbounded = TRUE, the response mapped to the unit interval.
x: the model matrix.
weights: the weights used in the fitting process (a vector of 1's if weights is missing or NULL).
tau: the order of the estimated quantile(s).
lambda: the estimated parameter lambda.
eta: the estimated parameters lambda and delta in the two-parameter Proposal II transformation.
lambda.grid: grid of lambda values used for estimation.
delta.grid: grid of delta values used for estimation.
tsf: transformation used (see also attributes(tsf)).
objective: values of the objective function minimised over the transformation parameter(s). This is an array of dimension c(nl,nt) or c(nl,nd,nt), where nl = length(lambda.grid), nd = length(delta.grid) and nt = length(tau).
optimum: value of the objective function at the solution.
coefficients: quantile regression coefficients (transformed scale).
fitted.values: fitted values.
rejected: proportion of inadmissible observations (Fitzenberger et al, 2010).
terms: the terms used.
term.labels: names of coefficients.
rdf: residual degrees of freedom.

Details

These functions implement quantile regression transformation models as discussed by Geraci and Jones (see references). The general model is assumed to be Q(h(response)) = xb, where Q is the conditional quantile function and h is a one- or two-parameter transformation. A typical model specified in formula has the form response ~ terms, where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for the quantile of the transformed response.

The response, which is singly or doubly bounded, i.e. response > 0 or 0 <= response <= 1 respectively, undergoes the transformation specified in tsf. If the response is bounded in a generic [a,b] interval, the latter is automatically mapped to [0,1] and no further action is required. If, however, the response is singly bounded and contains negative values, it is left to the user to offset the response, or the code will produce an error.

The functions tsrq and tsrq2 use a two-stage (TS) estimator (Fitzenberger et al, 2010) for, respectively, one- and two-parameter transformations. The function rcrq (one-parameter transformations) is based on the residual cusum process estimator proposed by Mu and He (2007), while the function nlrq2 (two-parameter transformations) is based on Nelder-Mead optimization (Geraci and Jones).
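For intuition, the one-parameter Box-Cox transformation selected with tsf = "bc" can be sketched in base R. Qtools itself provides bc and invbc (used in the examples below); the helper functions here are a self-contained illustration of the transformation pair, not the package's implementation.

```r
# Box-Cox transform: h(y) = (y^lambda - 1)/lambda for lambda != 0, log(y) otherwise
boxcox_tf <- function(y, lambda) {
  stopifnot(all(y > 0))  # the response must be singly bounded (positive)
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Inverse transform, mapping fitted values back to the original scale
boxcox_inv <- function(z, lambda) {
  if (abs(lambda) < 1e-8) exp(z) else (lambda * z + 1)^(1 / lambda)
}

y <- c(10.2, 19.9, 58.3)
z <- boxcox_tf(y, lambda = -0.3)
all.equal(boxcox_inv(z, lambda = -0.3), y)  # the transform is invertible
```

After fitting on the transformed scale, predictions are back-transformed with the inverse, which is how quantiles on the original scale are recovered (quantiles are equivariant under monotone transformations).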

References

Aranda-Ordaz FJ. On two families of transformations to additivity for binary response data. Biometrika 1981;68(2):357-363.

Box GEP, Cox DR. An analysis of transformations. Journal of the Royal Statistical Society Series B-Statistical Methodology 1964;26(2):211-252.

Dehbi H-M, Cortina-Borja M, Geraci M. Aranda-Ordaz quantile regression for student performance assessment. Journal of Applied Statistics 2015. http://dx.doi.org/10.1080/02664763.2015.1025724

Fitzenberger B, Wilke R, Zhang X. Implementing Box-Cox quantile regression. Econometric Reviews 2010;29(2):158-181.

Geraci M, Jones MC. Improved transformation-based quantile regression. Canadian Journal of Statistics 2015;43(1):118-132.

Jones MC. Connecting distributions with power tails on the real line, the half line and the interval. International Statistical Review 2007;75(1):58-69.

Koenker R (2013). quantreg: Quantile Regression. R package version 5.05. URL http://CRAN.R-project.org/package=quantreg.

Mu YM, He XM. Power transformation toward a linear regression quantile. Journal of the American Statistical Association 2007;102(477):269-279.

See Also

predict.rqt, summary.rqt

Examples

###########################################################
## Example 1 - singly bounded (from Geraci and Jones, 2014)

data(trees)

require(MASS)

dx <- 0.01

lambda0 <- boxcox(Volume ~ log(Height), data = trees,
	lambda = seq(-0.9, 0.5, by = dx))
lambda0 <- lambda0$x[which.max(lambda0$y)]
trees$z <- bc(trees$Volume, lambda0)
trees$y <- trees$Volume
trees$x <- log(trees$Height)
trees$x <- trees$x - mean(log(trees$Height))

fit.lm <- lm(z ~ x, data = trees)
newd <- data.frame(x = log(seq(min(trees$Height),
	max(trees$Height), by = 0.1)))
newd$x <- newd$x - mean(log(trees$Height))
ylm <- invbc(predict(fit.lm, newdata = newd), lambda0)

lambdas <- list(bc = seq(-10, 10, by=dx),
	mcjIs = seq(0,10,by = dx), mcjIa = seq(0,20,by = dx))

taus <- 1:3/4
fit0 <- tsrq(y ~ x, tsf = "bc", symm = FALSE, data = trees,
	lambda = lambdas$bc, tau = taus)
fit1 <- tsrq(y ~ x, tsf = "mcjI", symm = TRUE, dbounded = FALSE,
	data = trees, lambda = lambdas$mcjIs, tau = taus)
fit2 <- tsrq(y ~ x, tsf = "mcjI", symm = FALSE, dbounded = FALSE,
	data = trees, lambda = lambdas$mcjIa, tau = taus)


par(mfrow = c(1,3), mar = c(7.1, 7.1, 5.1, 2.1), mgp = c(5, 2, 0))

cx.lab <- 2.5
cx.ax <- 2
lw <- 2
cx <- 2
xb <- "log(Height)"
yb <- "Volume"
xl <- range(trees$x)
yl <- c(5,80)

yhat <- predict(fit0, newdata = newd)
plot(y ~ x, data = trees, xlim = xl, ylim = yl, main = "Box-Cox",
	cex.lab = cx.lab, cex.axis = cx.ax, cex.main = cx.lab,
	cex = cx, xlab = xb, ylab = yb)
lines(newd$x, yhat[,1], lwd = lw)
lines(newd$x, yhat[,2], lwd = lw)
lines(newd$x, yhat[,3], lwd = lw)
lines(newd$x, ylm, lwd = lw, lty = 2)

yhat <- predict(fit1, newdata = newd)
plot(y ~ x, data = trees, xlim = xl, ylim = yl, main = "Proposal I (symmetric)",
	cex.lab = cx.lab, cex.axis = cx.ax, cex.main = cx.lab,
	cex = cx, xlab = xb, ylab = yb)
lines(newd$x, yhat[,1], lwd = lw)
lines(newd$x, yhat[,2], lwd = lw)
lines(newd$x, yhat[,3], lwd = lw)
lines(newd$x, ylm, lwd = lw, lty = 2)

yhat <- predict(fit2, newdata = newd)
plot(y ~ x, data = trees, xlim = xl, ylim = yl, main = "Proposal I (asymmetric)",
	cex.lab = cx.lab, cex.axis = cx.ax, cex.main = cx.lab,
	cex = cx, xlab = xb, ylab = yb)
lines(newd$x, yhat[,1], lwd = lw)
lines(newd$x, yhat[,2], lwd = lw)
lines(newd$x, yhat[,3], lwd = lw)
lines(newd$x, ylm, lwd = lw, lty = 2)

###########################################################
## Example 2 - doubly bounded

data(Chemistry)

Chemistry$gcse_gr <- cut(Chemistry$gcse, c(0,seq(4,8,by=0.5)))
with(Chemistry, plot(score ~ gcse_gr, xlab = "GCSE score",
	ylab = "A-level Chemistry score"))

# The dataset has > 31000 observations and computation can be slow
set.seed(178)
chemsub <- Chemistry[sample(1:nrow(Chemistry), 2000), ]

# Fit symmetric Aranda-Ordaz quantile 0.9
tsrq(score ~ gcse, tsf = "ao", symm = TRUE, lambda = seq(0,2,by=0.01),
	tau = 0.9, data = chemsub)
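
# A base-R sketch of the symmetric Aranda-Ordaz transformation fitted just
# above, applied after the response has been mapped to (0,1). Qtools provides
# ao()/invao() for this; the helper here is illustrative only, not the
# package's implementation.

```r
ao_sym <- function(theta, lambda) {
  # symmetric Aranda-Ordaz:
  #   (2/lambda) * (theta^lambda - (1-theta)^lambda) /
  #                (theta^lambda + (1-theta)^lambda)
  # which tends to the logit transform as lambda -> 0
  if (abs(lambda) < 1e-8) return(log(theta / (1 - theta)))
  (2 / lambda) * (theta^lambda - (1 - theta)^lambda) /
    (theta^lambda + (1 - theta)^lambda)
}

ao_sym(0.7, 1e-6)   # approaches the logit as lambda -> 0
log(0.7 / 0.3)
ao_sym(0.7, 1)      # lambda = 1 gives the linear transform 2*(2*theta - 1)
```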

# Fit symmetric Proposal I quantile 0.9
tsrq(score ~ gcse, tsf = "mcjI", symm = TRUE, dbounded = TRUE,
	lambda = seq(0,2,by=0.01), tau = 0.9, data = chemsub)

# Fit Proposal II quantile 0.9 (Nelder-Mead)
nlrq2(score ~ gcse, dbounded = TRUE, tau = 0.9, data = chemsub)

# Fit Proposal II quantile 0.9 (grid search)
# This is slower than nlrq2 but more stable numerically
tsrq2(score ~ gcse, dbounded = TRUE, lambda = seq(0, 2, by = 0.1),
	delta = seq(0, 2, by = 0.1), tau = 0.9, data = chemsub)

###########################################################
## Example 3 - doubly bounded

data(labor)

new <- labor
new$y <- new$pain
new$x <- (new$time-30)/30
new$x_gr <- as.factor(new$x)

par(mfrow = c(2,2), mai = c(1.3,1.3,0.5,0.5), mgp = c(5, 2, 0))

cx.lab <- 2.5
cx.ax <- 2.5
cx <- 2.5
yl <- c(0,0.06)

hist(new$y[new$treatment == 1], xlab = "Pain score", main = "Treated",
	cex.lab = cx.lab, cex.axis = cx.ax, cex.main = cx.lab, axes = TRUE,
	freq = FALSE, ylim = yl)

plot(y ~ x_gr, new, subset = new$treatment == 1, xlab = "Time (min)",
	ylab = "Pain score", cex.lab = cx.lab, cex.axis = cx.ax,
	cex.main = cx.lab, axes = FALSE, range = 0)
axis(1, at = 1:6, labels = (0:5)*30 + 30, cex.axis = 2)
axis(2, cex.axis = cx.lab)
box()

hist(new$y[new$treatment == 0], xlab = "Pain score", main = "Placebo",
	cex.lab = cx.lab, cex.axis = cx.ax, cex.main = cx.lab,
	axes = TRUE, freq = FALSE, ylim = yl)

plot(y ~ x_gr, new, subset = new$treatment == 0, xlab = "Time (min)",
	ylab = "Pain score", cex.lab = cx.lab, cex.axis = cx.ax,
	cex.main = cx.lab, axes = FALSE, range = 0)
axis(1, at = 1:6, labels = (0:5)*30 + 30, cex.axis = 2)
axis(2, cex.axis = cx.lab)
box()

#

taus <- c(1:3/4)
ls <- seq(0,3.5,by=0.1)

fit.aos <- tsrq(y ~ x*treatment, tsf = "ao", symm = TRUE, dbounded = TRUE,
	data = new, tau = taus, lambda = ls)
fit.aoa <- tsrq(y ~ x*treatment, tsf = "ao", symm = FALSE, dbounded = TRUE,
	data = new, tau = taus, lambda = ls)
fit.mcjs <- tsrq(y ~ x*treatment, tsf = "mcjI", symm = TRUE, dbounded = TRUE,
	data = new, tau = taus, lambda = ls)
fit.mcja <- tsrq(y ~ x*treatment, tsf = "mcjI", symm = FALSE, dbounded = TRUE,
	data = new, tau = taus, lambda = ls)
fit.mcj2 <- tsrq2(y ~ x*treatment, dbounded = TRUE, data = new, tau = taus,
	lambda = seq(0,2,by=0.1), delta = seq(0,1.5,by=0.3))
fit.nlrq <- nlrq2(y ~ x*treatment, par = coef(fit.mcj2, all = TRUE)[,1],
	dbounded = TRUE, data = new, tau = taus)

sel <- 0
selx <- if(sel == 1) "a" else "b"
x <- new$x
nd <- data.frame(x = seq(min(x),max(x),length=200), treatment = sel)
xx <- nd$x+1

par(mfrow = c(2,2), mar = c(7.1, 7.1, 5.1, 2.1), mgp = c(5, 2, 0))
cx.lab <- 3
cx.ax <- 2
lw <- 3
cx <- 1

fit <- fit.aos
yhat <- predict(fit, newdata = nd)

plot(y ~ x_gr, new, subset = new$treatment == sel, xlab = "",
	ylab = "Pain score", cex.lab = cx.lab, cex.axis = cx.ax, cex.main = cx.lab,
	axes = FALSE, main = "Aranda-Ordaz (s)", range = 0, col = grey(4/5))
apply(yhat, 2, function(y,x) lines(x, y, lwd = lw), x = xx)
axis(1, at = 1:6, labels = (0:5)*30 + 30, cex.axis = cx.lab)
axis(2, at = c(0, 25, 50, 75, 100), cex.axis = cx.lab)
box()

fit <- fit.aoa
yhat <- predict(fit, newdata = nd)

plot(y ~ x_gr, new, subset = new$treatment == sel, xlab = "", ylab = "",
	cex.lab = cx.lab, cex.axis = cx.ax, cex.main = cx.lab, axes = FALSE,
	main = "Aranda-Ordaz (a)", range = 0, col = grey(4/5))
apply(yhat, 2, function(y,x) lines(x, y, lwd = lw), x = xx)
axis(1, at = 1:6, labels = (0:5)*30 + 30, cex.axis = cx.lab)
axis(2, at = c(0, 25, 50, 75, 100), cex.axis = cx.lab)
box()

fit <- fit.mcjs
yhat <- predict(fit, newdata = nd)

plot(y ~ x_gr, new, subset = new$treatment == sel, xlab = "Time (min)",
	ylab = "Pain score", cex.lab = cx.lab, cex.axis = cx.ax, cex.main = cx.lab,
	axes = FALSE, main = "Proposal I (s)", range = 0, col = grey(4/5))
apply(yhat, 2, function(y,x) lines(x, y, lwd = lw), x = xx)
axis(1, at = 1:6, labels = (0:5)*30 + 30, cex.axis = cx.lab)
axis(2, at = c(0, 25, 50, 75, 100), cex.axis = cx.lab)
box()

fit <- fit.mcj2
yhat <- predict(fit, newdata = nd)

plot(y ~ x_gr, new, subset = new$treatment == sel, xlab = "Time (min)",
	ylab = "", cex.lab = cx.lab, cex.axis = cx.ax, cex.main = cx.lab,
	axes = FALSE, main = "Proposal II", range = 0, col = grey(4/5))
apply(yhat, 2, function(y,x) lines(x, y, lwd = lw), x = xx)
axis(1, at = 1:6, labels = (0:5)*30 + 30, cex.axis = cx.lab)
axis(2, at = c(0, 25, 50, 75, 100), cex.axis = cx.lab)
box()
