orm fits ordinal cumulative probability models for continuous or ordinal response variables, efficiently allowing for a large number of intercepts by capitalizing on the information matrix being sparse. Five different distribution functions are implemented, with the default being the logistic (i.e., the proportional odds model). The models are stated in terms of exceedance probabilities ($Prob[Y \geq y | X]$) so that, as with ordinary regression, larger predicted values are associated with larger $Y$. This is important to note for the asymmetric distributions given by the log-log and complementary log-log families, for which negating the linear predictor does not result in $Prob[Y < y | X]$. The family argument is defined in orm.fit. The model assumes that the inverse of the assumed cumulative distribution function, when applied to one minus the true cumulative distribution function and plotted on the $y$-axis (with the original $y$ on the $x$-axis), yields parallel curves (though not necessarily linear). This can be checked by plotting the inverse cumulative probability function of one minus the empirical distribution function, stratified by X, and assessing parallelism. Note that parametric regression models make the much stronger assumption of linearity of such inverse functions.

Quantile.orm creates an R function that computes an estimate of a given quantile for a given value of the linear predictor (using the first intercept). It uses approximate linear interpolation on the original response variable scale (the interpolation is exactly linear if there is only one unique value of the linear predictor given as the second argument).
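The parallelism check described above can be sketched in a few lines of base R for the logistic family, whose inverse cumulative distribution function is qlogis. This is a minimal illustration, not part of orm; the response y and the discrete stratification variable x are hypothetical.

ys     <- sort(unique(y))
strata <- sort(unique(x))
for(g in strata) {
  Fn <- ecdf(y[x == g])          # empirical CDF within stratum
  p  <- 1 - Fn(ys)               # empirical exceedance probabilities
  ok <- p > 0 & p < 1            # qlogis is infinite at 0 and 1
  if(g == strata[1])
    plot(ys[ok], qlogis(p[ok]), type='l', xlab='y',
         ylab='logit of 1 - ECDF')
  else lines(ys[ok], qlogis(p[ok]), lty=2)
}
# Approximately parallel (not necessarily linear) curves support the
# model's assumption for the logistic family.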
Usage

orm(formula, data, subset, na.action=na.delete, method="orm.fit",
    model=FALSE, x=FALSE, y=FALSE, linear.predictors=TRUE, se.fit=FALSE,
    penalty=0, penalty.matrix, tol=1e-7, eps=0.005,
    var.penalty=c('simple','sandwich'), ...)

## S3 method for class 'orm':
print(x, digits=4, coefs=TRUE,
      intercepts=x$non.slopes < 10, latex=FALSE, title, ...)

## S3 method for class 'orm':
Quantile(object, codes=FALSE, ...)
Arguments

formula: a formula object. An offset term can be included. The offset causes fitting of a model such as $logit(Y=1) = X\beta + W$, where $W$ is the offset variable having no estimated coefficient. The response variable can be any data type; orm converts it in alphabetic or numeric order to a factor variable and recodes it internally.

data: data frame to use. Default is the current frame.

subset: logical expression or vector of subscripts defining a subset of observations to analyze.

na.action: function to handle NAs in the data. Default is na.delete, which deletes any observation having response or predictor missing, while preserving the attributes of the predictors and maintaining frequencies of deletions due to each variable in the model.

method: name of fitting function. Only allowable choice at present is orm.fit.

model: causes the model frame to be returned in the fit object.

x: causes the expanded design matrix (with missing values excluded) to be returned under the name x. For print, an object created by orm.

y: causes the response variable (with missing values excluded) to be returned under the name y.

linear.predictors: causes the predicted $X\beta$ (with missing values excluded) to be returned under the name linear.predictors. The first intercept is used.

se.fit: causes the standard errors of the fitted values to be returned under the name se.fit. The middle intercept is used.

penalty: see lrm.

penalty.matrix: see lrm.

tol: singularity criterion (see orm.fit).

eps: convergence criterion; see orm.fit.

var.penalty: see lrm.

...: arguments that are passed to orm.fit, or from print, to prModFit. Ignored for Quantile. One of the most important arguments is family; a brief usage sketch follows this list.

digits: number of significant digits to use.

coefs: specify coefs=FALSE to suppress printing the table of model coefficients, standard errors, etc. Specify coefs=n to print only the first n regression coefficients in the model.

intercepts: by default, intercepts are printed only when there are fewer than 10 of them (x$non.slopes < 10); specify intercepts=FALSE or TRUE to override.

latex: a logical value indicating whether output should be formatted with LaTeX markup; passed to prModFit.

title: a character string title to be passed to prModFit. Default is constructed from the name of the distribution family.

object: an object created by orm.

codes: if TRUE, uses the integer codes $1,2,\ldots,k$ for the $k$-level response in computing the predicted quantile.
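As a minimal, hypothetical sketch of the offset term and of passing family through the ... argument (the data vectors are illustrative only, not from the examples below):

set.seed(7)
x <- rnorm(60)
w <- rnorm(60)                    # offset variable: coefficient fixed at 1
y <- x + w + rnorm(60)
f1 <- orm(y ~ x + offset(w))      # default logistic (proportional odds) family
f2 <- orm(y ~ x, family=probit)   # family is passed through ... to orm.fit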
Value

The returned fit object of orm contains the following components in addition to the ones mentioned under the optional arguments.

freq: table of frequencies for Y in order of increasing Y.

stats: vector with the following elements: number of observations used in the fit, number of unique Y values, median Y from among the observations used in the fit, maximum absolute value of the first derivative of the log likelihood, model likelihood ratio $\chi^2$, d.f., $P$-value, score $\chi^2$ statistic (if no initial values given), its $P$-value, Spearman's $\rho$ rank correlation between the linear predictor and Y, the Nagelkerke $R^2$ index, the $g$-index, $gr$ (the $g$-index on the odds ratio scale), and $pdm$ (the mean absolute difference between 0.5 and the predicted probability that $Y\geq$ the marginal median).
In the case of penalized estimation, the "Model L.R." is computed without the penalty factor, and "d.f." is the effective d.f. from Gray's (1992) Equation 2.9. The $P$-value uses this corrected model L.R. $\chi^2$ and corrected d.f. The score chi-square statistic uses first derivatives, which contain penalty components.

fail: set to TRUE if convergence failed (and maxiter>1) or if a singular information matrix is encountered.

Components related to penalization are as described under lrm; see lrm for details if penalization is used.

family: the name of the distribution family. If family was a user-customized list, it must have had an element named name, which is taken as the return value for family here.

trans: a list of functions for the choice of family, with elements cumprob (the cumulative probability distribution function), inverse (the inverse of cumprob), deriv (the first derivative of cumprob), and deriv2 (the second derivative of cumprob).

var: variance-covariance matrix for the middle intercept and the regression coefficients; see lrm.

info.matrix: a sparse matrix representation of type matrix.csr from the SparseM package. This allows the full information matrix with all intercepts to be stored efficiently, and matrix operations using the Cholesky decomposition to be fast. vcov.orm uses this information to compute the covariance matrix for intercepts other than the middle one.

References

Verweij PJM, Van Houwelingen JC: Penalized likelihood in Cox regression. Stat in Med 13:2427--2436, 1994.
Gray RJ: Flexible methods for analyzing survival data using splines, with applications to breast cancer prognosis. JASA 87:942--951, 1992.
Shao J: Linear model selection by cross-validation. JASA 88:486--494, 1993.
Verweij PJM, Van Houwelingen JC: Crossvalidation in survival analysis. Stat in Med 12:2305--2314, 1993.
Harrell FE: Model uncertainty, penalization, and parsimony. Available from http://biostat.mc.vanderbilt.edu/FHHandouts.
See Also

orm.fit, predict.orm, solve, rms.trans, rms, polr, latex.orm, vcov.orm, num.intercepts, residuals.orm, na.delete, na.detail.response, pentrace, rmsMisc, vif, predab.resample, validate.orm, calibrate, Mean.orm, gIndex, prModFit
Examples

set.seed(1)
n <- 100
y <- round(runif(n), 2)
x1 <- sample(c(-1,0,1), n, TRUE)
x2 <- sample(c(-1,0,1), n, TRUE)
f <- lrm(y ~ x1 + x2, eps=1e-5)
g <- orm(y ~ x1 + x2, eps=1e-5)
max(abs(coef(g) - coef(f)))
w <- vcov(g, intercepts='all') / vcov(f) - 1
max(abs(w))
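# Coefficients and covariances agree closely because orm's default
# logistic family fits the same proportional odds model as lrm.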
set.seed(1)
n <- 300
x1 <- c(rep(0,150), rep(1,150))
y <- rnorm(n) + 3*x1
g <- orm(y ~ x1)
g
k <- coef(g)
i <- num.intercepts(g)
h <- orm(y ~ x1, family=probit)
ll <- orm(y ~ x1, family=loglog)
cll <- orm(y ~ x1, family=cloglog)
cau <- orm(y ~ x1, family=cauchit)
x <- 1:i
z <- list(logistic=list(x=x, y=coef(g)[1:i]),
probit =list(x=x, y=coef(h)[1:i]),
loglog =list(x=x, y=coef(ll)[1:i]),
cloglog =list(x=x, y=coef(cll)[1:i]))
labcurve(z, pl=TRUE, col=1:4, ylab='Intercept')
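# With x1 = 0 the linear predictor equals the intercept, so each curve
# estimates the link-transformed exceedance probabilities Prob(Y >= y)
# under that family's assumed distribution.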
tapply(y, x1, mean)
m <- Mean(g)
m(w <- k[1] + k['x1']*c(0,1))
mh <- Mean(h)
wh <- coef(h)[1] + coef(h)['x1']*c(0,1)
mh(wh)
qu <- Quantile(g)
# Compare model estimated and empirical quantiles
cq <- function(y) {
  cat(qu(.1, w), tapply(y, x1, quantile, probs=.1), '\n')
  cat(qu(.5, w), tapply(y, x1, quantile, probs=.5), '\n')
  cat(qu(.9, w), tapply(y, x1, quantile, probs=.9), '\n')
}
cq(y)
# Try on log-normal model
g <- orm(exp(y) ~ x1)
g
k <- coef(g)
plot(k[1:i])
m <- Mean(g)
m(w <- k[1] + k['x1']*c(0,1))
tapply(exp(y), x1, mean)
qu <- Quantile(g)
cq(exp(y))
# Compare predicted mean with ols for a continuous x
set.seed(3)
n <- 200
x1 <- rnorm(n)
y <- x1 + rnorm(n)
dd <- datadist(x1); options(datadist='dd')
f <- ols(y ~ x1)
g <- orm(y ~ x1, family=probit)
h <- orm(y ~ x1, family=logistic)
w <- orm(y ~ x1, family=cloglog)
mg <- Mean(g); mh <- Mean(h); mw <- Mean(w)
r <- rbind(ols = Predict(f, conf.int=FALSE),
probit = Predict(g, conf.int=FALSE, fun=mg),
logistic = Predict(h, conf.int=FALSE, fun=mh),
cloglog = Predict(w, conf.int=FALSE, fun=mw))
plot(r, groups='.set.')
# Compare predicted 0.8 quantile with quantile regression
qu <- Quantile(g)
qu80 <- function(lp) qu(.8, lp)
f <- Rq(y ~ x1, tau=.8)
r <- rbind(probit = Predict(g, conf.int=FALSE, fun=qu80),
quantreg = Predict(f, conf.int=FALSE))
plot(r, groups='.set.')
# Verify transformation invariance of ordinal regression
ga <- orm(exp(y) ~ x1, family=probit)
qua <- Quantile(ga)
qua80 <- function(lp) log(qua(.8, lp))
r <- rbind(logprobit = Predict(ga, conf.int=FALSE, fun=qua80),
probit = Predict(g, conf.int=FALSE, fun=qu80))
plot(r, groups='.set.')
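# The two curves should nearly superimpose: ordinal regression estimates
# are invariant to increasing transformations of Y, so modeling exp(y)
# and log-transforming its 0.8 quantile recovers the quantile from the
# model fitted to y.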
# Try the same with quantile regression. Need to transform x1
fa <- Rq(exp(y) ~ rcs(x1,5), tau=.8)
r <- rbind(qr = Predict(f, conf.int=FALSE),
logqr = Predict(fa, conf.int=FALSE, fun=log))
plot(r, groups='.set.')
options(datadist=NULL)
## Simulate power and type I error for orm logistic and probit regression
## for likelihood ratio, Wald, and score chi-square tests, and compare
## with t-test
require(rms)
set.seed(5)
nsim <- 2000
r <- NULL
for(beta in c(0, .4)) {
for(n in c(10, 50, 300)) {
cat('beta=', beta, ' n=', n, '\n\n')
plogistic <- pprobit <- plogistics <- pprobits <- plogisticw <-
pprobitw <- ptt <- numeric(nsim)
x <- c(rep(0, n/2), rep(1, n/2))
pb <- setPb(nsim, every=25, label=paste('beta=', beta, ' n=', n))
for(j in 1:nsim) {
pb(j)
y <- beta*x + rnorm(n)
tt <- t.test(y ~ x)
ptt[j] <- tt$p.value
f <- orm(y ~ x)
plogistic[j] <- f$stats['P']
plogistics[j] <- f$stats['Score P']
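# Wald chi-square for x: squared coefficient divided by its variance;
# element [2,2] of the covariance matrix corresponds to the x coefficient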
plogisticw[j] <- 1 - pchisq(coef(f)['x']^2 / vcov(f)[2,2], 1)
f <- orm(y ~ x, family=probit)
pprobit[j] <- f$stats['P']
pprobits[j] <- f$stats['Score P']
pprobitw[j] <- 1 - pchisq(coef(f)['x']^2 / vcov(f)[2,2], 1)
}
if(beta == 0) plot(ecdf(plogistic))
r <- rbind(r, data.frame(beta = beta, n=n,
ttest = mean(ptt < 0.05),
logisticlr = mean(plogistic < 0.05),
logisticscore= mean(plogistics < 0.05),
logisticwald = mean(plogisticw < 0.05),
probit = mean(pprobit < 0.05),
probitscore = mean(pprobits < 0.05),
probitwald = mean(pprobitw < 0.05)))
}
}
print(r)
# beta n ttest logisticlr logisticscore logisticwald probit probitscore probitwald
#1 0.0 10 0.0435 0.1060 0.0655 0.043 0.0920 0.0920 0.0820
#2 0.0 50 0.0515 0.0635 0.0615 0.060 0.0620 0.0620 0.0620
#3 0.0 300 0.0595 0.0595 0.0590 0.059 0.0605 0.0605 0.0605
#4 0.4 10 0.0755 0.1595 0.1070 0.074 0.1430 0.1430 0.1285
#5 0.4 50 0.2950 0.2960 0.2935 0.288 0.3120 0.3120 0.3120
#6 0.4 300 0.9240 0.9215 0.9205 0.920 0.9230 0.9230 0.9230