tram: Stratified Linear Transformation Models

Description

Likelihood inference for stratified linear transformation models

Usage

tram(formula, data, subset, weights, offset, cluster, na.action = na.omit, 
     distribution = c("Normal", "Logistic", "MinExtrVal", "MaxExtrVal", "Exponential"), 
     transformation = c("discrete", "linear", "logarithmic", "smooth"), 
     LRtest = TRUE, prob = c(0.1, 0.9), support = NULL, 
     bounds = NULL, add = c(0, 0), order = 6, 
     negative = TRUE, scale = TRUE, extrapolate = FALSE, 
     log_first = FALSE, sparse_nlevels = Inf,
     model_only = FALSE, constraints = NULL, ...)
tram_data(formula, data, subset, weights, offset, cluster, na.action = na.omit)

Arguments

formula

an object of class "formula": a symbolic description of the model structure to be fitted. The details of model specification are given under Details and in the package vignette.

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula).

subset

an optional vector specifying a subset of observations to be used in the fitting process.

weights

an optional vector of weights to be used in the fitting process. Should be NULL or a numeric vector. If present, the weighted log-likelihood is maximised.

offset

this can be used to specify an _a priori_ known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases.

cluster

optional factor with a cluster ID employed for computing clustered covariances.

na.action

a function which indicates what should happen when the data contain NAs. The default is set to na.omit.

distribution

character specifying how the transformation function is mapped into probabilities. Available choices include the cumulative distribution functions of the standard normal, the standard logistic and the standard minimum extreme value distribution.

transformation

character specifying the complexity of the response transformation. For discrete responses, one parameter is assigned to each level (except the last one); for continuous responses, linear, log-linear and smooth (parameterised as a Bernstein polynomial) functions are implemented.

LRtest

logical specifying if a likelihood-ratio test for the null of all coefficients in the linear predictor being zero shall be performed.

prob

two probabilities giving quantiles of the response defining the support of a smooth Bernstein polynomial (if transformation = "smooth").

support

a vector of two elements; the support of a smooth Bernstein polynomial (if transformation = "smooth").

bounds

an interval defining the bounds of a real sample space.

add

add these values to the support before generating a grid via mkgrid.

order

integer >= 1 defining the order of the Bernstein polynomial (if transformation = "smooth").

negative

logical defining the sign of the linear predictor.

scale

logical defining if variables in the linear predictor shall be scaled. Scaling is used internally for model estimation; rescaled coefficients are reported in model output.

extrapolate

logical defining the behaviour of the Bernstein transformation function outside support. The default FALSE is to extrapolate linearly without requiring the second derivative of the transformation function to be zero at support. If TRUE, this additional constraint is respected.

sparse_nlevels

integer; use a sparse model matrix if the number of levels of an ordered factor is at least as large as sparse_nlevels.

log_first

logical; if TRUE, a Bernstein polynomial is defined on the log-scale.

model_only

logical, if TRUE the unfitted model is returned.

constraints

additional constraints on regression coefficients in the linear predictor of the form lhs %*% coef(object) >= rhs, where lhs and rhs can be specified as a character (as in glht) or by a matrix lhs (assuming rhs = 0), or as a list containing the two elements lhs and rhs.

...

additional arguments.
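
As a reading aid for the constraints argument, the inequality lhs %*% coef(object) >= rhs can be checked by hand in base R. The sketch below uses a made-up coefficient vector (the names and values are assumptions for illustration, not output of a fitted model) and only shows what the matrix form expresses; it does not call tram.

```r
## Hypothetical coefficient vector; names and values are made up
cf <- c(crim = 0.10, chas1 = 0.40, rm = 1.30)

## Each row of lhs encodes one linear constraint on the coefficients
lhs <- matrix(0, nrow = 2, ncol = length(cf),
              dimnames = list(NULL, names(cf)))
lhs[1, "crim"] <- 1               # crim >= 0
lhs[2, c("chas1", "rm")] <- 1     # chas1 + rm >= 1.5
rhs <- c(0, 1.5)

## The constraints hold if every element of lhs %*% cf is >= rhs
value <- drop(lhs %*% cf)
satisfied <- value >= rhs
print(cbind(value, rhs, satisfied))
```

The same two constraints could be written as character strings, "crim >= 0" and "chas1 + rm >= 1.5", or passed as list(lhs, rhs).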

Value

An object of class tram inheriting from mlt.

Details

The model formula is of the form y | s ~ x where y is a response variable that is at least ordered, s are the variables defining strata and x defines the linear predictor. y ~ x defines a model without strata (but with a response-varying intercept function) and y | s ~ 0 sets up response-varying coefficients for all variables in s.

The two functions tram and tram_data are not intended to be called directly by users. Instead, functions Coxph (Cox proportional hazards models), Survreg (parametric survival models), Polr (models for ordered categorical responses), Lm (normal linear models), BoxCox (non-normal linear models) or Colr (continuous outcome logistic regression) allow direct access to the corresponding models.
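
The formula forms described above can be tried through one of the wrapper functions. The following is a minimal sketch, assuming the packages tram and mlbench are installed; the choice of BoxCox, of chas as the stratum and of the three covariates is for illustration only, and the "0 +" in the strata term is an assumption about how to request a separate transformation function for each level of chas.

```r
library("tram")
data("BostonHousing2", package = "mlbench")

## y ~ x: one response-varying intercept function, no strata
m_plain <- BoxCox(cmedv ~ crim + rm + lstat, data = BostonHousing2)

## y | s ~ x: the transformation of cmedv may differ between the
## levels of chas ('0 +' drops the intercept in the strata term)
m_strat <- BoxCox(cmedv | 0 + chas ~ crim + rm + lstat,
                  data = BostonHousing2)

## regression coefficients of the linear predictor in both models
coef(m_plain)
coef(m_strat)

## log-likelihoods of the two fits
logLik(m_plain)
logLik(m_strat)
```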

The model class and the specific models implemented in tram are explained in the package vignette of package tram. The underlying theory of most likely transformations is presented in Hothorn et al. (2018); computational and modelling aspects in more complex situations are discussed by Hothorn (2018).
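
To make the roles of order and support for transformation = "smooth" more concrete, here is a base R sketch of a textbook Bernstein basis. It is not the package's internal code; the function name bernstein_basis, the grid and the coefficient values are hypothetical. It shows that a polynomial of a given order has order + 1 coefficients and that increasing coefficients yield an increasing transformation function on the support.

```r
## Textbook Bernstein basis of a given order on the interval 'support'
bernstein_basis <- function(y, order = 6, support = range(y)) {
    ## map y from the support to the unit interval
    u <- (y - support[1]) / diff(support)
    ## column k + 1 holds choose(order, k) * u^k * (1 - u)^(order - k)
    sapply(0:order, function(k) dbinom(k, size = order, prob = u))
}

y <- seq(from = 5, to = 50, by = 0.5)
B <- bernstein_basis(y, order = 6, support = c(5, 50))

## order + 1 made-up coefficients, strictly increasing
theta <- c(-3, -2, -1.5, 0, 1, 2.5, 4)

## resulting transformation function evaluated on the grid
h <- drop(B %*% theta)
plot(y, h, type = "l", xlab = "y", ylab = "h(y)")
```

At the two ends of the support the function takes the first and the last coefficient, which is why the placement of the support (directly, or via prob as quantiles of the response) matters for the fitted transformation.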

References

Torsten Hothorn, Lisa Moest, Peter Buehlmann (2018), Most Likely Transformations, Scandinavian Journal of Statistics, 45(1), 110-134, doi:10.1111/sjos.12291.

Torsten Hothorn (2018), Most Likely Transformations: The mlt Package, Journal of Statistical Software, forthcoming. URL: https://cran.r-project.org/package=mlt.docreg

Examples

# NOT RUN {
  library("tram")
  data("BostonHousing2", package = "mlbench")

  ### unconstrained regression coefficients
  ### BoxCox calls tram internally
  m1 <- BoxCox(cmedv ~ chas + crim + zn + indus + nox + 
               rm + age + dis + rad + tax + ptratio + b + lstat, 
               data = BostonHousing2)

  ### now with two constraints on regression coefficients
  m2 <- BoxCox(cmedv ~ chas + crim + zn + indus + nox + 
               rm + age + dis + rad + tax + ptratio + b + lstat, 
               data = BostonHousing2, 
               constraints = c("crim >= 0", "chas1 + rm >= 1.5"))
  coef(m1)
  coef(m2)

  K <- matrix(0, nrow = 2, ncol = length(coef(m2)))
  colnames(K) <- names(coef(m2))
  K[1, "crim"] <- 1
  K[2, c("chas1", "rm")] <- 1
  m3 <- BoxCox(cmedv ~ chas + crim + zn + indus + nox + 
               rm + age + dis + rad + tax + ptratio + b + lstat, 
               data = BostonHousing2, 
               constraints = list(K, c(0, 1.5)))
  all.equal(coef(m2), coef(m3))

# }