Transformations: Functions for Data Transformation

Description

Transformations for factors and numeric variables.

Usage

id_trafo(x)
rank_trafo(x, ties.method = c("mid-ranks", "random"))
normal_trafo(x, ties.method = c("mid-ranks", "average-scores"))
median_trafo(x, mid.score = c("0", "0.5", "1"))
savage_trafo(x, ties.method = c("mid-ranks", "average-scores"))
consal_trafo(x, ties.method = c("mid-ranks", "average-scores"), a = 5)
koziol_trafo(x, ties.method = c("mid-ranks", "average-scores"), j = 1)
klotz_trafo(x, ties.method = c("mid-ranks", "average-scores"))
mood_trafo(x, ties.method = c("mid-ranks", "average-scores"))
ansari_trafo(x, ties.method = c("mid-ranks", "average-scores"))
fligner_trafo(x, ties.method = c("mid-ranks", "average-scores"))
logrank_trafo(x, ties.method = c("mid-ranks", "Hothorn-Lausen",
                                 "average-scores"),
              weight = logrank_weight, ...)
logrank_weight(time, n.risk, n.event,
               type = c("logrank", "Gehan-Breslow", "Tarone-Ware",
                        "Peto-Peto", "Prentice", "Prentice-Marek",
                        "Andersen-Borgan-Gill-Keiding", "Fleming-Harrington",
                        "Gaugler-Kim-Liao", "Self"),
               rho = NULL, gamma = NULL)
f_trafo(x)
of_trafo(x, scores = NULL)
zheng_trafo(x, increment = 0.1)
maxstat_trafo(x, minprob = 0.1, maxprob = 1 - minprob)
fmaxstat_trafo(x, minprob = 0.1, maxprob = 1 - minprob)
ofmaxstat_trafo(x, minprob = 0.1, maxprob = 1 - minprob)
trafo(data, numeric_trafo = id_trafo, factor_trafo = f_trafo,
      ordered_trafo = of_trafo, surv_trafo = logrank_trafo,
      var_trafo = NULL, block = NULL)
mcp_trafo(...)

Value

A numeric vector or matrix with nrow(x) rows and an arbitrary number of columns. For trafo(), a named matrix with nrow(data) rows and an arbitrary number of columns.

Arguments

x: an object of class "numeric", "factor", "ordered" or "Surv".
ties.method: a character, the method used to handle ties. The score generating function either uses the mid-ranks ("mid-ranks", default) or, in the case of rank_trafo(), randomly broken ties ("random"). Alternatively, the average of the scores resulting from applying the score generating function to randomly broken ties is used ("average-scores"). See logrank_test() for a detailed description of the methods used in logrank_trafo().
mid.score: a character, the score assigned to observations exactly equal to the median: either 0 ("0", default), 0.5 ("0.5") or 1 ("1"); see median_test().
a: a numeric vector, the values taken as the constant \(a\) in the Conover-Salsburg scores. Defaults to 5.
j: a numeric, the value taken as the constant \(j\) in the Koziol-Nemec scores. Defaults to 1.
weight: a function where the first three arguments must correspond to time, n.risk, and n.event given below. Defaults to logrank_weight.
time: a numeric vector, the ordered distinct time points.
n.risk: a numeric vector, the number of subjects at risk at each time point specified in time.
n.event: a numeric vector, the number of events at each time point specified in time.
type: a character, one of "logrank" (default), "Gehan-Breslow", "Tarone-Ware", "Peto-Peto", "Prentice", "Prentice-Marek", "Andersen-Borgan-Gill-Keiding", "Fleming-Harrington", "Gaugler-Kim-Liao" or "Self"; see logrank_test().
rho: a numeric vector, the \(\rho\) constant when type is "Tarone-Ware", "Fleming-Harrington", "Gaugler-Kim-Liao" or "Self"; see logrank_test(). Defaults to NULL, implying 0.5 for type = "Tarone-Ware" and 0 otherwise.
gamma: a numeric vector, the \(\gamma\) constant when type is "Fleming-Harrington", "Gaugler-Kim-Liao" or "Self"; see logrank_test(). Defaults to NULL, implying 0.
scores: a numeric vector or list, the scores corresponding to each level of an ordered factor. Defaults to NULL, implying 1:nlevels(x).
increment: a numeric, the score increment between the order-restricted sets of scores. A fraction greater than 0, but smaller than or equal to 1. Defaults to 0.1.
minprob: a numeric, a fraction between 0 and 0.5; see maxstat_test(). Defaults to 0.1.
maxprob: a numeric, a fraction between 0.5 and 1; see maxstat_test(). Defaults to 1 - minprob.
data: an object of class "data.frame".
numeric_trafo: a function to be applied to elements of class "numeric" in data, returning a matrix with nrow(data) rows and an arbitrary number of columns. Defaults to id_trafo.
factor_trafo: a function to be applied to elements of class "factor" in data, returning a matrix with nrow(data) rows and an arbitrary number of columns. Defaults to f_trafo.
ordered_trafo: a function to be applied to elements of class "ordered" in data, returning a matrix with nrow(data) rows and an arbitrary number of columns. Defaults to of_trafo.
surv_trafo: a function to be applied to elements of class "Surv" in data, returning a matrix with nrow(data) rows and an arbitrary number of columns. Defaults to logrank_trafo.
var_trafo: an optional named list of functions to be applied to the corresponding variables in data. Defaults to NULL.
block: an optional factor whose levels are interpreted as blocks. trafo is applied to each level of block separately. Defaults to NULL.
...: for logrank_trafo(), further arguments to be passed to weight. For mcp_trafo(), the factor name and contrast matrix (as matrix or character) in a tag = value format for multiple comparisons based on a single unordered factor; see mcp() in package multcomp.

Details

The utility functions documented here are used to define specialized test procedures.

id_trafo() is the identity transformation.

rank_trafo(), normal_trafo(), median_trafo(), savage_trafo(), consal_trafo() and koziol_trafo() compute rank (Wilcoxon) scores, normal (van der Waerden) scores, median (Mood-Brown) scores, Savage scores, Conover-Salsburg scores (see neuropathy) and Koziol-Nemec scores, respectively, for location problems.
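
As a rough illustration of the first two score types, the following base R sketch computes mid-ranks and the textbook van der Waerden scores, qnorm(rank / (n + 1)). It assumes no missing values and is not the package's implementation; the vector x and the variable names are made up for this example.

```r
## Illustrative data with one tie (3.5 appears twice)
x <- c(2.1, 3.5, 3.5, 0.7, 5.0, 4.2)
n <- length(x)

## Rank (Wilcoxon) scores: tied observations share the average rank (mid-ranks)
rank_scores <- rank(x, ties.method = "average")

## Normal (van der Waerden) scores: standard normal quantiles of rank / (n + 1)
normal_scores <- qnorm(rank_scores / (n + 1))

print(cbind(x, rank_scores, normal_scores))
```

Dividing by n + 1 rather than n keeps every argument of qnorm() strictly inside (0, 1), so the largest observation still receives a finite score.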

klotz_trafo(), mood_trafo(), ansari_trafo() and fligner_trafo() compute Klotz scores, Mood scores, Ansari-Bradley scores and Fligner-Killeen scores, respectively, for scale problems.
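
For orientation, three of these scale scores can be sketched in base R from their usual textbook definitions, all built on mid-ranks. This is a sketch under the assumption of no missing values; the package's handling of ties and NAs may differ, and the variable names are illustrative only.

```r
## Illustrative data with one tie (3.5 appears twice)
x <- c(2.1, 3.5, 3.5, 0.7, 5.0, 4.2)
n <- length(x)
r <- rank(x, ties.method = "average")   # mid-ranks

## Mood scores: squared distance of each rank from the mean rank
mood_scores <- (r - (n + 1) / 2)^2

## Klotz scores: squared normal (van der Waerden) scores
klotz_scores <- qnorm(r / (n + 1))^2

## Ansari-Bradley scores: distance of each rank from the nearer end of the ordering
ansari_scores <- pmin(r, n + 1 - r)

print(cbind(x, r, mood_scores, klotz_scores, ansari_scores))
```

Mood and Klotz scores grow toward both tails of the ordering, whereas Ansari-Bradley scores are largest in the middle; in every case, observations that are equally far from the centre of the ranking receive the same score.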

logrank_trafo() computes weighted logrank scores for right-censored data, allowing for a user-defined weight function through the weight argument (see GTSG).

f_trafo() computes dummy matrices for factors and of_trafo() assigns scores to ordered factors. For ordered factors with two levels, the scores are normalized to the \([0, 1]\) range. zheng_trafo() computes a finite collection of order-restricted scores for ordered factors (see jobsatisfaction, malformations and vision).

maxstat_trafo(), fmaxstat_trafo() and ofmaxstat_trafo() compute scores for cutpoint problems (see maxstat_test()).
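
The idea behind cutpoint scores for a numeric variable can be sketched in base R: each candidate cutpoint yields one indicator column marking the observations at or below it. The sketch assumes that the candidates are the distinct observed values lying between the minprob and maxprob sample quantiles; the package's exact quantile definition and column labels may differ, and all names below are illustrative.

```r
## Illustrative data without ties
x <- c(7, 3, 10, 1, 5, 9, 2, 8, 4, 6)
minprob <- 0.1
maxprob <- 1 - minprob

## Restrict the candidate cutpoints to the central part of the sample
q <- quantile(x, probs = c(minprob, maxprob), names = FALSE)
candidates <- sort(unique(x))
cutpoints <- candidates[candidates >= q[1] & candidates <= q[2]]

## One 0/1 column per cutpoint: 1 if the observation is at or below the cutpoint
scores <- outer(x, cutpoints, "<=") * 1
colnames(scores) <- paste0("x <= ", cutpoints)

print(scores)
```

Bounding the candidates by minprob and maxprob prevents splits that would leave only a handful of observations on one side.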

trafo() applies its arguments to the elements of data according to the classes of the elements. A trafo() function with modified default arguments is usually supplied to independence_test() via the xtrafo or ytrafo arguments. Fine tuning, i.e., different transformations for different variables, is possible by supplying a named list of functions to the var_trafo argument.
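
A minimal sketch of this pattern follows, assuming the coin package is installed. The data frame d and the wrapper name rank_ytrafo are made up for illustration. With a rank transformation of the response and a scalar test statistic, the call should reproduce the statistic of wilcox_test() on the same data.

```r
library(coin)

## Illustrative two-sample data without ties
d <- data.frame(
  y = c(1.2, 3.4, 2.2, 5.1, 4.8, 6.3, 7.7, 5.9, 8.4, 6.6),
  x = gl(2, 5)
)

## trafo() with a modified default: numeric variables are replaced by their ranks
rank_ytrafo <- function(data) trafo(data, numeric_trafo = rank_trafo)

## Supply the modified trafo() to independence_test() via ytrafo
it <- independence_test(y ~ x, data = d, teststat = "scalar",
                        ytrafo = rank_ytrafo)

## Reference: the dedicated Wilcoxon-Mann-Whitney test
wt <- wilcox_test(y ~ x, data = d)

print(c(independence = unname(statistic(it)), wilcoxon = unname(statistic(wt))))
```

To transform only selected variables, a named list such as var_trafo = list(y = rank_trafo) can be passed to trafo() inside the wrapper instead of changing numeric_trafo.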

mcp_trafo() computes contrast matrices for factors.

Examples

## Dummy matrix, two-sample problem (only one column)
f_trafo(gl(2, 3))

## Dummy matrix, K-sample problem (K columns)
x <- gl(3, 2)
f_trafo(x)

## Score matrix
ox <- as.ordered(x)
of_trafo(ox)
of_trafo(ox, scores = c(1, 3:4))
of_trafo(ox, scores = list(s1 = 1:3, s2 = c(1, 3:4)))
zheng_trafo(ox, increment = 1/3)

## Normal scores
y <- runif(6)
normal_trafo(y)

## All together now
trafo(data.frame(x = x, ox = ox, y = y), numeric_trafo = normal_trafo)

## The same, but allows for fine-tuning
trafo(data.frame(x = x, ox = ox, y = y), var_trafo = list(y = normal_trafo))

## Transformations for maximally selected statistics
maxstat_trafo(y)
fmaxstat_trafo(x)
ofmaxstat_trafo(ox)

## Apply transformation blockwise (as in the Friedman test)
trafo(data.frame(y = 1:20), numeric_trafo = rank_trafo, block = gl(4, 5))

## Multiple comparisons
dta <- data.frame(x)
mcp_trafo(x = "Tukey")(dta)

## The same, but useful when specific contrasts are desired
K <- rbind("2 - 1" = c(-1,  1, 0),
           "3 - 1" = c(-1,  0, 1),
           "3 - 2" = c( 0, -1, 1))
mcp_trafo(x = K)(dta)