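Transforms the data by a log transformation, modifying small and zero observations such that the transformation is linear for x <= threshold and logarithmic for x > threshold. The transformation therefore yields finite values and is continuously differentiable.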
LogSt(x, base = 10, calib = x, threshold = NULL, mult = 1)
LogStInv(x, base = NULL, threshold = NULL)
x: a vector or matrix of data to be transformed.
base: a positive or complex number: the base with respect to which logarithms are computed. Defaults to 10. Use base = exp(1) for the natural log.
calib: a vector or matrix of data used to calibrate the transformation(s), i.e., to determine the threshold constant.
threshold: the constant that determines the transformation. The inverse function LogStInv will look for an attribute named "threshold" if the argument is set to NULL.
mult: a tuning constant affecting the transformation of small values, see Details.
the transformed data. The threshold used for the transformation is returned as attr(., "threshold") and the base used as attr(., "base").
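In order to avoid log(x) = -Inf for x = 0 in log transformations, a constant is often added to the variable before taking the log. This is not always a satisfactory strategy. The function LogSt
handles this problem based on the following ideas: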
The modification should only affect the values for "small" arguments.
What "small" is should be determined in connection with the non-zero values of the original variable, since it should behave well (be equivariant) with respect to a change in the "unit of measurement".
The function must remain monotone, and it should remain (weakly) convex.
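These criteria are implemented here as follows: The shape is determined by a threshold c at which, coming from above, the log function switches to a linear function with the same slope at this point.
This is obtained by
g(x) = log10(x)                             if x >= c
g(x) = log10(c) - (c - x) / (c * log(10))   if x < c
where log denotes the natural logarithm.
Small values are determined by the threshold c. If it is not given by the argument threshold, it is determined by the quartiles q1 and q3 of the non-zero data as c = q1 / (q3 / q1)^mult.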
The rationale is that, for lognormal data, this constant identifies about 2 percent of the data as small.
Below this limit, the transformation continues linearly with the derivative of the log curve at this point.
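The construction described above can be sketched in base R. This is only an illustration, not the package code: the helper names threshold_sketch and logst_sketch are hypothetical, the threshold is assumed to be q1 / (q3 / q1)^mult computed from the quartiles of the positive calibration values, and missing values are not handled specially.

```r
# Hypothetical helper: threshold from the quartiles of the positive values,
# assuming c = q1 / (q3 / q1)^mult.
threshold_sketch <- function(calib, mult = 1) {
  q <- quantile(calib[calib > 0], probs = c(0.25, 0.75),
                na.rm = TRUE, names = FALSE)
  q[1] / (q[2] / q[1])^mult
}

# Hypothetical helper: ordinary log at or above the threshold,
# tangent line of the log curve below it.
logst_sketch <- function(x, base = 10, threshold = threshold_sketch(x)) {
  big   <- which(x >= threshold)
  small <- which(x <  threshold)
  out <- rep(NA_real_, length(x))
  out[big]   <- log(x[big], base)
  out[small] <- log(threshold, base) +
    (x[small] - threshold) / (threshold * log(base))
  out
}

set.seed(1)
x   <- c(0, rlnorm(10000))   # one exact zero plus lognormal data
thr <- threshold_sketch(x)
y   <- logst_sketch(x)

all(is.finite(y))            # the zero is mapped to a finite value
mean(x[x > 0] < thr)         # share of positive values treated as "small"
```

Because the threshold is built from quantiles, rescaling the data by a constant rescales the threshold by the same constant, and the transformed values only shift by the log of that constant. This is the equivariance under a change of unit asked for above.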
Another idea for choosing the threshold
A generalized log (see Rocke and Durbin 2003) can be calculated in order to stabilize the variance as:
function (x, a) { return(log((x + sqrt(x^2 + a^2)) / 2)) }
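As a small illustration of how this generalized log behaves, the function can be given a name and evaluated at zero and at a large value. The name glog and the choice a = 2 are assumptions made only for this sketch.

```r
# The generalized log from above, bound to a (hypothetical) name.
glog <- function(x, a) log((x + sqrt(x^2 + a^2)) / 2)

a <- 2
glog(0, a)                    # finite at zero: equals log(a / 2)
x_big <- 1e6
glog(x_big, a) - log(x_big)   # close to 0 when x is much larger than a
glog(-1, a)                   # also defined for negative values
```

For x much larger than a the function is practically indistinguishable from log(x), while it stays finite at and below zero.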
Rocke, D. M., Durbin, B. (2003): Approximate variance-stabilizing transformations for gene-expression microarray data. Bioinformatics, 19(8): 966-972.
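library(DescTools)

# small values between 0 and 1 plus lognormal data centred around 5
dd <- c(seq(0, 1, 0.1), 5 * 10^rnorm(100, 0, 0.2))
dd <- sort(dd)
r.dl <- LogSt(dd)
plot(dd, r.dl, type = "l")
# mark the threshold used by the transformation
abline(v = attr(r.dl, "threshold"), lty = 2)

x <- rchisq(df = 3, n = 100)
# should give 0 (or at least something small):
LogStInv(LogSt(x)) - x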
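The round trip in the example works because the inverse knows the base and the threshold. A minimal base-R sketch of such an inverse is given below, assuming the forward transformation is the plain log at or above the threshold and its tangent line below it. The name logst_inv_sketch is hypothetical and this is not the DescTools implementation.

```r
# Hypothetical helper: invert the started log for a known base and threshold.
logst_inv_sketch <- function(y, base = 10, threshold = 1) {
  y0 <- log(threshold, base)        # transformed value at the threshold
  big   <- which(y >= y0)
  small <- which(y <  y0)
  out <- rep(NA_real_, length(y))
  out[big]   <- base^y[big]         # undo the log part
  out[small] <- threshold + (y[small] - y0) * threshold * log(base)  # undo the linear part
  out
}

# With base 10 and threshold 1, an original value of 0 is transformed
# to -1 / log(10), so the last element should map back to 0.
logst_inv_sketch(c(2, 0, -1 / log(10)))
```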