tran: Common data transformations and standardizations

Description

Provides common data transformations and standardizations useful for palaeoecological data. The function acts as a wrapper to function decostand in package vegan for several of the available options.

The formula method provides a convenient way of selecting or excluding subsets of variables before applying the chosen transformation.

Usage

## S3 method for class 'default':
tran(x, method, a = 1, b = 0, p = 2, base = exp(1),
     na.rm = FALSE, na.value = 0, ...)
## S3 method for class 'formula':
tran(formula, data = NULL, subset = NULL,
     na.action = na.pass, ...)

Arguments

x

A matrix-like object.

method

transformation or standardization method to apply. See Details for available options.

a

Constant to multiply x by. Used with method = "log" only. Can be a vector, in which case it gives the values by which to multiply each column of x.

b

Constant to add to x before taking logs. Used with method = "log" only. Can be a vector, in which case it gives the values to add to each column of x.

p

The power to use in the power transformation.

base

the base with respect to which logarithms are computed. See log for further details. The default is to compute natural logarithms.

na.rm

Should missing values be removed before some computations?

na.value

The value with which to replace missing values (NA).

...

Arguments passed to decostand, or other tran methods.

formula

A model formula describing the variables to be transformed. The formula should have only a right hand side, e.g. ~ foo + bar.

data, subset, na.action

See model.frame for details on these arguments. data will generally be the object or environment within which the variables in the formula are searched for.
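
As a rough illustration of how a right-hand-side-only formula can include or exclude variables, the sketch below uses only base R's terms. The data frame d and its columns are hypothetical, and the sketch does not claim to reproduce how the formula method of tran resolves the formula internally.

```r
# Hypothetical data frame with three variables
d <- data.frame(A = c(0.2, NA, 0.7), B = c(1, 2, 3), C = c(0.5, 0.1, 0.9))

# One-sided formula: "." expands to all variables in d, "- B" excludes B
tt <- terms(~ . - B, data = d)

# Names of the variables retained by the formula
keep <- attr(tt, "term.labels")

# Subset of d that such a formula would select for transformation
selected <- d[, keep, drop = FALSE]
```

Here keep holds the names of the retained variables, and selected still contains the missing value in A, which is what na.action = na.pass is meant to allow.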

Value

Returns the suitably transformed or standardized x. If x is a data frame, the returned value is likewise a data frame. The returned object also has an attribute "tran" giving the name of the applied transformation or standardization "method".

Details

The function offers the following transformation and standardization methods for community data:

sqrt: take the square roots of the observed values.
cubert: take the cube root of the observed values.
rootroot: take the fourth root of the observed values. This is also known as the root root transformation (Field et al. 1982).
log: take the logarithms of the observed values. The transformation applied can be modified by the constants a and b and the base of the logarithms. The transformation applied is $x^* = \log_{\mathrm{base}}(ax + b)$.
reciprocal: returns the multiplicative inverse or reciprocal, $1/x$, of the observed values.
freq: divide by column (variable, species) maximum and multiply by the number of non-zero items, so that the average of non-zero entries is 1 (Oksanen 1983).
center: centre all variables to zero mean.
range: standardize values into range 0...1. If all values are constant, they will be transformed to 0.
percent: convert observed count values to percentages.
proportion: convert observed count values to proportions.
standardize: scale x to zero mean and unit variance.
pa: scale x to presence/absence scale (0/1).
missing: replace missing values with na.value.
chi.square: divide by row sums and square root of column sums, and adjust for square root of matrix total (Legendre & Gallagher 2001). When used with the Euclidean distance, the distances should be similar to the Chi-square distance used in correspondence analysis. However, the results from cmdscale would still differ, since CA is a weighted ordination method.
hellinger: square root of observed values that have first been divided by row (site) sums (Legendre & Gallagher 2001).
wisconsin: applies the Wisconsin double standardization, where columns (species, variables) are first standardized by maxima and then sites (rows) by site totals.
pcent2prop: convert percentages to proportions.
prop2pcent: convert proportions to percentages.
logRatio: applies a log transformation (see log above) to the data, then centres the data by rows (by subtraction of the mean for row i from the observations in row i). Applying PCA to data transformed in this way results in Aitchison's Log Ratio Analysis (LRA), a means of dealing with closed compositional data such as are common in palaeoecology (Aitchison, 1983).
power: applies a power transformation.
rowCenter: centres x by rows through the subtraction of the corresponding row mean from the observations in the row.
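
To make a few of the descriptions above concrete, here is a minimal base R sketch of the hellinger, chi.square and logRatio calculations, written directly from the wording of the list. The matrix x is hypothetical, the constants a = 1 and b = 1 in the log step are chosen only to avoid taking the log of zero, and the code is not taken from tran or decostand, so exact agreement with their output is an assumption.

```r
# Hypothetical count data: 3 sites (rows) by 3 taxa (columns)
x <- matrix(c(10, 0, 5,
               2, 8, 0,
               4, 4, 4),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("site", 1:3), paste0("sp", 1:3)))

# hellinger: divide by row (site) sums, then take square roots
hell <- sqrt(x / rowSums(x))

# chi.square: divide by row sums and by the square root of column sums,
# then multiply by the square root of the matrix total
chi <- sqrt(sum(x)) * sweep(x / rowSums(x), 2, sqrt(colSums(x)), "/")

# logRatio: log(a * x + b) with a = 1, b = 1 (illustrative choice),
# followed by subtraction of each row's mean
lx <- log(1 * x + 1)
lr <- lx - rowMeans(lx)

# Checks: squared Hellinger values sum to 1 within each row,
# and the row means of the log-ratio data are zero
rowSums(hell^2)
rowMeans(lr)
```

The two checks at the end follow directly from the definitions: Hellinger values are square roots of row proportions, and row centring removes each row's mean.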

References

Aitchison, J. (1983) Principal components analysis of compositional data. Biometrika 70(1); 57–65.

Field, J.G., Clarke, K.R., & Warwick, R.M. (1982) A practical strategy for analysing multispecies distribution patterns. Marine Ecology Progress Series 8; 37–52.

Legendre, P. & Gallagher, E.D. (2001) Ecologically meaningful transformations for ordination of species data. Oecologia 129; 271–280.

Oksanen, J. (1983) Ordination of boreal heath-like vegetation with principal component analysis, correspondence analysis and multidimensional scaling. Vegetatio 52; 181–189.

Examples

library(analogue)  # provides tran() and the swapdiat data set
data(swapdiat)
## convert percentages to proportions
sptrans <- tran(swapdiat, "pcent2prop")

## apply Hellinger transformation
spHell <- tran(swapdiat, "hellinger")

## Dummy data to illustrate formula method
d <- data.frame(A = runif(10), B = runif(10), C = runif(10))
## simulate some missings
d[sample(10,3), 1] <- NA
## apply tran using formula
tran(~ . - B, data = d, na.action = na.pass,
     method = "missing", na.value = 0)
