VanValen: van Valen's test

Description

Computes van Valen's test for comparing the variation in two multivariate samples. The comparison is based on the distances of the standardized observations from their sample medians: each individual's distance is pooled over all standardized variables, producing one set of pooled distances per sample, and the means of the two sets are then compared by a two-sample t-test.

Usage

VanValen(x, group, level1, alternative = "two.sided", var.equal = FALSE)

Value

Returns an object of class "VanValen", a list containing the following components:

name: A character string describing the function.
std.data: A list with two data frames, matlevel1 and matlevel2, containing the values of the standardized variables for samples 1 and 2, respectively.
medians.std: A list containing two vectors. The first vector, medians.std1, contains the medians for all standardized variables in sample 1 as declared in parameter level1, and the second vector, medians.std2, holds the corresponding medians for the other sample.
dev.median: A list with two data frames, dev.median1 and dev.median2, containing the deviations from sample medians for samples 1 and 2, respectively.
d.list: A list with two data frames, d.level1 and d.level2, containing the pooled distances of standardized variables from their corresponding medians for samples 1 and 2, respectively.
means.d: A named numeric vector carrying the mean pooled distances for samples 1 and 2, respectively.
vars.d: A named numeric vector carrying the variance of pooled distances for samples 1 and 2, respectively.
t.vec: A named numeric vector containing the t-statistic, the degrees of freedom and the p-value for the test, respectively.
alternative: A character string specifying the alternative hypothesis chosen.
var.equal: A logical variable indicating whether the two variances were treated as being equal (TRUE) or not (FALSE).
group: A character string specifying the name of the two-level factor defining groups.
levels.group: A vector of length two, showing the two levels in factor group.
data.name: A character string giving the name of the data.
variables: A character string vector containing the variable names.
data: The data frame analyzed.

Arguments

x: a data frame with one two-level factor and p response variables.
group: two-level factor defining groups. It must be one of the columns in x.
level1: a character string identifying Sample 1. The string must be one of the factor levels in group.
alternative: a character string specifying the alternative hypothesis in the t-test for the comparison of mean pooled distances. Must be one of "two.sided" (default), "greater" or "less". You can specify just the initial letter.
var.equal: a logical variable indicating whether to treat the two variances of pooled distances as being equal. If TRUE then the pooled variance is used to estimate the variance; otherwise the Welch (or Satterthwaite) approximation to the degrees of freedom is used.
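
The effect of alternative and var.equal can be illustrated with base R's stats::t.test, which has arguments of the same names. Whether VanValen calls t.test internally is an assumption here, and the vectors d1 and d2 below are made-up stand-ins for the pooled distances of samples 1 and 2.

```r
# Hypothetical pooled distances for two samples (made-up numbers)
d1 <- c(1.2, 0.8, 1.5, 1.1, 0.9, 1.7)
d2 <- c(1.9, 2.3, 1.6, 2.8, 2.1, 1.4)

# In t.test, alternative = "less" tests H1: mean(d1) < mean(d2)
pooled <- t.test(d1, d2, alternative = "less", var.equal = TRUE)   # pooled variance
welch  <- t.test(d1, d2, alternative = "less", var.equal = FALSE)  # Welch approximation

pooled$parameter  # degrees of freedom: n1 + n2 - 2
welch$parameter   # Welch degrees of freedom, generally non-integer
```

The two calls share the same difference in means. They differ in the variance estimate and the degrees of freedom used for the reference t distribution.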

Author

Jorge Navarro Alberto, ganava4@gmail.com

Details

To ensure that all variables are given equal weight, van Valen's test first standardizes each variable so that its mean is zero and its variance is one over both samples combined. The pooled distances are then calculated as

$$d_{ij} = \sqrt{\sum_{k = 1}^{p}{(x_{ijk}-M_{jk})^2}}$$

where

$x_{ijk}$ is the value of the standardized variable $X_{k}$ for the $i$th individual in sample $j$, and

$M_{jk}$ is the median of the same standardized variable in the $j$th sample.
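
The distance calculation can be sketched in base R as follows. This is a minimal illustration on simulated data, not the package's internal code; the names X, g, Z and pooled_dist are hypothetical.

```r
# Simulated data: two samples of 20 individuals, p = 3 variables
set.seed(1)
X <- matrix(rnorm(40 * 3), ncol = 3)
g <- factor(rep(c("A", "B"), each = 20))

# Standardize each variable over both samples combined (mean 0, variance 1)
Z <- scale(X)

# d_ij: Euclidean distance of each row of sample j from that sample's medians
pooled_dist <- function(Zj) {
  M <- apply(Zj, 2, median)          # M_jk, one median per variable
  sqrt(rowSums(sweep(Zj, 2, M)^2))   # sqrt(sum_k (x_ijk - M_jk)^2)
}

d <- lapply(split(as.data.frame(Z), g), function(df) pooled_dist(as.matrix(df)))
sapply(d, mean)   # mean pooled distance per sample
```

Each element of d holds one distance per individual, so the two vectors can be passed directly to a two-sample t-test.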

The sample means of the $d_{ij}$ values are compared with a t-test. If one sample is more variable than the other, then the $d_{ij}$ values will tend to be higher in that sample. The expression for $d_{ij}$ in van Valen's test is based on an implicit assumption that if the two samples being tested differ, then one sample will be more variable than the other for all variables. A significant result cannot be expected in a case where, for example, $X_1$ and $X_2$ are more variable in sample 1, but $X_3$ and $X_4$ are more variable in sample 2. The effects of the differing variances would then tend to cancel out in the calculation of $d_{ij}$. Thus, van Valen's test is not appropriate for situations where changes in the level of variation are not expected to be consistent for all variables.
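
The cancellation described above can be explored with a small simulation in base R. This is a sketch under assumed settings (normal data, standard deviations of 1 and 2, 200 individuals per sample); the helper dist_from_median is hypothetical and is not part of the package.

```r
set.seed(42)
n <- 200

# Sample 1: X1 and X2 have sd 2, X3 and X4 have sd 1. Sample 2: the reverse.
s1 <- cbind(rnorm(n, sd = 2), rnorm(n, sd = 2), rnorm(n, sd = 1), rnorm(n, sd = 1))
s2 <- cbind(rnorm(n, sd = 1), rnorm(n, sd = 1), rnorm(n, sd = 2), rnorm(n, sd = 2))

Z   <- scale(rbind(s1, s2))   # standardize over both samples combined
idx <- rep(1:2, each = n)

# Distance of each row from the vector of column medians of its own sample
dist_from_median <- function(Zj) {
  sqrt(rowSums(sweep(Zj, 2, apply(Zj, 2, median))^2))
}
d1 <- dist_from_median(Z[idx == 1, ])
d2 <- dist_from_median(Z[idx == 2, ])

c(mean(d1), mean(d2))    # expected to be close, by the symmetry of the setup
t.test(d1, d2)$p.value
```

Although every variable differs in spread between the two samples, the mean distances are expected to be similar, so the t-test has little power to detect the difference in this setup.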

References

Manly, B.F.J., Navarro Alberto, J.A. and Gerow, K. (2024) Multivariate Statistical Methods. A Primer. 5th Edn. CRC Press.

van Valen, L. (1978) The statistics of variation. Evolutionary Theory 4: 33-43. (Erratum Evolutionary Theory 4: 202.)

Examples


data(sparrows)
res.VanValen <- VanValen(sparrows, "Survivorship", "S",
                         alternative = "less", var.equal = TRUE)
# Brief output
res.VanValen
