RVrarefied: Rarefied version of Escoufier RV coefficient

Description

Computes a rarefied estimate of the Escoufier RV coefficient that accounts for the dependence of RV on sample size, so that RV can be compared among groups with different numbers of observations (sample sizes).
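
For reference, here is the standard definition of the RV coefficient (Escoufier 1973) for two blocks of column-centered variables measured on the same n observations. S_11 and S_22 are the within-block covariance matrices and S_12 = S_21^T is the between-block covariance matrix. This page does not document how RV is computed internally, so treat this as the textbook formula rather than a description of the implementation.

```latex
\mathrm{RV} = \frac{\operatorname{tr}\left(S_{12} S_{21}\right)}
                   {\sqrt{\operatorname{tr}\left(S_{11}^{2}\right)\,\operatorname{tr}\left(S_{22}^{2}\right)}},
\qquad 0 \le \mathrm{RV} \le 1
```

Even when the two blocks are independent, the expected RV is larger in smaller samples. This is the sample-size dependence that rarefaction is meant to remove.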

Usage

RVrarefied(Block1, Block2, reps = 1000, samplesize, group = NULL, CI = 0.95)

Value

The function outputs a list with the following elements:

results: A data frame with columns `group`, `Mean`, `Median`, `CI_min`, `CI_max`. One row per group. For single-group analysis, group will be "All".
AllRarefiedSamples: A list with all RV values obtained using the rarefaction procedure for each group. For single-group analysis, this will be a list with one element named "All".

The returned object has class "EscoufierRVrarefy" and comes with associated S3 methods for convenient display and visualization:

print.EscoufierRVrarefy: Prints a formatted summary of the results, including an assessment of confidence interval overlap when multiple groups are analyzed
plot.EscoufierRVrarefy: Creates a confidence interval plot using ggplot2
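
Each row of `results` summarizes the rarefied RV values stored in `AllRarefiedSamples` for one group. The sketch below shows one plausible way to build such a row. It assumes that CI_min and CI_max are percentile limits of the rarefied distribution, which this page does not state. The helper `summarize_rarefied` is hypothetical and is not part of the package.

```r
# Hypothetical helper (NOT part of the package): turn a vector of rarefied
# RV values into one row shaped like the `results` data frame.
# ASSUMPTION: CI_min/CI_max are percentile limits of the rarefied
# distribution; the package may compute them differently.
summarize_rarefied <- function(rv_values, group = "All", CI = 0.95) {
  alpha <- 1 - CI
  limits <- quantile(rv_values, probs = c(alpha / 2, 1 - alpha / 2),
                     names = FALSE)
  data.frame(group = group,
             Mean = mean(rv_values),
             Median = median(rv_values),
             CI_min = limits[1],
             CI_max = limits[2])
}

# Toy input (made-up numbers, not real RV estimates) with a 50% interval
res <- summarize_rarefied(c(0.2, 0.3, 0.4, 0.5, 0.6), CI = 0.5)
res  # Mean 0.4, Median 0.4, CI_min 0.3, CI_max 0.5
```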

Arguments

Block1, Block2: Matrices or data frames containing each block of variables (observations in rows, variables in columns).
reps: Number of resamplings used to obtain the rarefied estimate (default 1000).
samplesize: Sample size to which each group is rarefied, i.e. the number of observations drawn in each resampling.
group: Factor or vector indicating group membership for observations. If NULL (default), a single-group analysis is performed; if provided, the analysis is performed separately for each group.
CI: Confidence interval level (default 0.95). Must be between 0 and 1.
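
The `group` and `CI` arguments determine which observations are rarefied together. The base-R sketch below shows one way to partition the rows before rarefying each group. It follows the description above, with a single group labelled "All" when `group` is NULL. The helper name `split_rows_by_group` is hypothetical. Two checks are also assumptions rather than documented behaviour: the requirement that `group` have one entry per row, and the use of a strict 0 < CI < 1 test.

```r
# Hypothetical helper (NOT part of the package): partition row indices
# by group. group = NULL gives a single group named "All".
split_rows_by_group <- function(n_obs, group = NULL, CI = 0.95) {
  # ASSUMPTION: CI must lie strictly between 0 and 1
  if (!is.numeric(CI) || length(CI) != 1 || CI <= 0 || CI >= 1) {
    stop("CI must be between 0 and 1")
  }
  if (is.null(group)) {
    return(list(All = seq_len(n_obs)))
  }
  # ASSUMPTION: one group label per observation (row)
  if (length(group) != n_obs) {
    stop("group must have one entry per observation")
  }
  split(seq_len(n_obs), factor(group))
}

idx <- split_rows_by_group(6, group = c("a", "b", "a", "b", "b", "a"))
idx  # $a: 1 3 6; $b: 2 4 5
```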

Notice

The function does NOT perform GPA (generalized Procrustes analysis) on each rarefied sample. This may or may not make a difference in the estimates; in many cases it will probably not make much difference (e.g., Fig. 2 in Fruciano et al. 2013).

Citation

If you use this function, please cite both Fruciano et al. 2013 (for the rarefaction procedure) and Escoufier 1973 (because the procedure is based on Escoufier RV).

Details

This function computes a rarefied estimate of the Escoufier RV coefficient, as suggested by Fruciano et al. 2013 (PLoS ONE). This can be useful to compare RV among groups that share the same variables but differ in sample size, because RV depends on sample size (see Fruciano et al. 2013, where this procedure is described). The idea is that one rarefies all the groups being compared to the same sample size. Following the approach in Fruciano et al. 2013 (PLoS ONE), "rarefaction" here means resampling to a smaller sample size with replacement (as in the bootstrap).
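
The procedure just described can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's code. It assumes RV is computed from column-centered blocks using the standard trace formula. The helper names `rv_coefficient` and `rarefy_rv` are hypothetical. As in the Notice above, no GPA is performed on the resampled rows.

```r
# Hypothetical helper: Escoufier RV for two blocks, observations in rows
rv_coefficient <- function(X, Y) {
  X <- scale(as.matrix(X), center = TRUE, scale = FALSE)
  Y <- scale(as.matrix(Y), center = TRUE, scale = FALSE)
  S11 <- crossprod(X)     # X'X (the 1/(n-1) factors cancel in RV)
  S22 <- crossprod(Y)     # Y'Y
  S12 <- crossprod(X, Y)  # X'Y
  # tr(S12 S21) = sum of squared entries of S12; likewise tr(S11^2), tr(S22^2)
  sum(S12^2) / sqrt(sum(S11^2) * sum(S22^2))
}

# Hypothetical helper: rarefied RV values for a single group
rarefy_rv <- function(Block1, Block2, reps = 1000, samplesize) {
  Block1 <- as.matrix(Block1)
  Block2 <- as.matrix(Block2)
  n <- nrow(Block1)
  stopifnot(nrow(Block2) == n)
  vapply(seq_len(reps), function(i) {
    # draw `samplesize` rows WITH replacement; same rows for both blocks
    rows <- sample.int(n, samplesize, replace = TRUE)
    rv_coefficient(Block1[rows, , drop = FALSE], Block2[rows, , drop = FALSE])
  }, numeric(1))
}

set.seed(1)
X <- matrix(rnorm(200 * 5), ncol = 5)  # two independent blocks, n = 200
Y <- matrix(rnorm(200 * 4), ncol = 4)
rv_small <- rarefy_rv(X, Y, reps = 200, samplesize = 30)
mean(rv_small)          # rarefied to n = 30
rv_coefficient(X, Y)    # full sample, n = 200
```

With independent blocks, the mean of the rarefied values at n = 30 should exceed the full-sample RV. This illustrates why groups must be rarefied to a common sample size before their RV values are compared.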

References

Escoufier Y. 1973. Le Traitement des Variables Vectorielles. Biometrics 29:751-760.

Fruciano C, Franchini P, Meyer A. 2013. Resampling-Based Approaches to Study Variation in Morphological Modularity. PLoS ONE 8:e69376.

Examples

library(MASS)
set.seed(123)
Pop=mvrnorm(100000,mu=rep(0,100), Sigma=diag(100))
# Create a population of 100,000 'individuals'
# as multivariate normal random data
# We will consider the first 20 columns as the first
# block of variables, and the remaining 80 columns as the second block

A=Pop[1:50,]
B=Pop[501:700,]
# Take two groups (A and B)
# from the same population (there should be no difference
# between them)

EscoufierRV(A[,1:20],A[,21:ncol(A)])
EscoufierRV(B[,1:20],B[,21:ncol(B)])
# Notice how we obtain very different values of Escoufier RV
# (this is because the two groups have very different
# sample sizes: one has 50 observations, the other 200)

RarA=RVrarefied(A[,1:20],A[,21:ncol(A)],reps=1000,samplesize=30)
RarB=RVrarefied(B[,1:20],B[,21:ncol(B)],reps=1000,samplesize=30)
RarA$results  # Data frame with Mean, Median, CI_min, CI_max
RarB$results  # Data frame with Mean, Median, CI_min, CI_max
# After rarefying both groups to the same sample size
# (in this case 30), it is clear that the two groups have
# very similar levels of association between blocks

# Multi-group analysis with custom CI
combined_data = rbind(A, B)
group_labels = c(rep("GroupA", nrow(A)), rep("GroupB", nrow(B)))
multi_result = RVrarefied(combined_data[,1:20], combined_data[,21:ncol(combined_data)], 
                         reps=1000, samplesize=30, group=group_labels, CI=0.90)
print(multi_result$results)  # Data frame with results for each group
# Columns: group, Mean, Median, CI_min, CI_max