permanovaFL: PERMANOVA test of association based on the Freedman-Lane permutation scheme

Description

This function performs the PERMANOVA test, allowing adjustment for confounders and accounting for clustered data. It can also be used to test presence-absence associations based on an infinite number of rarefaction replicates. As in ldm, permanovaFL allows multiple sets of covariates to be tested: the sets are entered sequentially, and the variance explained by each set is the part that remains after the previous sets have been fit. It allows testing of a survival outcome by using the Martingale or deviance residual (from fitting a Cox model to the survival outcome and other covariates) as a covariate in the regression. It accepts multiple distance matrices and provides an omnibus test in that case. It also allows testing of the mediation effect of the microbiome in the pathway between the exposure(s) and the outcome(s), where the exposure(s) and outcome(s) are specified as the first and second (sets of) covariates.
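The Freedman-Lane scheme named in the title can be illustrated with a minimal base-R sketch. It is not LDM's implementation. It assumes a single numeric set of covariates X, numeric confounders Z, a Gower-centered distance matrix, and unrestricted permutations (no strata or clusters). The function names are hypothetical. The idea is to fit the confounder-only (reduced) model, permute its residuals, and recompute the F-type statistic for X with the projection matrices held fixed.

```r
# Gower-centered matrix G = -1/2 * J %*% D^2 %*% J, with J = I - 11'/n
gower_center <- function(D) {
  D <- as.matrix(D)
  n <- nrow(D)
  J <- diag(n) - matrix(1 / n, n, n)
  -0.5 * J %*% (D^2) %*% J
}

# Orthogonal projection (hat) matrix onto the column space of M
hat_matrix <- function(M) {
  qrM <- qr(as.matrix(M))
  Q <- qr.Q(qrM)[, seq_len(qrM$rank), drop = FALSE]
  Q %*% t(Q)
}

# F-type ratio for X after Z; degrees-of-freedom constants are omitted
# because they cancel when comparing observed and permuted values
pseudo_F <- function(R, HZ, HX) {
  n <- nrow(R)
  sum(diag(HX %*% R)) / sum(diag((diag(n) - HZ - HX) %*% R))
}

# Freedman-Lane: permute residuals of the reduced (confounder-only) model
permanova_fl_sketch <- function(D, X, Z = NULL, n_perm = 999, seed = 1) {
  G <- gower_center(D)
  n <- nrow(G)
  HZ <- hat_matrix(cbind(rep(1, n), Z))   # intercept plus confounders
  IZ <- diag(n) - HZ
  HX <- hat_matrix(IZ %*% as.matrix(X))   # covariates residualized on Z
  R <- IZ %*% G %*% IZ                    # reduced-model residual matrix
  F_obs <- pseudo_F(R, HZ, HX)            # same value as the unpermuted PERMANOVA ratio
  set.seed(seed)
  F_perm <- replicate(n_perm, {
    p <- sample.int(n)                    # unrestricted permutation
    pseudo_F(R[p, p], HZ, HX)             # permute residuals; HZ and HX stay fixed
  })
  list(F = F_obs, p.value = (sum(F_perm >= F_obs) + 1) / (n_perm + 1))
}
```

Permuting the residuals rather than the raw covariates keeps the confounder effects in place under the null hypothesis. This is the usual motivation for the Freedman-Lane approach when confounders are present.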

Usage

permanovaFL(
  formula,
  other.surv.resid = NULL,
  data = .GlobalEnv,
  tree = NULL,
  dist.method = c("bray"),
  dist.list = NULL,
  cluster.id = NULL,
  strata = NULL,
  how = NULL,
  perm.within.type = "free",
  perm.between.type = "none",
  perm.within.ncol = 0,
  perm.within.nrow = 0,
  n.perm.max = 5000,
  n.rej.stop = 100,
  seed = NULL,
  square.dist = TRUE,
  center.dist = TRUE,
  scale.otu.table = c(TRUE),
  binary = c(FALSE),
  n.rarefy = 0,
  test.mediation = FALSE,
  n.cores = 4,
  verbose = TRUE
)

Value

a list consisting of

F.statistics: F statistics for testing each set of covariates
R.squared: R-squared statistic for each set of covariates
F.statistics.OR, R.squared.OR: F statistics and R-squared statistics when other.surv.resid is used as the last covariate
p.permanova: p-values for testing each set of covariates
p.permanova.omni: omnibus p-values (combining information from multiple distance matrices) for testing each set of covariates
med.p.permanova: p-values for testing mediation
med.p.permanova.omni: omnibus p-values for testing mediation
p.permanova.OR, p.permanova.omni.OR: p-values and omnibus p-values when other.surv.resid is used as the last covariate
med.p.permanova.OR, med.p.permanova.omni.OR: p-values and omnibus p-values for testing mediation when other.surv.resid is used as the outcome
p.permanova.com, p.permanova.omni.com: p-values of the combination test that combines the results from analyzing the Martingale residual and the deviance residual (one specified in the formula and the other in other.surv.resid)
med.p.permanova.com, med.p.permanova.omni.com: p-values of the combination test for the mediation effect
n.perm.completed: number of permutations completed
permanova.stopped: a logical value indicating whether the stopping criterion has been met by all tests of covariates
seed: the seed that is user supplied or internally generated, stored in case the user wants to reproduce the permutation replicates
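The sequential meaning of R.squared can be sketched in base R. In this sequential decomposition, each set is credited only with the variance that remains after the earlier sets, as described in Description. The sketch assumes the usual PERMANOVA definition R^2_k = tr((H_k - H_{k-1}) G) / tr(G), where G is the Gower-centered matrix and H_k projects onto the intercept plus sets 1..k. The source does not give LDM's exact formula, and sequential_r2 is a hypothetical name.

```r
# Sequential (type I) R-squared for ordered covariate sets, PERMANOVA-style
sequential_r2 <- function(D, sets) {
  D <- as.matrix(D)
  n <- nrow(D)
  J <- diag(n) - matrix(1 / n, n, n)
  G <- -0.5 * J %*% (D^2) %*% J               # Gower-centered matrix
  total <- sum(diag(G))                       # total sum of squares
  M <- matrix(1, n, 1)                        # intercept-only starting model
  ss_prev <- sum(diag(qr.fitted(qr(M), G)))   # tr(H_0 G), zero after centering
  r2 <- numeric(length(sets))
  for (k in seq_along(sets)) {
    M <- cbind(M, as.matrix(sets[[k]]))       # add set k after sets 1..k-1
    ss_k <- sum(diag(qr.fitted(qr(M), G)))    # tr(H_k G)
    r2[k] <- (ss_k - ss_prev) / total         # extra variance explained by set k
    ss_prev <- ss_k
  }
  names(r2) <- names(sets)
  r2
}
```

With a Euclidean distance on a single response, these values reduce to the sequential R-squared of an ordinary linear model. This makes a convenient sanity check. Reordering the sets generally changes the individual values but not their sum.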

Arguments

formula: a symbolic description of the model to be fitted, in the form data.matrix ~ sets of covariates or data.matrix | confounders ~ sets of covariates. The details of model specification are given in "Details" of ldm. Additionally, in permanovaFL, data.matrix can be either an OTU table or a distance matrix. If it is an OTU table, the distance matrix is calculated internally using the OTU table, tree (if required), and dist.method. If data.matrix is a distance matrix (of class dist or matrix), it can be squared and/or centered by specifying square.dist and center.dist (described below). Distance matrices are distinguished from OTU tables by checking whether as.matrix(data.matrix) is symmetric.
other.surv.resid: a vector of data, usually the Martingale or deviance residuals from fitting the Cox model to the survival outcome (if it is the outcome of interest) and other covariates.
data: an optional data frame, list, or environment (or object coercible to a data frame) containing the covariates of interest and confounding covariates. If not found in data, the covariates are taken from environment(formula), typically the environment from which permanovaFL is called. The default is .GlobalEnv.
tree: a phylogenetic tree. Only used for calculating a phylogenetic-tree-based distance matrix. Not needed if the calculation of the requested distance does not involve a phylogenetic tree, or if a distance matrix is directly imported through formula.
dist.method: a vector of methods for calculating the distance measure, partially matched against all methods supported by vegdist in the vegan package (i.e., "manhattan", "euclidean", "canberra", "bray", "kulczynski", "jaccard", "gower", "altGower", "morisita", "horn", "mountford", "raup", "binomial", "chao", "cao", "mahalanobis"), as well as "hellinger" and "wt-unifrac". Not used if a distance matrix is specified in formula or dist.list. The default is c("bray"). For more details, see the dist.method argument of the ldm function.
dist.list: a list of pre-calculated distance matrices.
cluster.id: cluster identifiers. The default value, NULL, should be used if the observations are not in clusters (i.e., are independent).
strata: a factor variable (or, character variable converted into a factor) to define strata (groups), within which to constrain permutations. The default is NULL.
how: a permutation control list, for users who want to specify their permutation control list using the how function from the permute R package. The default is NULL.
perm.within.type: a character string that takes values "free", "none", "series", or "grid". The default is "free" (for random permutations).
perm.between.type: a character string that takes values "free", "none", or "series". The default is "none".
perm.within.ncol: a positive integer, only used if perm.within.type="grid". The default is 0. See the documentation for the R package permute for further details.
perm.within.nrow: a positive integer, only used if perm.within.type="grid". The default is 0. See the documentation for the R package permute for further details.
n.perm.max: the maximum number of permutations. The default is 5000.
n.rej.stop: the minimum number of rejections (i.e., permutations whose statistic exceeds the observed statistic) to obtain before stopping. The default is 100.
seed: a user-supplied integer seed for the random number generator in the permutation procedure. The default is NULL; with the default value, an integer seed will be generated internally and randomly. In either case, the integer seed will be stored in the output object in case the user wants to reproduce the permutation replicates.
square.dist: a logical variable indicating whether to square the distance matrix. The default is TRUE.
center.dist: a logical variable indicating whether to center the distance matrix as described by Gower (1966). The default is TRUE.
scale.otu.table: a vector of logical variables indicating whether to scale the OTU table in calculating the distance matrices in dist.method. For count data, this corresponds to dividing by the library size to give relative abundances. The default is TRUE.
binary: a vector of logical values indicating whether to base the calculation of the distance matrices in dist.method on presence-absence (binary) data. The default is c(FALSE) (analyzing relative abundance data).
n.rarefy: number of rarefactions. The default is 0 (no rarefaction).
test.mediation: a logical value indicating whether to perform the mediation analysis. The default is FALSE. If TRUE, the formula takes the specific form otu.table ~ exposure + outcome or most generally otu.table or distance matrix | (set of confounders) ~ (set of exposures) + (set of outcomes).
n.cores: the number of cores to use in parallel computing, i.e., the maximum number of child processes run simultaneously. The default is 4.
verbose: a logical value indicating whether to generate verbose output during the permutation process. The default is TRUE.
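The stopping rule implied by n.perm.max and n.rej.stop can be sketched as follows. The argument descriptions supply only the rule itself: stop once n.rej.stop rejections have been seen, or after n.perm.max permutations. The p-value estimates below follow a Besag-Clifford-style convention, which is an assumption here rather than LDM's documented behavior. The names sequential_perm_test and stat_fun are hypothetical.

```r
# stat_fun: hypothetical function returning the test statistic for one
# freshly drawn random permutation
sequential_perm_test <- function(stat_obs, stat_fun, n_perm_max = 5000,
                                 n_rej_stop = 100) {
  n_rej <- 0
  n_done <- 0
  while (n_done < n_perm_max) {
    n_done <- n_done + 1
    if (stat_fun() >= stat_obs) n_rej <- n_rej + 1  # a "rejection" in the n.rej.stop sense
    if (n_rej >= n_rej_stop) break                  # enough rejections: stop early
  }
  stopped <- n_rej >= n_rej_stop
  # Assumed Besag-Clifford-style estimates: r/n after early stopping,
  # (r + 1)/(n + 1) when the permutation budget is exhausted
  p <- if (stopped) n_rej / n_done else (n_rej + 1) / (n_done + 1)
  list(p.value = p, n.perm.completed = n_done, stopped = stopped)
}
```

Early stopping saves the most time for clearly non-significant tests, which accumulate rejections quickly. Small p-values still use the full budget, so their resolution is limited to about 1/(n.perm.max + 1).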

Author

Yi-Juan Hu <yijuan.hu@emory.edu>, Glen A. Satten <gsatten@emory.edu>

References

Hu YJ, Satten GA (2020). Testing hypotheses about the microbiome using the linear decomposition model (LDM). Bioinformatics, 36(14), 4106-4115.

Hu YJ and Satten GA (2021). A rarefaction-without-resampling extension of PERMANOVA for testing presence-absence associations in the microbiome. bioRxiv, https://doi.org/10.1101/2021.04.06.438671.

Zhu Z, Satten GA, Mitchell C, and Hu YJ (2021). Analyzing matched sets of microbiome data using the LDM and PERMANOVA. Microbiome, 9, 133. https://doi.org/10.1186/s40168-021-01034-9.

Hu Y, Li Y, Satten GA, and Hu YJ (2022). Testing microbiome associations with censored survival outcomes at both the community and individual taxon levels. bioRxiv, https://doi.org/10.1101/2022.03.11.483858.

Examples


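library(LDM)
data(throat.otu.tab5, throat.meta)  # example data shipped with LDM
res.perm <- permanovaFL(throat.otu.tab5 | (Sex+AntibioticUse) ~ SmokingStatus+PackYears,
                       data=throat.meta, dist.method="bray", seed=82955, n.perm.max=1000, n.cores=1,
                       verbose=FALSE)
res.perm$p.permanova  # p-values for each set of covariates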
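The example uses a single distance. When several distances are requested, through a dist.method vector or dist.list, permanovaFL also reports p.permanova.omni. The source does not say how the omnibus p-value is formed. The base-R sketch below assumes a minimum-p construction, which we take to mirror the LDM omnibus test: the smallest per-distance p-value is the statistic, and it is calibrated on the same permutation replicates used for every distance. omnibus_min_p is a hypothetical name.

```r
# stat_obs:  length-K vector of observed statistics, one per distance
# stat_perm: B x K matrix; row b holds the statistics from permutation b,
#            computed with the SAME permutation for all K distances
omnibus_min_p <- function(stat_obs, stat_perm) {
  all_stats <- rbind(stat_obs, stat_perm)       # row 1 = observed data
  # p-value of every row (observed and permuted) within its own column,
  # treating all B + 1 values as draws from the null distribution
  p_all <- apply(all_stats, 2, function(s) sapply(s, function(v) mean(s >= v)))
  min_p <- apply(p_all, 1, min)                 # omnibus statistic for each row
  list(p.each = p_all[1, ],                     # per-distance p-values
       p.omni = mean(min_p <= min_p[1]))        # calibrated omnibus p-value
}
```

Calibrating the minimum p-value against its own permutation distribution, instead of taking min(p) at face value, is what keeps the omnibus test valid despite looking at several distances.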
