aldex: ALDEx3 Linear Models

Description

ALDEx3 Linear Models

Usage

aldex(
  Y,
  X,
  data = NULL,
  method = "lm",
  nsample = 2000,
  scale = NULL,
  streamsize = 8000,
  n.cores = detectCores() - 1,
  return.pars = c("X", "estimate", "std.error", "p.val", "p.val.adj", "logComp",
    "logScale"),
  p.adjust.method = "BH",
  test = "t.HC3",
  onesided = FALSE,
  ...
)

Value

A list with elements controlled by the return.pars parameter. Options include:
- X: (P x N) covariate matrix
- estimate: (P x D x nsample) array of linear model estimates
- std.error: (P x D x nsample) array of standard errors for the estimates
- p.val: (P x D) matrix, unadjusted p-values for the two-sided t-test
- p.val.adj: (P x D) matrix, p-values for the two-sided t-test adjusted according to p.adjust.method
- logScale: (N x S) matrix, samples of the log scale from the scale model
- logComp: (D x N x S) array, samples of the log composition from the multinomial-Dirichlet
- streaming: boolean, denotes whether streaming was used
- random.effects: (Pr x N x S) array, all Pr random effects (returned when using mixed effects models)
Note: logScale and logComp are not returned if streaming is active.
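As a rough illustration of the shapes listed above, the sketch below builds a stand-in (P x D x nsample) array from simulated values (it is not output from aldex) and collapses the Monte Carlo dimension with base R's apply. The names est, est_mean, and est_ci are hypothetical and chosen only for this example.

```r
set.seed(1)
P <- 2; D <- 10; nsample <- 200

# Stand-in for an "estimate" element: simulated values, NOT real aldex output
est <- array(rnorm(P * D * nsample), dim = c(P, D, nsample))

# Average over the Monte Carlo dimension (third margin) -> P x D matrix
est_mean <- apply(est, c(1, 2), mean)

# 2.5% and 97.5% quantiles across replicates -> 2 x P x D array
est_ci <- apply(est, c(1, 2), quantile, probs = c(0.025, 0.975))

dim(est_mean)
dim(est_ci)
```

The same apply pattern should work on any array whose last dimension indexes Monte Carlo replicates, assuming the returned arrays follow the layout described above.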

Arguments

Y: a (D x N) matrix of sequence counts; N is the number of samples, D is the number of taxa or genes
X: either a formula (in which case data must be non-NULL) or a model matrix of dimension P x N (P is the number of linear model covariates). If a formula is passed, it should not include the target Y; e.g., it should simply be "~condition-1" (note the lack of a left-hand side). If using lme4, this should be the formula including random effects.
data: a data frame for use with formula, must have N rows
method: (default "lm") the regression method. "lm": linear regression; "lme4": linear mixed effects regression with lme4; "nlme": linear mixed effects models with nlme. The "nlme" method REQUIRES a "random" argument representing the random effects to be passed to the aldex function; a "correlation" argument can also be passed (see the nlme documentation for how to use the "correlation" argument).
nsample: number of Monte Carlo replicates
scale: the scale model; can be a function or an N x nsample matrix (samples should be given on the log2 scale; i.e., they should be samples of the log of the system scale). The API for writing your own scale models is documented in the Examples section.
streamsize: (default 8000) approximate memory footprint at which to use streaming. This should be thought of as the number of Mb for each streaming chunk. If D*N*nsample*8/1000000 is less than streamsize, no streaming will be performed. Note that, to conserve memory, samples from the Dirichlet and scale models will not be returned if streaming is used. Streaming can be turned off by setting streamsize=Inf.
n.cores: (default detectCores()-1) If method is 'lme4', use this many cores for running mixed effects models in parallel.
return.pars: which results should be returned; see the Value section.
p.adjust.method: (default BH) The method for multiple hypothesis test correction. See p.adjust for all available methods.
test: (default "t.HC3") "t": a t-test is performed for each covariate (fast); "t.HC0": heteroskedasticity-robust standard errors are used (HC0; White's; slower); "t.HC3" (default): heteroskedasticity-robust standard errors are used (HC3; unlike HC0, this includes a leverage adjustment and is better for small sample sizes or when there are data with high leverage; slowest). To learn more about these, look at Long and Ervin (2000), "Using Heteroscedasticity Consistent Standard Errors in the Linear Regression Model", The American Statistician.
onesided: (default FALSE) if FALSE, return p-values for the two-sided test. Otherwise, if "lower" or "upper", return p-values for the one-sided test that the estimate is negative or positive, respectively.
...: parameters to be passed to the scale model (if a function is provided), may also be random or correlation arguments for nlme.
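The shapes these arguments expect can be checked with base R before calling aldex. The sketch below is illustrative only: it assumes the P x N model matrix is the transpose of what model.matrix() returns, uses an arbitrary normal distribution for the precomputed scale samples, and simply evaluates the footprint rule stated for streamsize (it does not reproduce the package's internal check). The names scale_samples, footprint_mb, and will_stream are hypothetical.

```r
set.seed(1)
Y <- matrix(1:110, 10, 11)   # D = 10 taxa, N = 11 samples
data <- data.frame(condition = c(rep(0, 5), rep(1, 6)))
nsample <- 2000

# model.matrix() returns N x P, so transpose to get the P x N layout
# described for X (assumption: this is the orientation aldex expects)
X <- t(model.matrix(~condition, data))

# A precomputed scale model: N x nsample matrix of draws on the log2 scale
# (mean 0 and sd 0.5 are arbitrary choices for illustration)
scale_samples <- matrix(rnorm(ncol(Y) * nsample, mean = 0, sd = 0.5),
                        nrow = ncol(Y), ncol = nsample)

# Approximate footprint in Mb, following the D*N*nsample*8/1000000 rule
footprint_mb <- nrow(Y) * ncol(Y) * nsample * 8 / 1e6
will_stream <- footprint_mb >= 8000   # compared against the default streamsize

dim(X)
dim(scale_samples)
footprint_mb
will_stream
```

With the package installed, objects like these could then be supplied as the X and scale arguments; that call is left out here so the sketch runs with base R alone.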

Author

Justin Silverman, Kyle McGovern

Examples

# \donttest{
Y <- matrix(1:110, 10, 11)
condition <- c(rep(0, 5), rep(1, 6))
data <- data.frame(condition=condition)
## demonstrate formula interface and passing optional argument (gamma) to
## the scale model (clr)
res <- aldex(Y, ~condition, data, nsample=2000, scale=clr.sm, gamma=0.5)

## demonstrate how to write a custom scale model: this model
## generalizes total sum scaling (where we assume no change between
## conditions)
## Functions can include parameters X (model matrix), Y, and logComp.
## If included in the function definition, those parameters will be passed
## dynamically when aldex is running. Other optional parameters (gamma)
## can be passed as additional arguments to the aldex function
tss <- function(X, logComp, gamma=0.5) {
  P <- nrow(X)
  nsample <- dim(logComp)[3]
  LambdaScale <- matrix(rnorm(P*nsample,0,gamma), P, nsample)
  logScale <- t(X) %*% LambdaScale
  return(logScale)
}
res <- aldex(Y, ~condition, data, nsample=2000, scale=tss, gamma=0.75)
# }
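
The difference between the "t.HC0" and "t.HC3" options of the test argument can be illustrated with a small base R sketch. This is a generic textbook sandwich estimator applied to simulated data, not the package's own implementation; the helper sandwich_se and all variable names are hypothetical.

```r
set.seed(1)
n <- 11
x <- c(rep(0, 5), rep(1, 6))
y <- 1 + 0.5 * x + rnorm(n, sd = 1 + x)   # simulated heteroskedastic noise
fit <- lm(y ~ x)

Xd <- model.matrix(fit)         # N x P design matrix
e <- residuals(fit)
h <- hatvalues(fit)             # leverage of each observation
bread <- solve(crossprod(Xd))   # (X'X)^-1

# Sandwich estimator: (X'X)^-1 X' diag(w) X (X'X)^-1
sandwich_se <- function(w) {
  meat <- crossprod(Xd * sqrt(w))   # row i of Xd is scaled by sqrt(w[i])
  sqrt(diag(bread %*% meat %*% bread))
}

se_hc0 <- sandwich_se(e^2)               # HC0 (White's)
se_hc3 <- sandwich_se(e^2 / (1 - h)^2)   # HC3 (leverage-adjusted)
se_ols <- sqrt(diag(vcov(fit)))          # classical standard errors

rbind(ols = se_ols, hc0 = se_hc0, hc3 = se_hc3)
```

Because each squared residual is divided by (1 - h)^2, which is at most 1, the HC3 standard errors are never smaller than the HC0 ones; the gap widens for high-leverage observations and small samples.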
