rarefy: Rarefy OTU counts.

Description

Sub-sample OTU observations such that all samples have an equal number. If called on data with non-integer abundances, values will be re-scaled to integers between 1 and depth such that they sum to depth.
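
To make the sub-sampling concrete, here is a minimal base R sketch of rarefying a single sample by drawing observations without replacement. The helper name `rarefy_one` is hypothetical, and this illustrates the general technique only; it is not the package's actual implementation and may not reproduce the values that rarefy() returns for the same seed.

```r
# Hypothetical helper (not part of the package): rarefy one sample's
# counts to `depth` by drawing observations without replacement.
rarefy_one <- function(x, depth, seed = 0) {
  stopifnot(sum(x) >= depth)
  set.seed(seed)
  # Expand counts into one entry per observation, labeled by feature index.
  obs <- rep(seq_along(x), times = x)
  # Pick `depth` observations; sample.int() indexes positions directly.
  keep <- obs[sample.int(length(obs), depth)]
  # Tally the kept observations back into per-feature counts.
  out <- tabulate(keep, nbins = length(x))
  names(out) <- names(x)
  out
}

sample_B <- c(OTU1 = 0, OTU2 = 8, OTU3 = 5, OTU4 = 0, OTU5 = 5)
rare_B   <- rarefy_one(sample_B, depth = 13, seed = 1)
sum(rare_B)               # 13
all(rare_B <= sample_B)   # TRUE: no feature can gain observations
```

Drawing without replacement means a feature's rarefied count never exceeds its original count, which is why features absent from a sample stay at zero.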

Usage

rarefy(
  counts,
  depth = 0.1,
  n_samples = NULL,
  seed = 0,
  times = NULL,
  drop = TRUE,
  margin = 1L,
  cpus = n_cpus()
)

Value

A rarefied matrix. Matrix and slam objects will be returned with the same type; otherwise a base R matrix will be returned.

Arguments

counts: A numeric matrix of count data where each column is a feature, and each row is a sample. Any object coercible with as.matrix() can be given here, as well as phyloseq, rbiom, SummarizedExperiment, and TreeSummarizedExperiment objects. For optimal performance with very large datasets, see the guide in vignette('performance').
depth: How many observations to keep per sample. When 0 < depth < 1, it is taken as the minimum fraction of the dataset's observations to keep (for example, 0.1 means 10%). Ignored when n_samples is specified. Default: 0.1
n_samples: The number of samples to keep. When 0 < n_samples < 1, it is taken as the fraction of samples to keep. If negative, that number of samples is dropped. If 0, all samples are kept. If NULL, then depth is used instead. Default: NULL
seed: An integer seed for randomizing which observations to keep or drop. If you need to create different random rarefactions of the same data, set the seed to a different number each time. Default: 0
times: How many independent rarefactions to perform. If set, rarefy() will return a list of matrices. The seeds for each matrix will be sequential, starting from seed. Default: NULL
drop: Drop rows and columns with zero observations after rarefying. Default: TRUE
margin: If your samples are in the matrix's rows, set to 1L. If your samples are in columns, set to 2L. Ignored when counts is a phyloseq, rbiom, SummarizedExperiment, or TreeSummarizedExperiment object. Default: 1L
cpus: How many parallel processing threads should be used. The default, n_cpus(), will use all logical CPU cores.
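
The relationship between n_samples and the resulting rarefaction depth can be sketched in base R. The helper `depth_for_n_samples` is hypothetical and assumes samples are in rows. Rounding a fractional n_samples up with ceiling() is an assumption, since the argument description does not say how rounding is handled.

```r
# Hypothetical helper (not part of the package): find the depth that
# keeps the requested number of samples. Assumes samples are in rows.
depth_for_n_samples <- function(counts, n_samples) {
  sums  <- sort(rowSums(counts), decreasing = TRUE)
  total <- length(sums)
  if (n_samples == 0) {
    keep <- total                        # keep all samples
  } else if (n_samples < 0) {
    keep <- total + n_samples            # drop |n_samples| samples
  } else if (n_samples < 1) {
    keep <- ceiling(n_samples * total)   # assumption: round up
  } else {
    keep <- n_samples
  }
  stopifnot(keep >= 1, keep <= total)
  # The depth is the total count of the keep-th most abundant sample.
  unname(sums[keep])
}

counts <- matrix(c(0,0,0,0,0,8,9,10,5,5,5,5,2,0,0,0,6,5,7,0), 4, 5,
  dimnames = list(LETTERS[1:4], paste0('OTU', 1:5)))
depth_for_n_samples(counts, 2)    # 18 (sample B)
depth_for_n_samples(counts, -1)   # 15 (drops the least abundant sample, A)
depth_for_n_samples(counts, 0)    # 13 (keeps all four samples)
```

Any sample whose total falls below the chosen depth cannot be rarefied to it, so picking the depth this way is what determines how many samples survive.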

Examples

    # A 4-sample x 5-OTU matrix with samples in rows.
    counts <- matrix(c(0,0,0,0,0,8,9,10,5,5,5,5,2,0,0,0,6,5,7,0), 4, 5,
      dimnames = list(LETTERS[1:4], paste0('OTU', 1:5)))
    counts
    rowSums(counts)
    
    # Rarefy all samples to a depth of 13.
    # Note that 'OTU1' has 0 counts in every sample and is dropped.
    r_mtx <- rarefy(counts, depth = 13, seed = 1)
    r_mtx
    rowSums(r_mtx)
    
    # Keep zero-sum rows and columns by setting `drop = FALSE`.
    rarefy(counts, depth = 13, drop = FALSE, seed = 1)
    
    # Rarefy to the depth of the 2nd most abundant sample (B, depth=18).
    rarefy(counts, n_samples = 2, seed = 1)
    
    # Perform 3 independent rarefactions.
    r_list <- rarefy(counts, depth = 13, times = 3, seed = 1)
    length(r_list)
    r_list[[1]]
    
    # The class of the input matrix is preserved.
    if (requireNamespace('Matrix', quietly = TRUE)) {
      counts_dgC <- Matrix::Matrix(counts, sparse = TRUE)
      class(counts_dgC)
      r_dgC <- rarefy(counts_dgC, depth = 13, seed = 1)
      class(r_dgC)
    }