resample: Resample data frame using values from the column with number of clonesets.

Description

Resample a data frame using values from the column that holds the count for each cloneset. Cloneset counts (one cloneset per row of a MiTCR data frame) are either reads (usually the "Read.count" column) or UMIs (i.e., barcodes, usually the "Umi.count" column).

Usage

resample(.data, .n = -1, .col = c("read.count", "umi.count"))
downsample(.data, .n, .col = c("read.count", "umi.count"))
prop.sample(.data, .perc = 50, .col = c("read.count", "umi.count"))

Arguments

.data

Data frame with the column .col or list of such data frames.

.n

Number of values / reads / UMIs to choose.

.col

Which column to use to represent the quantities of clonotypes. See "Details".

.perc

Percentage (0 - 100). See "Details" for more info.

Value

Subsampled data frame.

Details

resample. Using a multinomial distribution, compute the number of occurrences of each cloneset, then remove clonotypes with zero counts and return the resulting data frame. The probability passed to rmultinom for each cloneset is that cloneset's proportion in the .col column. This is a rough simulation of how clonotypes are sampled from an organism. For now it does not work very well, so use downsample instead.
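
The multinomial step described for resample can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the toy data frame, its CDR3 column, and the helper name resample_sketch are hypothetical, and only rmultinom is taken from the text above.

```r
# Hypothetical toy repertoire; "Read.count" follows the column naming above.
toy <- data.frame(
  CDR3 = c("CASSL", "CASSP", "CASSQ", "CASSR"),
  Read.count = c(50, 30, 15, 5)
)

# Illustrative helper (not part of the package): draw n reads from a
# multinomial distribution whose probabilities are the cloneset proportions.
resample_sketch <- function(df, n, col = "Read.count") {
  probs <- df[[col]] / sum(df[[col]])
  new_counts <- as.vector(rmultinom(1, size = n, prob = probs))
  df[[col]] <- new_counts
  # Drop clonesets that received zero reads.
  df[new_counts > 0, , drop = FALSE]
}

set.seed(42)
res <- resample_sketch(toy, 20)
```

Because the draw is with replacement in effect, a rare cloneset can end up with more reads than it had in the input when n is large relative to the total.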

downsample. Choose .n clones (not clonotypes!) from the input repertoires without any probabilistic simulation, computing exactly which clones are chosen. Its output has the same form as that of resample (repertoires), but is more consistent and biologically plausible.
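
The text does not spell out how downsample selects clones. The sketch below shows one possible reading, under the assumption that .n individual clones (reads or UMIs) are drawn without replacement from the pooled counts, so no cloneset can receive more than it had originally. The toy data and the helper name downsample_sketch are hypothetical.

```r
# Hypothetical toy repertoire; "Read.count" follows the column naming above.
toy <- data.frame(
  CDR3 = c("CASSL", "CASSP", "CASSQ", "CASSR"),
  Read.count = c(50, 30, 15, 5)
)

# Illustrative helper (an assumption, not the package's code).
downsample_sketch <- function(df, n, col = "Read.count") {
  # One entry per individual clone, labelled by the row it belongs to.
  pool <- rep(seq_len(nrow(df)), times = df[[col]])
  # Pick n clones without replacement.
  picked <- pool[sample.int(length(pool), n)]
  # Count how many picked clones fall into each cloneset.
  df[[col]] <- tabulate(picked, nbins = nrow(df))
  df[df[[col]] > 0, , drop = FALSE]
}

set.seed(42)
down <- downsample_sketch(toy, 20)
```

The returned data frame keeps the input columns, with the count column replaced by the number of selected clones per cloneset.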

prop.sample. Choose the first N clonotypes that together account for .perc percent of all UMIs / reads.
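
The selection rule for prop.sample can be illustrated with a cumulative sum. This sketch assumes the rows are already ordered by decreasing count and that the row that crosses the threshold is included; both are assumptions, and the toy data and the helper name prop_sample_sketch are hypothetical.

```r
# Hypothetical toy repertoire, already sorted by decreasing count.
toy <- data.frame(
  CDR3 = c("CASSL", "CASSP", "CASSQ", "CASSR"),
  Read.count = c(50, 30, 15, 5)
)

# Illustrative helper (an assumption, not the package's code).
prop_sample_sketch <- function(df, perc = 50, col = "Read.count") {
  # Cumulative share of all reads covered by the first 1, 2, ... rows.
  share <- cumsum(df[[col]]) / sum(df[[col]])
  # Index of the first row at which the requested share is reached.
  last <- which(share >= perc / 100)[1]
  df[seq_len(last), , drop = FALSE]
}

top <- prop_sample_sketch(toy, perc = 70)
```

With these toy counts, the first two clonotypes cover 80% of the reads, so a 70% threshold keeps exactly those two rows.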

Examples

# NOT RUN {
# Get 100K reads (not clones!).
immdata.1.100k <- resample(immdata[[1]], 100000, .col = "read.count")
# }