AbSim (version 0.1)

fullRepertoire: Simulates full heavy chain antibody repertoires for either human or mice.

Description

Simulates full heavy chain antibody repertoires for either human or mice.

Usage

fullRepertoire(max.seq.num, max.timer, SHM.method, baseline.mut, SHM.branch.prob, SHM.branch.param, SHM.nuc.prob, species, VDJ.branch.prob, proportion.sampled, sample.time, max.tree.num)

Arguments

max.seq.num
The maximum number of tips allowed at the end of the simulation. The simulation ends when either this value or max.timer is reached. Note that this function does not take clonal frequency into account. This parameter resembles species richness, i.e. the number of unique sequences in the repertoire.
max.timer
The maximum number of time steps allowed during the simulation. The simulation ends when either this value or max.seq.num is reached.
SHM.method
The mode of SHM speciation events. Options are "poisson", "data", "motif", "wrc", and "all". Specifying "poisson" results in mutations that can occur anywhere in the heavy chain region, with each nucleotide having an equal probability of a mutation event. Specifying "data" focuses mutation events during SHM on the CDR regions (based on IMGT), with an increased probability for transitions (and a decreased probability for transversions). Specifying "motif" causes neighbor-dependent mutations based on a mutational matrix from high-throughput sequencing data sets (Yaari et al., Frontiers in Immunology, 2013). Specifying "wrc" restricts mutations to the WRC mutational hotspots (where W equals A or T and R equals A or G). Specifying "all" uses all four types of mutations during SHM branching events, where the weights for each can be specified in the "SHM.nuc.prob" parameter.
baseline.mut
Specifies the probability (gamma) for each nucleotide to be mutated between speciation events. These mutations do not cause any branching events. This parameter gives each site a probability of being mutated (in all current sequences) at each time step. Currently these mutations are only Poisson distributed, but future releases will allow for other mutation methods.
SHM.branch.prob
Specifies the distribution that describes how likely each sequence is to undergo SHM events (and thus branching events) in the phylogeny. The following options are possible: "identical", "uniform", "exponential" ("exp"), "lognormal" ("lognorm"), "normal" ("norm").
SHM.branch.param
Specifies the parameter value(s) for the distribution selected in "SHM.branch.prob". For "identical", only one value should be supplied. For "uniform", a vector of length 3 should be supplied, corresponding to n, min, max respectively (see stats::runif(n, min = 0, max = 1)). For "exponential", a single value controlling the rate parameter (from stats::rexp()) should be supplied. For "lognorm", a vector of length two should be supplied, with the first value corresponding to meanlog and the second to sdlog (from stats::rlnorm()). Similarly, for the "normal" distribution, two values corresponding to the mean and standard deviation (respectively) should be supplied.
SHM.nuc.prob
Specifies the rate at which nucleotides change during speciation (SHM) events. This parameter depends on the type of mutation specified by SHM.method. For both "poisson" and "data", the input value determines the probability for each site to mutate (the whole sequence for "poisson" and the CDRs for "data"). For either "motif" or "wrc", the number of mutations per speciation event should be specified. Note that these are not probabilities, but the number of mutations that can occur (if the mutation is present in the sequence). If "all" is specified, the input should be a vector where the first element controls the poisson style mutations, second controls the "data", third controls the "motif" and fourth controls the "wrc".
species
Either "mus" for C57BL/6 germline genes or "hum" for human germline genes. These genes were taken from IMGT. When more than one allele was present for a given gene, the first was used.
VDJ.branch.prob
The probability of a new VDJ recombination event occurring. For the singleLineage function, this results in a branching event at the site of the unmutated germline. For the fullRepertoire function, this causes a new tree to begin.
proportion.sampled
Value between 0 and 1 specifying the proportion of sequences to be sampled at each time point. Specifying 1 indicates that all sequences will be recovered at each time point, whereas 0.5 will sample half of the sequences.
sample.time
Integer array indicating the time points at which sampling events should occur.
max.tree.num
Integer value giving the maximum number of trees allowed to generate the core sequences of the repertoire. Each of these trees is started by an independent VDJ recombination event.
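
The distribution families listed for the SHM branching arguments ("identical", "uniform", "exponential", "lognormal", "normal") can be illustrated with base R alone. The sketch below is not AbSim's internal code: the helper draw_branch_prob is hypothetical, it takes the number of draws n as a separate argument (so "uniform" here receives only min and max), and clamping the draws to [0, 1] is an assumption made so that every value is a valid probability.

```r
# Hypothetical helper, for illustration only (not part of AbSim).
# Draws n branching probabilities from one of the documented distribution families.
draw_branch_prob <- function(n, dist, param) {
  p <- switch(dist,
    identical   = rep(param[1], n),                                  # same value for every sequence
    uniform     = stats::runif(n, min = param[1], max = param[2]),
    exponential = stats::rexp(n, rate = param[1]),
    lognormal   = stats::rlnorm(n, meanlog = param[1], sdlog = param[2]),
    normal      = stats::rnorm(n, mean = param[1], sd = param[2]),
    stop("unknown distribution: ", dist)                             # default branch
  )
  # Assumption of this sketch: clamp draws so each one is a valid probability.
  pmin(pmax(p, 0), 1)
}

set.seed(1)
draw_branch_prob(5, "identical", 0.05)
draw_branch_prob(5, "uniform", c(0.01, 0.1))
draw_branch_prob(5, "lognormal", c(log(0.05), 0.5))
```

With "identical", every sequence shares one branching probability; the other families give each sequence its own value, which is one way to picture heterogeneous branching rates across a tree.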

Value

Returns a nested list. output[[1]][[1]] is an array of the simulated sequences, and output[[2]][[1]] is an array of names corresponding to each sequence. For example, output[[2]][[1]][1] is the name of the sequence stored in output[[1]][[1]][1]. The simulated tree is found in output[[3]][[1]]. The length of the output list is determined by the number of sampling points. Thus, if you have two sampling points, output[[4]][[1]] is a character array holding the sequences recovered at the first sampling point, with output[[5]][[1]] a character array holding the corresponding names. The sequences recovered at the second sampling point are stored in output[[6]][[1]], with the names in output[[7]][[1]]. This nested list was designed for full antibody repertoire simulations and may therefore seem unintuitive for the singleLineage function. The first sequence and name correspond to the germline sequence that served as the root of the tree. See the vignette for a comprehensive example.
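
The index arithmetic described above can be sketched on a hand-built list. Everything in this block is hypothetical: mock_output only mimics the documented layout for two sampling points, its sequences and names are made up, the tree slot is a NULL placeholder, and the helpers sample_slots and get_sample are not part of AbSim. Treating the inner [[1]] as a tree index is also an assumption.

```r
# Mock object mirroring the documented layout (made-up data, not simulator output).
mock_output <- list(
  list(c("ATGC", "ATGA", "ATTA")),    # [[1]] final sequences (first = germline)
  list(c("germline", "s1", "s2")),    # [[2]] matching names
  list(NULL),                         # [[3]] placeholder for the simulated tree
  list(c("ATGC", "ATGA")),            # [[4]] sequences at sampling point 1
  list(c("germline", "s1")),          # [[5]] names at sampling point 1
  list(c("ATGC", "ATGA", "ATTA")),    # [[6]] sequences at sampling point 2
  list(c("germline", "s1", "s2"))     # [[7]] names at sampling point 2
)

# Sampling point k occupies slots 2k + 2 (sequences) and 2k + 3 (names).
sample_slots <- function(k) c(sequences = 2 * k + 2, names = 2 * k + 3)

# Return the sequences of sampling point k as a named character vector.
# The 'tree' argument assumes the inner index selects a tree.
get_sample <- function(output, k, tree = 1) {
  idx <- sample_slots(k)
  stats::setNames(output[[idx[["sequences"]]]][[tree]],
                  output[[idx[["names"]]]][[tree]])
}

sample_slots(1)
get_sample(mock_output, 2)
```

Pairing each sequence with its name in one named vector avoids keeping two parallel indices in step by hand.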

See Also

singleLineage

Examples

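library(AbSim)

# Note: "naive" is the value used in the original example, although it is not
# among the options listed under SHM.method.
output <- fullRepertoire(max.seq.num = 51, max.timer = 150,
  SHM.method = "naive", baseline.mut = 0.0008,
  SHM.branch.prob = "identical", SHM.branch.param = 0.05,
  SHM.nuc.prob = 15/350, species = "mus",
  VDJ.branch.prob = 0.1, proportion.sampled = 1,
  sample.time = 50, max.tree.num = 3)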
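
The WRC hotspot definition given under SHM.method (W is A or T, R is A or G, followed by C) can be illustrated in base R. This is only a sketch of locating the motif in a sequence: the helper find_wrc and the test sequence are hypothetical, and nothing here reflects how AbSim chooses or applies mutations at those sites.

```r
# Hypothetical helper: start positions of WRC motifs in a nucleotide string.
find_wrc <- function(seq) {
  # [AT] = W, [AG] = R, then a literal C. WRC motifs cannot overlap one another,
  # so a plain (non-lookahead) regular expression finds all of them.
  hits <- gregexpr("[AT][AG]C", toupper(seq))[[1]]
  pos <- as.integer(hits)
  # gregexpr() signals "no match" with a single -1.
  if (length(pos) == 1 && pos[1] == -1L) integer(0) else pos
}

find_wrc("GGAGCTTACAACG")   # motifs AGC, TAC and AAC start at 3, 7 and 10
find_wrc("CCCC")            # no hotspot: integer(0)
```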