Antibody lineage simulations using only one set of V(D)J germline genes. The main difference between this function and the fullRepertoire function is that there can be multiple VDJ recombination events within one tree. Each VDJ recombination event in the singleLineage function is a branching event within the existing tree, whereas the VDJ recombination events in the fullRepertoire function start a new tree.
singleLineage(max.seq.num, max.timer, SHM.method, SHM.nuc.prob, baseline.mut,
SHM.branch.prob, SHM.branch.param, species, max.VDJ, VDJ.branch.prob,
proportion.sampled, sample.time, chain.type, vdj.model, vdj.insertion.mean,
vdj.insertion.stdv)
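A minimal sketch of a call using the signature above. The argument values are illustrative assumptions, not recommended defaults, and the simulation is only run when the AbSim package is installed.

```r
# Illustrative argument values (assumptions chosen for a small, quick run).
args <- list(
  max.seq.num = 10, max.timer = 150,
  SHM.method = "naive", SHM.nuc.prob = 15 / 350, baseline.mut = 0.0008,
  SHM.branch.prob = "identical", SHM.branch.param = 0.05,
  species = "mus", max.VDJ = 1, VDJ.branch.prob = 0.1,
  proportion.sampled = 1, sample.time = 50,
  chain.type = "heavy", vdj.model = "naive",
  vdj.insertion.mean = 4, vdj.insertion.stdv = 2
)

# Run the simulation only if AbSim is available; otherwise keep NULL.
lineage <- NULL
if (requireNamespace("AbSim", quietly = TRUE)) {
  lineage <- do.call(AbSim::singleLineage, args)
}
```

Collecting the arguments in a named list keeps the sixteen parameters readable and makes it easy to vary one of them between runs.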
The maximum number of tips allowed at the end of the simulation. The simulation will end when either this or the max.timer is reached. Note - this function does not take clonal frequency into account. This parameter resembles the species richness, or the measure of unique sequences in the repertoire.
The maximum number of time steps allowed during the simulation. The simulation will end when either this or the max.seq.num is reached.
The mode of SHM speciation events. Options are "poisson", "data", "motif", "wrc", and "all". Specifying either "poisson" or "naive" will result in mutations that can occur anywhere in the heavy chain region, with each nucleotide having an equal probability of a mutation event. Specifying "data" focuses mutation events during SHM in the CDR regions (based on IMGT), with an increased probability of transitions (and a decreased probability of transversions). Specifying "motif" will cause neighbor-dependent mutations based on a mutational matrix from high-throughput sequencing data sets (Yaari et al., Frontiers in Immunology, 2013). "wrc" allows only the WRC mutational hotspots to be included (where W equals A or T and R equals A or G). Specifying "all" will use all four types of mutations during SHM branching events, where the weights for each can be specified in the "SHM.nuc.prob" parameter.
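To make the WRC definition concrete, the following base R sketch locates WRC hotspots in a nucleotide string. The helper name find_wrc is hypothetical and is not part of the package; it only illustrates the motif (W = A or T, R = A or G, followed by C), not how the package applies mutations at those sites.

```r
# Hypothetical helper: start positions of WRC motifs (W = A/T, R = A/G).
find_wrc <- function(seq) {
  hits <- gregexpr("[AT][AG]C", toupper(seq))[[1]]
  # gregexpr() returns -1 when there is no match
  if (hits[1] == -1) integer(0) else as.integer(hits)
}

find_wrc("GGAGCTTACCAACGG")
# → 3 7 11  (AGC, TAC, AAC)
```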
Specifies the rate at which nucleotides change during speciation (SHM) events. This parameter depends on the type of mutation specified by SHM.method. For both "poisson" and "data", the input value determines the probability for each site to mutate (the whole sequence for "poisson" and the CDRs for "data"). For either "motif" or "wrc", the number of mutations per speciation event should be specified. Note that these are not probabilities, but the number of mutations that can occur (if the mutation is present in the sequence). If "all" is specified, the input should be a vector where the first element controls the poisson style mutations, second controls the "data", third controls the "motif" and fourth controls the "wrc".
Specifies the probability (gamma) for each nucleotide to be mutated in between speciation events. These mutations do not cause any branching events. This parameter gives each site a probability of being mutated (in all current sequences) at each time step. Currently these mutations are only Poisson distributed, but future releases will allow for other mutation methods.
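The per-site behaviour described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the helper name mutate_baseline is hypothetical, each site is treated as an independent draw with probability gamma, and a selected site is replaced by a different base chosen uniformly.

```r
# Hypothetical sketch: baseline mutations for one sequence in one time step.
mutate_baseline <- function(seq, gamma) {
  bases <- c("A", "C", "G", "T")
  nt <- strsplit(seq, "")[[1]]
  hit <- which(runif(length(nt)) < gamma)        # sites selected for mutation
  for (i in hit) {
    nt[i] <- sample(setdiff(bases, nt[i]), 1)    # substitute a different base
  }
  paste(nt, collapse = "")
}

set.seed(42)
mutate_baseline("ACGTACGTACGTACGTACGT", gamma = 0.1)
```

With gamma = 0 the sequence is returned unchanged, and with gamma = 1 every site is substituted.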
Specifies the probability for a given sequence to undergo SHM events (and thus branching events). This parameter corresponds to the distribution specified in "SHM.branch.prob". For "identical", only one value should be supplied. For "uniform", a vector of length 3 should be specified, corresponding to n, min, and max respectively (stats::runif(n, min = 0, max = 1)). For "exponential", a single value controlling the rate parameter (from stats::rexp()) should be supplied. For "lognorm", a vector of length two should be supplied, with the first value corresponding to meanlog and the second corresponding to sdlog (from stats::rlnorm()). Similarly, for the "normal" distribution, two values corresponding to the mean and standard deviation (respectively) should be supplied.
Describes the probability of undergoing SHM events. This parameter is responsible for describing how likely each sequence will undergo branching events in the phylogeny. The following options are possible: "identical", "uniform", "exponential" ("exp"), "lognormal" ("lognorm"), "normal" ("norm").
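The distribution names and parameter layouts described above can be sketched in base R as follows. The helper draw_branch_prob is hypothetical and only illustrates how each option maps onto a stats function. Two details are assumptions of this sketch: draws are clamped to [0, 1], and for "uniform" the first element of the parameter vector (n) is ignored in favour of the requested number of sequences.

```r
# Hypothetical helper: draw one branching probability per sequence.
draw_branch_prob <- function(n, dist, param) {
  p <- switch(dist,
    identical   = rep(param[1], n),
    # param = c(n, min, max); param[1] is ignored in this sketch
    uniform     = runif(n, min = param[2], max = param[3]),
    exp         = ,
    exponential = rexp(n, rate = param[1]),
    lognorm     = ,
    lognormal   = rlnorm(n, meanlog = param[1], sdlog = param[2]),
    norm        = ,
    normal      = rnorm(n, mean = param[1], sd = param[2]),
    stop("unknown distribution: ", dist)
  )
  pmin(pmax(p, 0), 1)  # clamp to [0, 1]; an assumption of this sketch
}

set.seed(1)
draw_branch_prob(5, "identical", 0.05)
draw_branch_prob(5, "uniform", c(5, 0.01, 0.1))
draw_branch_prob(5, "lognorm", c(-3, 0.5))
```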
Either "mus" for C57BL/6 germline genes or "hum" for human germline genes. These genes were taken from IMGT. When more than one allele was present for a given gene, the first was used.
The maximum number of VDJ events allowed. These VDJ events are independent of each other but use the same VDJ segments to create a new branching event in the tree at the unmutated germline.
The probability of a new VDJ recombination event occurring. For the singleLineage function, this will result in a branching event at the site of the unmutated germline. For the fullRepertoire function, this will cause a new tree to begin.
Value between 0 and 1 specifying the proportion of sequences to be sampled at each time point. Specifying 1 indicates that all sequences will be recovered at each time point, whereas 0.5 will sample half of the sequences.
Integer array indicating the time points at which sampling events should occur.
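As an illustration of what sampling a proportion of sequences at a time point means, here is a base R sketch. The helper sample_sequences is hypothetical, and rounding the number of recovered sequences to the nearest integer is an assumption of the sketch rather than documented behaviour.

```r
# Hypothetical helper: recover a fraction of the current sequences.
sample_sequences <- function(seqs, proportion.sampled) {
  stopifnot(proportion.sampled >= 0, proportion.sampled <= 1)
  n_keep <- round(length(seqs) * proportion.sampled)
  # sample indices without replacement, then restore the original order
  seqs[sort(sample.int(length(seqs), n_keep))]
}

set.seed(7)
current <- paste0("seq_", 1:10)          # placeholder sequence labels
sample_sequences(current, 0.5)           # five of the ten entries
sample_sequences(current, 1)             # all entries recovered
```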
String determining whether heavy or light chain should be simulated. Either "heavy" for heavy chains or "light" for light chains. Heavy chains will have V-D-J recombination, whereas light chain will just have V-J recombination.
Specifies the model used to simulate V-D-J recombination. Can be either "naive" or "data". "naive" is chain-independent and does not differentiate between species. To rely on the default "experimental" options, this should be "data" and the parameter vdj.insertion.mean should be "default". This allows different mean insertion lengths for the VD and JD junctions, which also differ depending on species.
Integer value describing the mean number of nucleotides to be inserted during simulated V-D-J recombination events. If "default" is entered, the mean will be normally distribut
Integer value describing the standard deviation of the number of nucleotides inserted during V-D-J recombination. No "default" parameter is currently supported, but this will be updated with future experimental data. This should be a number if using a custom distribution for V-D-J recombination events, but can be "default" if using the "naive" vdj.model, or the "data" vdj.model with vdj.insertion.mean set to "default".
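To illustrate how a custom insertion mean and standard deviation could translate into inserted nucleotides, here is a base R sketch. The helper draw_insertion is hypothetical. Drawing the length from a normal distribution, rounding it, flooring it at zero, and choosing bases uniformly are all assumptions of the sketch, not documented behaviour of the package.

```r
# Hypothetical sketch: draw a junction insertion of random nucleotides.
draw_insertion <- function(ins.mean, ins.stdv) {
  # insertion length: rounded normal draw, never negative
  n <- max(0, round(rnorm(1, mean = ins.mean, sd = ins.stdv)))
  paste(sample(c("A", "C", "G", "T"), n, replace = TRUE), collapse = "")
}

set.seed(3)
draw_insertion(ins.mean = 4, ins.stdv = 2)
```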
Returns a nested list containing both sequence information and phylogenetic trees. If "output" is the returned object, then output[[1]][[1]] is an array of the simulated sequences and output[[2]][[1]] is an array of names corresponding to each sequence. For example, output[[2]][[1]][1] is the name of the sequence at output[[1]][[1]][1]. The simulated tree is found in output[[3]][[1]]. The length of the output list is determined by the number of sampling points. Thus, if you have two sampling points, output[[4]][[1]] would be a character array holding the sequences recovered at the first sampling point, with output[[5]][[1]] as a character array holding the corresponding names. The sequences recovered at the second sampling point would then be stored at output[[6]][[1]], with the names at output[[7]][[1]]. This nested list was designed for full antibody repertoire simulations and thus may seem unintuitive for the single lineage function. The first sequence and name correspond to the germline sequence that served as the root of the tree.
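The indexing described above can be tried on a mock object without running a simulation. Everything in the mock is a placeholder: the sequences, the names, and the string standing in for the tree are invented for illustration and do not reflect real output.

```r
# Mock object following the layout described above (placeholder values only).
output <- list(
  list(c("ACGT", "ACGA", "ACTA")),          # [[1]][[1]] simulated sequences
  list(c("germline", "seq_1", "seq_2")),    # [[2]][[1]] matching names
  list("placeholder for the tree")          # [[3]][[1]] phylogenetic tree
)

seqs <- output[[1]][[1]]
names(seqs) <- output[[2]][[1]]   # pair each sequence with its name
germline <- seqs[1]               # first entry is the root germline
tree <- output[[3]][[1]]
```

Attaching the names to the sequence vector avoids keeping two parallel arrays in sync when filtering or reordering sequences later.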
fullRepertoire