shared.repertoire: Shared TCR repertoire management and analysis

Description

Generate a repertoire of shared sequences, i.e., sequences present in more than one subject. If a sequence appears more than once in a single repertoire, only its first occurrence is used for the shared repertoire.

shared.repertoire - make a shared repertoire of sequences from the given list of data frames.

shared.matrix - keep only the columns that hold the counts of sequences in people, and return them as a matrix. That is, this function removes columns such as 'CDR3.amino.acid.sequence', 'V.segments' and 'People'.
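
As a rough illustration of what shared.matrix is described as doing, the base R sketch below drops the descriptive columns from a toy shared repertoire and converts the rest to a matrix. The data frame, its sample columns 'A' and 'B', and the use of NA for a sequence absent from a subject are assumptions made for this example; this is not the package's implementation.

```r
# Toy shared repertoire (made-up values; the real object is a data.table
# produced by shared.repertoire()).
shared.rep <- data.frame(
  CDR3.amino.acid.sequence = c("CASSL", "CASSF"),
  V.segments = c("TRBV7-2", "TRBV5-1"),
  People = c(2, 1),
  A = c(10, 5),   # hypothetical per-subject column
  B = c(7, NA),   # NA here is an assumption for "sequence not found"
  stringsAsFactors = FALSE
)

# Columns that describe the sequence rather than a subject.
drop.cols <- c("CDR3.amino.acid.sequence", "V.segments", "People")

# Keep only the per-subject columns and return them as a numeric matrix.
count.mat <- as.matrix(
  shared.rep[, !(names(shared.rep) %in% drop.cols), drop = FALSE]
)
```

With the toy data, count.mat has one row per sequence and one column per subject.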

Usage

shared.repertoire(.datalist, .type = 'avc', .min.ppl = 1, .head = -1,
                  .clear = T, .verbose = T, .by.col = '', .sum.col = '',
                  .max.ppl = length(.datalist))
shared.matrix(.shared.rep)

Arguments

.datalist

List with data frames.

.type

String of length 3 that denotes how to create a shared repertoire. See "Details" for more information. If supplied, then the parameters .by.col and .sum.col will be ignored. If not supplied, then the columns in .by.col and .sum.col will be used.

.min.ppl

Minimum number of people who must have a sequence for it to be kept in the shared repertoire.

.head

Parameter for the head function, applied to all data frames before clearing.

.clear

If T, then remove all sequences that contain the symbols "~" or "*" (i.e., out-of-frame sequences for amino acid sequences).

.verbose

If T, then output progress.

.by.col

Character vector with the names of the columns holding sequences and their parameters (such as segments) to use for creating a shared repertoire.

.sum.col

Character vector of length 1 with the name of the column holding the count, percentage or any other numeric characteristic of sequences to use for creating a shared repertoire.

.max.ppl

Maximum number of people who may have a sequence for it to be kept in the shared repertoire.

.shared.rep

Shared repertoire.
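
To make the filtering arguments more concrete, here is a minimal base R sketch of the behaviour described for .clear, .min.ppl and .max.ppl, together with the "first occurrence only" rule. The toy repertoires and the variable names (reps, min.ppl, max.ppl, prepared, kept) are invented for this example; the sketch does not reproduce how the package does this internally.

```r
# Toy repertoires (made-up data for illustration only).
reps <- list(
  A = data.frame(CDR3.amino.acid.sequence = c("CASSL", "CASSF", "CASSL", "CAS*F"),
                 Read.count = c(10, 5, 2, 1), stringsAsFactors = FALSE),
  B = data.frame(CDR3.amino.acid.sequence = c("CASSL", "CASSY", "CAS~Y"),
                 Read.count = c(7, 3, 1), stringsAsFactors = FALSE),
  C = data.frame(CDR3.amino.acid.sequence = c("CASSL", "CASSY"),
                 Read.count = c(4, 6), stringsAsFactors = FALSE)
)
min.ppl <- 2  # stands in for .min.ppl
max.ppl <- 2  # stands in for .max.ppl

prepared <- lapply(reps, function(df) {
  # Like .clear = T: drop sequences containing "~" or "*".
  df <- df[!grepl("[*~]", df$CDR3.amino.acid.sequence), ]
  # Keep only the first occurrence of each sequence in this repertoire.
  df[!duplicated(df$CDR3.amino.acid.sequence), ]
})

# Count in how many repertoires each remaining sequence occurs.
people <- table(unlist(lapply(prepared, function(df) df$CDR3.amino.acid.sequence)))

# Keep sequences found in at least min.ppl and at most max.ppl repertoires.
n.people <- as.vector(people)
kept <- names(people)[n.people >= min.ppl & n.people <= max.ppl]
```

In this toy data "CASSL" occurs in three repertoires and "CASSF" in one, so only "CASSY" falls inside the requested range.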

Value

Data.table for shared.repertoire, matrix for shared.matrix.

Details

Parameter .type is a string of length 3, where:

The first character selects the sequence column: 'a' for the "CDR3.amino.acid.sequence" column or 'n' for the "CDR3.nucleotide.sequence" column.
The second character determines whether the V.segments column is taken. Possible values are '0' (zero) for taking no additional columns and 'v' for taking the "V.segments" column.
The third character names the column to use as the numeric characteristic of sequences. Possible values are "c" for the "Read.count" column, "p" for the "Percentage" column, "r" for the "Rank" column or "i" for the "Index" column. If "Rank" or "Index" isn't in the given repertoire, then it will be created using the set.rank function with the default "Read.count" column.
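
The mapping above can be sketched as a small decoder. The function decode.type below is a hypothetical helper written for this illustration and is not part of the package; it only translates a .type string into the column names listed in this section.

```r
# Hypothetical helper (not a package function): translate a .type string
# into the column names described in "Details".
decode.type <- function(.type) {
  stopifnot(is.character(.type), length(.type) == 1, nchar(.type) == 3)
  chars <- strsplit(.type, "")[[1]]

  # First character: which sequence column to use.
  seq.col <- switch(chars[1],
                    a = "CDR3.amino.acid.sequence",
                    n = "CDR3.nucleotide.sequence",
                    stop("first character must be 'a' or 'n'"))

  # Second character: whether to add the V.segments column.
  seg.col <- switch(chars[2],
                    "0" = character(0),
                    v = "V.segments",
                    stop("second character must be '0' or 'v'"))

  # Third character: which numeric column to use.
  sum.col <- switch(chars[3],
                    c = "Read.count",
                    p = "Percentage",
                    r = "Rank",
                    i = "Index",
                    stop("third character must be 'c', 'p', 'r' or 'i'"))

  list(by.col = c(seq.col, seg.col), sum.col = sum.col)
}

avr <- decode.type("avr")
n0c <- decode.type("n0c")
```

Here avr selects the amino acid sequence and V segment columns with "Rank", while n0c selects only the nucleotide sequence column with "Read.count".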

Examples

# Set "Rank" column in data by "Read.count" column.
# This is done automatically by the shared.repertoire() function
# if the "Rank" column hasn't been found.
immdata <- set.rank(immdata)
# Generate shared repertoire using "CDR3.amino.acid.sequence" and
# "V.segments" columns and with rank.
imm.shared.av <- shared.repertoire(immdata, 'avr')