parse.folder: Parse input table files with immune receptor repertoire data.

Description

Load TCR data from the file with the given filename into a data frame, or load all files from the given folder into a list of data frames. The folder must contain only files in the specified format. Input files can be plain text or compressed with gzip ("filename.txt.gz") or bzip2 ("filename.txt.bz2"). For a general parser, see parse.cloneset.
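Reading such files can be sketched in base R by picking a decompressing connection from the file extension before calling read.table(). This is a minimal illustration that assumes tab-separated files with a header line; open_cloneset_file() and read_cloneset_table() are hypothetical helpers, not functions of the package, whose parsers may handle compression differently.

```r
# Hypothetical helper (not part of the package): open a cloneset file for
# reading, choosing the decompressor from the file extension.
open_cloneset_file <- function(path) {
  if (grepl("\\.gz$", path)) {
    gzfile(path, open = "rt")
  } else if (grepl("\\.bz2$", path)) {
    bzfile(path, open = "rt")
  } else {
    file(path, open = "rt")
  }
}

# Hypothetical helper: read a tab-separated table with a header line from a
# plain, gzip-compressed or bzip2-compressed file.
read_cloneset_table <- function(path) {
  con <- open_cloneset_file(path)
  on.exit(close(con))  # always release the connection
  read.table(con, header = TRUE, sep = "\t", quote = "",
             stringsAsFactors = FALSE, check.names = FALSE)
}
```

With this design the caller never decompresses archives by hand: the same function reads "s1.txt", "s1.txt.gz" and "s1.txt.bz2".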

Parsers are available for: MiTCR ("mitcr"), MiTCR with UMIs ("mitcrbc"), MiGEC ("migec"), VDJtools ("vdjtools"), ImmunoSEQ ("immunoseq"), MiXCR ("mixcr") and IMSEQ ("imseq").

Output of MiXCR should contain either all hits or best hits for each gene segment.

IMSEQ output should be generated with the "-on" parameter. In this case, the output data frame contains no positions of aligned gene segments, because of limitations of the IMSEQ output format.

Usage

parse.file(.filename,
.format = c('mitcr', 'mitcrbc', 'migec'), ...)
parse.file.list(.filenames,
.format = c('mitcr', 'mitcrbc', 'migec'), .namelist = NA)
parse.folder(.folderpath,
.format = c('mitcr', 'mitcrbc', 'migec'), ...)
parse.mitcr(.filename)
parse.mitcrbc(.filename)
parse.migec(.filename)
parse.vdjtools(.filename)
parse.immunoseq(.filename)
parse.mixcr(.filename)
parse.imseq(.filename)
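In R, a vector default such as .format = c('mitcr', 'mitcrbc', 'migec') is conventionally resolved with match.arg(), which returns the first element when the argument is omitted and rejects strings outside the list. The sketch below shows one way a format string could be dispatched to a format-specific parser; parse_file_sketch(), stub_parser() and the parsers list are hypothetical stand-ins, and this is an assumption about the design, not the package's actual implementation.

```r
# Hypothetical stand-ins for the format-specific parsers listed under Usage;
# each one only records which format it was called for.
stub_parser <- function(fmt) {
  force(fmt)
  function(.filename) data.frame(format = fmt, file = .filename)
}
parsers <- list(mitcr   = stub_parser("mitcr"),
                mitcrbc = stub_parser("mitcrbc"),
                migec   = stub_parser("migec"))

# Hypothetical dispatcher: resolve .format, then call the matching parser.
parse_file_sketch <- function(.filename, .format = c("mitcr", "mitcrbc", "migec")) {
  # match.arg() returns the first choice when .format is left at its default
  # and stops with an error for a string that is not one of the choices.
  .format <- match.arg(.format)
  parsers[[.format]](.filename)
}
```

For example, parse_file_sketch("s1.txt") uses the "mitcr" stand-in, while parse_file_sketch("s1.txt", "migec") uses the "migec" one.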

Arguments

.folderpath

Path to the folder with text cloneset files.

.format

String specifying the input format of the files: "mitcr", "mitcrbc", "migec", "vdjtools", "immunoseq", "mixcr" or "imseq".

...

Parameters passed to parse.cloneset.

.filename

Path to the input file with cloneset data.

.filenames

Vector or list with paths to files with cloneset data.

.namelist

Either NA or a character vector with the same length as .filenames, giving names for the output data frames.
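How .namelist maps onto the list of output data frames can be sketched in base R. The helper below is hypothetical, and its fallback for .namelist = NA (deriving names from the file names) is an assumption for illustration: the documentation does not state which names are used in that case. The data.frame(file = f) call stands in for a real parser.

```r
# Hypothetical helper: build a named list with one data frame per file.
# data.frame(file = f) stands in for a real format-specific parser.
name_cloneset_list <- function(.filenames, .namelist = NA) {
  .filenames <- unlist(.filenames)  # accept a character vector or a list of paths
  if (length(.namelist) == 1 && is.na(.namelist)) {
    # Assumed fallback: file name without directory and extensions,
    # e.g. "~/data/immdata2.txt.gz" -> "immdata2".
    .namelist <- tools::file_path_sans_ext(basename(.filenames), compression = TRUE)
  }
  stopifnot(length(.namelist) == length(.filenames))
  res <- lapply(.filenames, function(f) data.frame(file = f))
  names(res) <- .namelist
  res
}
```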

Value

Data frame with immune receptor repertoire data. Each row in this data frame corresponds to a clonotype. The data frame has the following columns:
- "Umi.count" - number of barcodes (events, UMIs);
- "Umi.proportion" - proportion of barcodes (events, UMIs);
- "Read.count" - number of reads;
- "Read.proportion" - proportion of reads;
- "CDR3.nucleotide.sequence" - CDR3 nucleotide sequence;
- "CDR3.amino.acid.sequence" - CDR3 amino acid sequence;
- "V.gene" - names of aligned Variable gene segments;
- "J.gene" - names of aligned Joining gene segments;
- "D.gene" - names of aligned Diversity gene segments;
- "V.end" - last positions of aligned V gene segments (1-based);
- "J.start" - first positions of aligned J gene segments (1-based);
- "D5.end" - positions of the 5' end of aligned D gene segments (1-based);
- "D3.end" - positions of the 3' end of aligned D gene segments (1-based);
- "VD.insertions" - number of inserted nucleotides (N-nucleotides) at V-D junction (-1 for receptors with VJ recombination);
- "DJ.insertions" - number of inserted nucleotides (N-nucleotides) at D-J junction (-1 for receptors with VJ recombination);
- "Total.insertions" - total number of inserted nucleotides (number of N-nucleotides at V-J junction for receptors with VJ recombination).
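The relations between these columns can be made concrete with a small worked example. The sketch assumes that, with 1-based inclusive positions, the number of N-nucleotides between two aligned segments equals the first position of the downstream segment minus the last position of the upstream segment minus 1, that proportions are counts divided by their column total, and that VJ receptors are marked by D5.end = -1. These are illustrative assumptions, not a description of how the package computes the columns; add_derived_columns() is hypothetical.

```r
# Hypothetical sketch: derive proportions and N-insertion counts from the
# count and 1-based position columns. Assumes D5.end == -1 marks a VJ receptor.
add_derived_columns <- function(df) {
  df$Read.proportion <- df$Read.count / sum(df$Read.count)
  is_vj <- df$D5.end == -1
  # Nucleotides strictly between two segments: downstream start - upstream end - 1.
  df$VD.insertions <- ifelse(is_vj, -1, df$D5.end - df$V.end - 1)
  df$DJ.insertions <- ifelse(is_vj, -1, df$J.start - df$D3.end - 1)
  # VJ: N-nucleotides at the V-J junction; VDJ: sum of both junctions.
  df$Total.insertions <- ifelse(is_vj,
                                df$J.start - df$V.end - 1,
                                df$VD.insertions + df$DJ.insertions)
  df
}

clones <- data.frame(Read.count = c(30, 10),
                     V.end   = c(10, 12),
                     D5.end  = c(14, -1),
                     D3.end  = c(20, -1),
                     J.start = c(25, 16))
out <- add_derived_columns(clones)
```

For the first (VDJ) clonotype this gives 14 - 10 - 1 = 3 and 25 - 20 - 1 = 4 inserted nucleotides, 7 in total; for the second (VJ) clonotype both junction columns are -1 and the total is 16 - 12 - 1 = 3.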

Examples

library(tcR)
# Parse the file "~/mitcr/immdata1.txt" as a MiTCR file.
immdata1 <- parse.file("~/mitcr/immdata1.txt", 'mitcr')
# Parse a gzip-compressed VDJtools file.
immdata3 <- parse.file("~/mitcr/immdata3.txt.gz", 'vdjtools')
# Parse the files "~/data/immdata1.txt" and "~/data/immdata2.txt" as MiGEC files.
immdata12 <- parse.file.list(c("~/data/immdata1.txt",
                               "~/data/immdata2.txt"), 'migec')
# Parse all files in "~/data/" as MiGEC files.
immdata <- parse.folder("~/data/", 'migec')
