
bcRep (version 1.1)

clones: Grouping sequences into clones

Description

This function uses IMGT/HighV-QUEST output files to define B cell clones. Clones are defined using amino acid CDR3 sequences, V genes and, optionally, J genes. A threshold for CDR3 identity can be given.

Usage

clones(aaseqtab = NULL, summarytab = NULL, ntseqtab = NULL, identity = 0.85,
     useJ = TRUE, dispD = FALSE, dispSeqID = FALSE, dispCDR3aa = FALSE,
     dispCDR3nt = FALSE, dispJunctionFr.ratio = FALSE,
     dispJunctionFr.list = FALSE, dispFunctionality.ratio = FALSE,
     dispFunctionality.list = FALSE, dispTotalSeq = FALSE, nrCores = 1)

Arguments

aaseqtab
IMGT/HighV-QUEST output, file 5_AA-sequences(...).txt
summarytab
IMGT/HighV-QUEST output, file 1_Summary(...).txt
ntseqtab
IMGT/HighV-QUEST output, file 3_Nt-sequences(...).txt (optional)
identity
Threshold of CDR3 identity. A value between 0 and 1 (default: 0.85).
useJ
Shall the J gene be included in the analysis? default: TRUE
dispD
Shall D genes and alleles be returned? default: FALSE
dispSeqID
Shall sequence IDs be returned? default: FALSE
dispCDR3aa
Shall amino acid CDR3 sequences be returned? default: FALSE
dispCDR3nt
Shall nucleotide CDR3 sequences be returned? default: FALSE
dispJunctionFr.ratio
Shall ratios of in-frame, out-of-frame and unknown junctions be returned? default: FALSE
dispJunctionFr.list
Shall a list of all junction frames be returned? default: FALSE
dispFunctionality.ratio
Shall ratios of productive, unproductive and unknown functionality sequences be returned? default: FALSE
dispFunctionality.list
Shall a list of all functionalities be returned? default: FALSE
dispTotalSeq
Shall all total nucleotide sequences be returned? default: FALSE
nrCores
Number of cores used for parallel processing (default: 1)

Value

Output of clones() is a data frame containing:

  • unique_CDR3_sequences_[AA]: unique CDR3 sequences belonging to this clone
  • CDR3_length_AA: CDR3 length in amino acids
  • number_of_unique_sequences: number of unique CDR3 sequences belonging to this clone
  • total_number_of_sequences: number of all sequences belonging to this clone (one sequence can appear several times)
  • sequence_count_per_CDR3: sequence count for each of the unique CDR3 sequences
  • V_gene: V gene belonging to this clone
  • V_gene_and_allele: original IMGT V gene nomenclature
  • J_gene: J gene(s) belonging to this clone (if useJ=FALSE, there can be several J genes)
  • J_gene_and_allele: original IMGT J gene nomenclature
  • optional columns (if requested): D_gene; all_CDR3_sequences_AA; all_CDR3_sequences_nt; Funct_all_sequences; Funct_productive/unproductive/unknown sequences; Junction_frame_all_sequences; JF_in-frame/out-of-frame/unknown sequences; Sequence_IDs; Total_sequences_nt

Details

This function uses IMGT/HighV-QUEST output to define clones, based on amino acid CDR3 sequences, V genes and, optionally, J genes. Criteria for clone groups are 1) same CDR3 length, 2) CDR3 identity at or above a given threshold, 3) same V gene and 4) same J gene (optional). The threshold for CDR3 identity has to be between 0 and 1; a cutoff of 0.85 means a CDR3 identity of 85%. For example, for a CDR3 length of 15 amino acids, 85% identity means that at least 13 of 15 positions have to be identical (0.85*15 = 12.75; values are rounded up). Setting useJ=TRUE adds the same-J-gene criterion to the clone definition.
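
The length and identity criteria can be illustrated with a minimal base R sketch. It is not the code used inside clones(). The helper names min_identical() and cdr3_match() are hypothetical, and the sketch assumes that identity is counted position by position between two CDR3 sequences of equal length, with the required number of matches rounded up as described above. The V gene and J gene criteria are not covered.

```r
# Hypothetical helper (not part of bcRep): minimum number of identical
# positions required for a CDR3 of length `len`, rounded up.
min_identical <- function(len, identity = 0.85) {
  ceiling(identity * len)
}

# Hypothetical helper (not part of bcRep): TRUE if two amino acid CDR3
# sequences have the same length and enough identical positions.
# Assumes a simple position-by-position comparison.
cdr3_match <- function(a, b, identity = 0.85) {
  if (nchar(a) != nchar(b)) return(FALSE)   # criterion 1: same CDR3 length
  a_chars <- strsplit(a, "")[[1]]
  b_chars <- strsplit(b, "")[[1]]
  # criterion 2: identity at or above the threshold
  sum(a_chars == b_chars) >= min_identical(nchar(a), identity)
}

min_identical(10)                          # 9, since 0.85 * 10 = 8.5 is rounded up
cdr3_match("CARGGYFDYW", "CARGGYFDVW")     # TRUE: 9 of 10 positions identical
cdr3_match("CARGGYFDYW", "CTRGGYFDVW")     # FALSE: only 8 of 10 positions identical
cdr3_match("CARGGYFDYW", "CARGGYFDYWG")    # FALSE: lengths differ
```

The example sequences are made up for illustration and are not taken from the package data.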

See Also

clones.CDR3Length, plotClonesCDR3Length, plotClonesCopyNumber, geneUsage, plotGeneUsage, clones.shared

Examples

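library(bcRep)

# Load the example IMGT/HighV-QUEST tables
data(summarytab)
data(aaseqtab)

# Group sequences into clones at 85% CDR3 identity, requiring the same J gene
clones.tab <- clones(aaseqtab = aaseqtab, summarytab = summarytab, identity = 0.85,
     useJ = TRUE, dispCDR3aa = TRUE, dispFunctionality.ratio = TRUE,
     dispFunctionality.list = TRUE)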
