field_id
: Data subset identifier, defined with the input parameter fields
.
A variable number of columns, specified with the input parameter fields
.
polymorphism_call
: The novel allele call.
novel_imgt
: The novel allele sequence.
closest_reference
: The closest reference gene and allele in
the germline_db
database.
closest_reference_imgt
: Sequence of the closest reference gene and
allele in the germline_db
database.
germline_call
: The input (uncorrected) V call.
germline_imgt
: Germline sequence for germline_call
.
nt_diff
: Number of nucleotides that differ between the new allele and
the closest reference (closest_reference
) in the germline_db
database.
nt_substitutions
: A comma-separated list of specific nucleotide
differences (e.g. 112G>A
) in the novel allele.
aa_diff
: Number of amino acids that differ between the new allele and the closest
reference (closest_reference
) in the germline_db
database.
aa_substitutions
: A comma-separated list of specific amino acid
differences (e.g. 96A>N
) in the novel allele.
sequences
: Number of sequences unambiguously assigned to this allele.
unmutated_sequences
: Number of records with the unmutated novel allele sequence.
unmutated_frequency
: Proportion of records with the unmutated novel allele
sequence (unmutated_sequences / sequences
).
allelic_percentage
: Percentage at which the (unmutated) allele is observed
in the sequence dataset compared to other (unmutated) alleles.
unique_js
: Number of unique J sequences found associated with the
novel allele. Only sequences that have been unambiguously assigned
to the novel allele are counted (polymorphism_call
).
unique_cdr3s
: Number of unique CDR3s associated with the inferred allele.
Only sequences that have been unambiguously assigned to the
novel allele are counted (polymorphism_call).
mut_min
: Minimum number of mutations a sequence may carry to be considered by the algorithm.
mut_max
: Maximum number of mutations a sequence may carry to be considered by the algorithm.
pos_min
: First position of the sequence considered by the algorithm (IMGT numbering).
pos_max
: Last position of the sequence considered by the algorithm (IMGT numbering).
y_intercept
: The y-intercept above which positions were considered
potentially polymorphic.
alpha
: Significance threshold used when constructing the
confidence interval for the y-intercept.
min_seqs
: Input min_seqs
. The minimum number of total sequences
(within the desired mutational range and nucleotide range) required
for the samples to be considered.
j_max
: Input j_max
. The maximum fraction of sequences perfectly
aligning to a potential novel allele that are allowed to use a particular
combination of junction length and J gene.
min_frac
: Input min_frac
. The minimum fraction of sequences that must
have usable nucleotides in a given position for that position to be considered.
note
: Comments regarding the novel allele inference.
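
Several of the columns above are derived from others in the same row, so a row can be sanity-checked by recomputing them. The following is a minimal Python sketch of such a check, not code from the package. It assumes that nt_substitutions entries are written as position, reference base, `>`, novel base, that positions are 1-based and count IMGT gap characters, and that only unambiguous A/C/G/T positions are compared. The function names and the toy row are hypothetical.

```python
# Hypothetical consistency check for one output row; not part of the package.

BASES = set("ACGT")


def nucleotide_substitutions(reference, novel):
    """List differences as '<position><reference base>><novel base>'.

    Positions are 1-based and count every character of the gapped
    sequences, so they stay aligned with the IMGT-gapped input.
    """
    subs = []
    pairs = zip(reference.upper(), novel.upper())
    for pos, (ref_nt, new_nt) in enumerate(pairs, start=1):
        # Assumption: gaps ('.', '-') and ambiguous bases (e.g. 'N') are skipped.
        if ref_nt in BASES and new_nt in BASES and ref_nt != new_nt:
            subs.append(f"{pos}{ref_nt}>{new_nt}")
    return subs


def recompute_row(row):
    """Recompute the derived columns of a single result row (a dict)."""
    subs = nucleotide_substitutions(row["closest_reference_imgt"], row["novel_imgt"])
    return {
        "nt_diff": len(subs),
        "nt_substitutions": ",".join(subs),
        # unmutated_frequency = unmutated_sequences / sequences
        "unmutated_frequency": row["unmutated_sequences"] / row["sequences"],
    }


# Toy row with made-up values, for illustration only.
toy_row = {
    "closest_reference_imgt": "CAGGTG.CAGCTG",
    "novel_imgt": "CAGGTA.CAGCTC",
    "sequences": 200,
    "unmutated_sequences": 50,
}

derived = recompute_row(toy_row)
print(derived)
```

Comparing the recomputed values with the reported nt_diff, nt_substitutions and unmutated_frequency is a quick way to confirm that a table was read and parsed as intended.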