A dummy dataset of two proteins (A and B) from a single host, human
proteins_1host
A data frame with 806 rows and 17 variables:
name of the protein
starting position of the aligned, overlapping k-mer window
number of k-mer sequences at the given position
whether the k-mer position has fewer sequences than the minimum support threshold; TRUE marks a position of low support in terms of sample size
level of variability at the k-mer position, with zero indicating a completely conserved position
the predominant sequence (index motif) at the given k-mer position
the percentage of index sequences at the k-mer position
the percentage of the major sequence (the most frequent variant of the index) at the k-mer position
the percentage of minor sequences (less frequent than the major variant, but not singletons) at the k-mer position
the percentage of unique sequences (singletons, observed only once) at the k-mer position
the percentage of sequences at the k-mer position that are variants of the index (major, minor and unique variants combined)
incidence of the distinct k-mer peptides at the k-mer position
whether more than one index sequence of equal incidence is present at the k-mer position
species name of the host organism of the virus
k-mer position that has the highest entropy value
highest entropy value observed in the studied protein
average entropy value across all k-mer positions
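The motif columns above (index, major, minor, unique and total variants) are easier to follow with a small worked example. The sketch below classifies the k-mers observed at one aligned position and reports each class as a percentage of all sequences at that position. It is an illustration under stated assumptions, not the package's own code: the function name `kmer_position_summary`, the `min_support` default of 30, the tie handling, the rule that a variant must occur more than once to be major, and the use of plain Shannon entropy (in bits, with no sample-size correction) are all hypothetical choices.

```python
from collections import Counter
import math


def kmer_position_summary(kmers, min_support=30):
    """Summarise the k-mers observed at one aligned k-mer position.

    Illustrative assumptions (not the package's implementation):
    - the index is the most frequent k-mer; if another k-mer ties with it,
      multi_index is True and the first one returned by most_common() is
      used as the index while the others are treated as variants
    - the major variant is the most frequent non-index k-mer seen more than once
    - every non-index k-mer seen exactly once is counted as unique
    - entropy is plain Shannon entropy in bits
    - min_support is a hypothetical threshold
    """
    counts = Counter(kmers)
    total = sum(counts.values())
    ranked = counts.most_common()  # (k-mer, count) pairs, highest count first
    index_seq, index_n = ranked[0]
    multi_index = len(ranked) > 1 and ranked[1][1] == index_n

    variants = ranked[1:]  # every distinct k-mer except the index
    repeated = [(seq, n) for seq, n in variants if n > 1]
    major_n = repeated[0][1] if repeated else 0  # most frequent repeated variant
    minor_n = sum(n for _, n in repeated[1:])  # remaining repeated variants
    unique_n = sum(n for _, n in variants if n == 1)  # singletons

    def pct(n):
        return 100.0 * n / total

    # Shannon entropy over the frequencies of the distinct k-mers.
    entropy = sum(-(n / total) * math.log2(n / total) for n in counts.values())

    return {
        "count": total,
        "low_support": total < min_support,
        "entropy": entropy,
        "index_sequence": index_seq,
        "index_pct": pct(index_n),
        "major_pct": pct(major_n),
        "minor_pct": pct(minor_n),
        "unique_pct": pct(unique_n),
        "total_variants_pct": pct(total - index_n),
        "multi_index": multi_index,
    }


# Hypothetical 9-mers at a single position: 20 sequences in total.
sample = (["KLVALGINA"] * 12 + ["KLVALGINV"] * 4 + ["KLVSLGINA"] * 2
          + ["KLVALGISA", "KLVALGIQA"])
summary = kmer_position_summary(sample)
print(summary["index_pct"], summary["major_pct"],
      summary["minor_pct"], summary["unique_pct"])  # prints 60.0 20.0 10.0 10.0
```

Under these assumptions the major, minor and unique percentages always add up to the total-variants percentage (here 40%), and a fully conserved position, where every sequence equals the index, has an entropy of zero.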