The WeightMatrix class and its associated methods enable the VariantFiltering package to score synonymous and intronic genetic variants for potential cryptic splice sites. The class and the methods are nevertheless exposed to the end user, since they could be useful for other analysis purposes.
The VariantFiltering package contains two weight matrices, one for 5'ss and another for 3'ss, which have been built by a statistical method that accounts for dependencies between the splice site positions, minimizing the rate of false positive predictions. Concretely, the method builds these models by inclusion-driven learning of Bayesian networks; further details can be found in the paper of Castelo and Guigo (2004).
The function readWm() reads a weight matrix stored in a text file in a particular format and returns a WeightMatrix object. See the .ibn files located in the extdata folder of the VariantFiltering package for an example of this format.
The method wmScore() scores one or more nucleotide sequences using the input WeightMatrix object. If the sequences are longer than the width of the weight matrix, this function scores every possible site within those sequences. It returns a vector with the calculated scores. When a score cannot be calculated because a conserved position does not occur in the sequence (e.g., absence of a GT dinucleotide with the 5'ss weight matrix), it returns NA as the corresponding score value.
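
The scoring behaviour described above can be illustrated with a minimal conceptual sketch. It is written in Python and does not use or reproduce the VariantFiltering implementation: the toy matrix, the uniform background frequency and all function names are assumptions made for illustration only. In particular, the sketch treats positions as independent, whereas the package's models account for dependencies between positions. None stands in for NA.

```python
import math

# Hypothetical toy weight matrix (NOT the package's 5'ss model): one dict per
# position mapping nucleotide -> probability. The second and third positions
# are fully conserved (G then T), so any other nucleotide there cannot be scored.
TOY_WM = [
    {"A": 0.3, "C": 0.3, "G": 0.3, "T": 0.1},
    {"G": 1.0},
    {"T": 1.0},
    {"A": 0.6, "C": 0.1, "G": 0.2, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform background frequency


def wm_width(wm):
    """Number of positions in the weight matrix."""
    return len(wm)


def conserved_positions(wm):
    """Number of positions that allow exactly one nucleotide."""
    return sum(1 for column in wm if len(column) == 1)


def score_site(wm, site):
    """Log-odds score of one site, or None if a conserved position mismatches."""
    total = 0.0
    for column, nucleotide in zip(wm, site):
        if nucleotide not in column:
            return None  # plays the role of NA
        total += math.log2(column[nucleotide] / BACKGROUND)
    return total


def score_sequence(wm, sequence):
    """Score every window of the matrix width within the sequence."""
    width = wm_width(wm)
    return [
        score_site(wm, sequence[start:start + width])
        for start in range(len(sequence) - width + 1)
    ]


# "AGTACC" yields three windows: AGTA, GTAC and TACC. Only the first one has
# GT at the conserved positions, so the other two are scored as None.
scores = score_sequence(TOY_WM, "AGTACC")
```

A sequence shorter than the matrix width produces an empty result in this sketch; how the package handles that case is not stated here.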
The method width() takes a WeightMatrix object as input and returns the number of positions of the weight matrix.
The method conservedPositions() takes a WeightMatrix object as input and returns the number of fully conserved positions in the weight matrix.