
PWMEnrich (version 4.8.2)

motifSimilarity: Calculates similarity between two PFMs.

Description

This function calculates the normalized motif correlation as a measure of motif frequency matrix similarity.

Usage

motifSimilarity(m1, m2, trim = 0.4, self.sim = FALSE)

Arguments

m1
matrix with four rows representing the frequency matrix of the first motif
m2
matrix with four rows representing the frequency matrix of the second motif
trim
bases with information content smaller than this value are trimmed off both ends of each motif
self.sim
whether to calculate self-similarity (i.e. without including offset=0 in the alignment)
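As a rough illustration of what the trim argument describes, the sketch below removes low-information columns from both ends of a frequency matrix. It assumes information content is measured as 2 plus the sum of p*log2(p) per column (in bits, uniform background, no pseudo-counts); the package may compute it differently. The helpers columnIC and trimEnds are hypothetical names, not part of PWMEnrich.

```r
# Hypothetical helper: per-column information content in bits,
# assuming a uniform background and treating 0 * log2(0) as 0.
columnIC <- function(pfm) {
  p <- sweep(pfm, 2, colSums(pfm), "/")
  plogp <- ifelse(p > 0, p * log2(p), 0)
  2 + colSums(plogp)
}

# Hypothetical helper: drop columns with IC below `trim` from both ends only;
# low-information columns inside the motif are kept.
trimEnds <- function(pfm, trim = 0.4) {
  keep <- which(columnIC(pfm) >= trim)
  if (length(keep) == 0) return(pfm[, 0, drop = FALSE])
  pfm[, min(keep):max(keep), drop = FALSE]
}

# Toy matrix: uniform, strong, uniform, strong, uniform columns
toy <- cbind(c(5, 5, 5, 5), c(10, 0, 0, 0), c(5, 5, 5, 5),
             c(0, 0, 0, 10), c(5, 5, 5, 5))
trimmed <- trimEnds(toy)
ncol(trimmed)  # the two uniform end columns are removed
```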

Details

This score is essentially a normalized version of the sum of column correlations as proposed by Pietrokovski (1996). The sum is normalized by the average motif length of m1 and m2, i.e. (ncol(m1)+ncol(m2))/2. Thus, for two identical motifs the score is 1. For unrelated motifs the score is typically around 0.
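The normalization can be illustrated with a minimal sketch for two matrices that are already aligned and have the same width. It assumes the column correlation is the Pearson correlation and skips trimming and the alignment search; columnCorScore is a hypothetical helper, not a PWMEnrich function.

```r
# Hypothetical helper: sum of per-column Pearson correlations for two
# pre-aligned, equal-width matrices, divided by the average motif length.
columnCorScore <- function(m1, m2) {
  stopifnot(nrow(m1) == 4, nrow(m2) == 4, ncol(m1) == ncol(m2))
  col.cor <- vapply(seq_len(ncol(m1)),
                    function(j) cor(m1[, j], m2[, j]),
                    numeric(1))
  sum(col.cor) / ((ncol(m1) + ncol(m2)) / 2)
}

# Toy frequency matrix; each group of four counts is one column (A, C, G, T)
pfm <- matrix(c(8, 1, 1, 0,
                0, 9, 1, 0,
                1, 0, 0, 9,
                7, 1, 1, 1),
              nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))

# Identical motifs: every column correlation is 1, so the score is 1
# (up to floating-point rounding)
self.score <- columnCorScore(pfm, pfm)
self.score
```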

Motifs need to be aligned for this score to be calculated. The current implementation tries all possible ungapped alignments with a minimum overlap of two base pairs, and the maximal score over all alignments is returned.
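The alignment search described above can be sketched as follows. This is only an illustration under stated assumptions (Pearson column correlations, no trimming, forward orientation only), not the package's actual code; bestUngappedScore and its min.overlap parameter are hypothetical.

```r
# Hypothetical helper: try every ungapped offset with at least `min.overlap`
# overlapping columns and return the best normalized correlation sum.
bestUngappedScore <- function(m1, m2, min.overlap = 2) {
  n1 <- ncol(m1)
  n2 <- ncol(m2)
  stopifnot(n1 >= min.overlap, n2 >= min.overlap)
  # cc[i, j] is the correlation of column i of m1 with column j of m2
  cc <- cor(m1, m2)
  best <- -Inf
  for (offset in seq(-(n1 - min.overlap), n2 - min.overlap)) {
    i <- seq_len(n1)
    j <- i + offset            # column of m2 paired with column i of m1
    keep <- j >= 1 & j <= n2   # columns that actually overlap
    if (sum(keep) >= min.overlap) {
      best <- max(best, sum(cc[cbind(i[keep], j[keep])]))
    }
  }
  best / ((n1 + n2) / 2)
}

# Toy motif and a copy shifted right by one flanking column
core <- matrix(c(8, 1, 1, 0,
                 0, 9, 1, 0,
                 1, 0, 0, 9,
                 7, 1, 1, 1), nrow = 4)
shifted <- cbind(c(3, 2, 4, 1), core)

# Best alignment overlaps all four core columns: 4 / ((4 + 5) / 2), about 0.889
shift.score <- bestUngappedScore(core, shifted)
shift.score
```

Because the sum is divided by the average of the full motif lengths rather than by the overlap length, partial overlaps are penalized even when the overlapping columns match perfectly.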

Motif 1 is aligned both to Motif 2 and to its reverse complement. Thus, the motif similarity is the same if the reverse complement of either motif is given.
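For a frequency matrix, taking the reverse complement amounts to reversing both the row order and the column order, assuming the rows are ordered A, C, G, T. The helper below, revCompPFM, is a hypothetical name used only for this sketch.

```r
# Hypothetical helper: reverse complement of a 4-row frequency matrix whose
# rows are assumed to be ordered A, C, G, T.
revCompPFM <- function(pfm) {
  stopifnot(nrow(pfm) == 4)
  # Reversing rows swaps A<->T and C<->G; reversing columns reverses the motif
  rc <- pfm[4:1, ncol(pfm):1, drop = FALSE]
  rownames(rc) <- rownames(pfm)
  rc
}

pfm <- matrix(c(8, 1, 1, 0,
                0, 9, 1, 0,
                1, 0, 0, 9),
              nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))
rc <- revCompPFM(pfm)

# The first column of the reverse complement is the last original column
# with A/T and C/G counts swapped
rc[, 1]

# Applying the operation twice recovers the original matrix
round.trip <- revCompPFM(rc)
```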

References

Pietrokovski S. Searching databases of conserved sequence regions by aligning protein multiple-alignments. Nucleic Acids Res 1996;24:3836-3845.

Examples

if(require("PWMEnrich.Dmelanogaster.background")){
    data(MotifDb.Dmel.PFM)

    # calculate the similarity of tin and vnd motifs (which are almost identical)
    motifSimilarity(MotifDb.Dmel.PFM[["tin"]], MotifDb.Dmel.PFM[["vnd"]])

    # similarity of two unrelated motifs
    motifSimilarity(MotifDb.Dmel.PFM[["tin"]], MotifDb.Dmel.PFM[["ttk"]])
}
