MotifModelSet-class: Class "MotifModelSet"

Description

A set of MotifModels

Arguments

Objects from the Class

Objects can be created by calls of the form motifModelSet(seqs, motifNumber=NA, type="fixed", width=4, verbose=TRUE, clusterType="kmeans", maxGuess=10).

Details

This is a convenience class that provides methods for a few common tasks in analyzing multiple motifs on sequences. The function that creates these objects, motifModelSet, clusters the sequences according to a substitution-type metric (see Sequences) and then fits a motif model to each cluster. The resulting motif models can be used to discover the most likely motifs in the sequences and to classify new sequences into the motifs. Because clustering is used to separate the motifs, this approach is somewhat ad hoc: expectation-maximization is performed on the motif position, but not on which motif each sequence belongs to. As a result, choosing the number of motifs relies on an elbow plot, which the user should inspect.

motifModelSet uses the same kind of motif model for every cluster, but this is not required. More sophisticated modeling can be done by building a MotifModel for each cluster separately and then combining them by calling new("MotifModelSet", motifs=mlist), where mlist is a list of the motif models.

Typically, the number of motifs should be set by hand, either by using the plot function on the sequences or by examining the elbow plots produced by motifModelSet. Note that, as mentioned in the Sequences examples, the clustering algorithm is sensitive to the substitution matrix used in the metric parameters.
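
The idea behind reading an elbow plot can be illustrated with base R alone. The sketch below is not the package's implementation: it runs stats::kmeans on toy numeric points rather than on sequences with a substitution-based metric, and it assumes that maxGuess acts as the largest number of clusters tried. The quantity plotted (total within-cluster sum of squares) is likewise an assumption about what such a plot typically shows.

```r
# Toy stand-in for real data: three well-separated groups of 2-D points.
# (motifModelSet clusters sequences with a substitution-type metric instead.)
set.seed(1)
centers <- rbind(c(0, 0), c(5, 5), c(0, 5))
toy <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(30, centers[i, 1], 0.5), rnorm(30, centers[i, 2], 0.5))
}))

# Assumption: maxGuess is the largest candidate number of clusters.
maxGuess <- 10

# Total within-cluster sum of squares for each candidate cluster count.
wss <- sapply(seq_len(maxGuess), function(k) {
  kmeans(toy, centers = k, nstart = 10)$tot.withinss
})

# The "elbow" is the point after which adding clusters stops helping much.
plot(seq_len(maxGuess), wss, type = "b",
     xlab = "Number of clusters",
     ylab = "Total within-cluster sum of squares")
```

On data like this, the curve drops steeply up to the true number of groups and flattens afterwards; the cluster count at the bend is the natural candidate to pass as motifNumber.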

Examples


data(TULASequences)
TULAMList <- motifModelSet(TULASequences, width=6, motifNumber=4,
type="fixed")
plot(TULAMList)
plot(TULAMList@motifs[[1]])
print(TULAMList@motifs[[1]])


small.mlist <- motifModelSet(TULASequences, motifNumber=2,
type="fixed")
ll <- logLik(small.mlist)
print(ll)

large.mlist <- motifModelSet(TULASequences, motifNumber=5,
type="optional")
ll <- logLik(large.mlist)
print(ll)

# Split the dataset: two thirds for training, the rest for testing
training.size <- floor(nrow(TULASequences) * 2 / 3)
training.indices <- sample(nrow(TULASequences), training.size)
testing.indices <- setdiff(seq_len(nrow(TULASequences)), training.indices)

training <- new("Sequences", TULASequences[training.indices,],
alphabet=TULASequences@alphabet)

testing <- new("Sequences", TULASequences[testing.indices,],
alphabet=TULASequences@alphabet)

#Now we have two sets of sequences

training.mlist <- motifModelSet(training, width=6,
motifNumber=3)

classes <- classify(training.mlist, testing)
#Now we have the classes on the unseen data
