showAnnotatedSeq(x, sel = 1, ann = TRUE, pos = TRUE, start = 1, end = width(x)[sel], width = NA)
## S4 method for signature 'XStringSet'
## annotationMetadata(x, annCharset= ...) <- value
## S4 method for signature 'BioVector'
## annotationMetadata(x, annCharset= ...) <- value
"annotationMetadata"(x, ...) <- value
"annotationMetadata"(x)
"annotationMetadata"(x)
"annotationCharset"(x)
"annotationCharset"(x)
DNAStringSet, RNAStringSet, AAStringSet (or as BioVector)
annotationMetadata: a character vector with the annotation strings
annotationCharset: a character vector with the annotation
The annotation character set is assigned with the
annotationMetadata function (see below). It is stored in the
metadata list as the named element annotationCharset and can
be kept alongside other metadata assigned to the sequence
set. The annotation strings for the individual sequences are
represented as a character vector and are assigned to the
XStringSet as element-related metadata, in the same call
that assigns the annotation character set. Element-related
metadata is stored in a DataFrame whose columns represent
the different types of metadata that can be assigned in
parallel. The column name for the sequence-related
annotation information is "annotation" (see the Examples
section for an example of annotation metadata assignment).
Annotation metadata can be assigned together with position
metadata (see positionMetadata)
to a sequence set.

Annotation Specific Kernel Processing

The annotation-specific variant of a kernel, e.g. the
spectrum kernel, appends the annotation characters
corresponding to a specific k-mer to this k-mer and treats
the resulting pattern as one feature, the basic unit for
similarity determination. The full feature space of an
annotation-specific spectrum kernel is the Cartesian product
of the set of all possible sequence patterns with the set of
all possible annotation patterns. Depending on the number of
characters in the annotation character set, the feature
space is much larger than that of the normal spectrum
kernel. Through annotation, however, the similarity
consideration between two sequences can be split into
independent parts that are considered separately, e.g.
coding/non-coding or exon/intron. For amino acid sequences,
a heptad annotation (a usually periodic pattern of 7
characters, a to g) can be used as annotation, as in the
prediction of coiled coil structures (see reference
Mahrenholz, 2011).

The flag annSpec passed during creation of a kernel object
controls whether annotation information is evaluated by the
kernel (see functions spectrumKernel, gappyPairKernel,
motifKernel). In this way, sequences with annotation can be
evaluated both annotation-specifically and without
annotation by using two different kernel objects (see
examples below). The annotation-specific kernel variant is
available for all kernels in this package except the
mismatch kernel.
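The idea of appending annotation characters to each k-mer can be sketched in a few lines of base R. This is only an illustration of the principle described above, not the implementation used by the package; the helper names kmerFeatures, kernelValue and featureSpaceSize are hypothetical and exist only for this sketch.

```r
## Illustrative base R sketch; NOT the kebabs implementation.
## kmerFeatures(), kernelValue() and featureSpaceSize() are hypothetical helpers.

## Count the k-mer features of one sequence. If an annotation string is
## given, the annotation characters of each k-mer are appended to the k-mer.
kmerFeatures <- function(seq, k, ann = NULL) {
    stopifnot(nchar(seq) >= k)
    starts <- seq_len(nchar(seq) - k + 1)
    feats <- substring(seq, starts, starts + k - 1)
    if (!is.null(ann)) {
        stopifnot(nchar(ann) == nchar(seq))
        feats <- paste0(feats, substring(ann, starts, starts + k - 1))
    }
    table(feats)
}

## Unnormalized kernel value: dot product of the two feature count vectors
kernelValue <- function(f1, f2) {
    common <- intersect(names(f1), names(f2))
    sum(as.numeric(f1[common]) * as.numeric(f2[common]))
}

## Size of the full feature space: all sequence patterns of length k
## combined with all annotation patterns of length k
featureSpaceSize <- function(nSeqChars, nAnnChars, k) {
    nSeqChars^k * nAnnChars^k
}

s <- "ACGTACGT"

## same sequence compared with itself, annotation ignored
plain <- kernelValue(kmerFeatures(s, 3), kmerFeatures(s, 3))
plain   # 10

## same sequence, but the two copies carry different annotation
annot <- kernelValue(kmerFeatures(s, 3, "eeeeiiii"),
                     kmerFeatures(s, 3, "eeeeeeee"))
annot   # 4

## DNA alphabet (4 characters), annotation character set "ei", k = 3
featureSpaceSize(4, 2, 3)   # 512, compared to 4^3 = 64 without annotation
```

In this toy case only the k-mers lying entirely in the region annotated "e" in both copies contribute to the annotation-specific value, which is why it is smaller than the plain spectrum value.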
annotationMetadata function

With this function annotation metadata can be assigned to
sequences defined as XStringSet (or BioVector). The sequence
annotation strings are stored as element-related information
and can be retrieved with the method mcols. The characters
used for annotation are stored as the annotation character
set for the sequence set and can be retrieved with the
method metadata. For the assignment of annotation metadata
to biological sequences, this function should be used
instead of the lower-level functions metadata and mcols. The
function annotationMetadata performs several checks and also
ensures that other metadata or element metadata assigned to
the object is kept. Annotation metadata are deleted if the
parameters annCharset and annotation are set to NULL.

showAnnotatedSeq function

This function displays individual sequences aligned with the
annotation string, with 50 positions per line. The two
header lines show the start position for each block of 10
characters.

Accessor-like methods

The method annotationMetadata<- assigns annotation metadata
to a sequence set. The annotation character set must also be
specified in the assignment. Annotation characters which are
not listed in the character set are treated like invalid
sequence characters: they interrupt open patterns and lead
to a restart of the pattern search at this position.
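The effect of annotation characters outside the character set can be sketched for a spectrum-type kernel in base R. The sketch assumes that "interrupting a pattern" means that no k-mer overlapping such a position is counted; the helper validFeatures is hypothetical and is not part of the package.

```r
## Illustrative base R sketch; NOT the kebabs implementation.
## validFeatures() is a hypothetical helper. Assumption: a k-mer is dropped
## as soon as one of its annotation characters is not in the character set.
validFeatures <- function(seq, ann, k, annCharset) {
    stopifnot(nchar(seq) == nchar(ann), nchar(seq) >= k)
    allowed <- strsplit(annCharset, "")[[1]]
    ok <- strsplit(ann, "")[[1]] %in% allowed
    starts <- seq_len(nchar(seq) - k + 1)
    ## keep only windows in which every annotation character is allowed
    keep <- vapply(starts, function(s) all(ok[s:(s + k - 1)]), logical(1))
    starts <- starts[keep]
    paste0(substring(seq, starts, starts + k - 1),
           substring(ann, starts, starts + k - 1))
}

## position 3 is masked with '-', which is not part of the character set "ei"
feats <- validFeatures("ACGTACGT", "ee-eeiii", k = 3, annCharset = "ei")
feats   # "TACeei" "ACGeii" "CGTiii"
```

The first three windows all overlap the masked position and are skipped; feature extraction resumes with the window starting at position 4.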
spectrumKernel, gappyPairKernel, motifKernel, positionMetadata, metadata, mcols
## load the package (assumed to be installed from Bioconductor)
library(kebabs)

## create a set of annotated DNA sequences
## instead of user provided sequences in XStringSet format
## for this example a set of DNA sequences is created
x <- DNAStringSet(c("AGACTTAAGGGACCTGGTCACCACGCTCGGTGAGGGGGACGGGGTGT",
"ATAAAGGTTGCAGACATCATGTCCTTTTTGTCCCTAATTATTTCAGC",
"CAGGAATCAGCACAGGCAGGGGCACGGCATCCCAAGACATCTGGGCC",
"GGACATATACCCACCGTTACGTGTCATACAGGATAGTTCCACTGCCC",
"ATAAAGGTTGCAGACATCATGTCCTTTTTGTCCCTAATTATTTCAGC"))
names(x) <- paste("S", 1:length(x), sep="")
## define the character set used in annotation
## the masking character '-' is not part of the character set
anncs <- "ei"
## annotation strings for each sequence as character vector
## in the third and fourth sample a part of the sequence is masked
annotStrings <- c("eeeeeeeeeeeeiiiiiiiiieeeeeeeeeeeeeeeeiiiiiiiiii",
"eeeeeeeeeiiiiiiiiiiiiiiiiiiieeeeeeeeeeeeeeeeeee",
"---------eeeeeeeeeeeeeeeeiiiiiiiiiiiiiiiiiiiiii",
"eeeeeeeeeeeeeeeeeeeeeeeiiiiiiiiiiiiiiiiiiii----",
"eeeeeeeeeeeeiiiiiiiiiiiiiiiiiiiiiiieeeeeeeeeeee")
## assign metadata to DNAStringSet object
annotationMetadata(x, annCharset=anncs) <- annotStrings
## show annotation
annotationMetadata(x)
annotationCharset(x)
## show sequence 3 aligned with annotation string
showAnnotatedSeq(x, sel=3)
## create annotation specific spectrum kernel
speca <- spectrumKernel(k=3, annSpec=TRUE, normalized=FALSE)
## show details of kernel object
kernelParameters(speca)
## this kernel object can now be used in a classification or regression
## task in the usual way or you can use the kernel for example to generate
## the kernel matrix for use with another learning method in another R
## package.
kma <- speca(x)
kma[1:5,1:5]
## generate a dense explicit representation for annotation-specific kernel
era <- getExRep(x, speca, sparse=FALSE)
era[1:5,1:8]
## when a standard spectrum kernel is used with annotated
## sequences the annotation information is not evaluated
spec <- spectrumKernel(k=3, normalized=FALSE)
km <- spec(x)
km[1:5,1:5]
## finally delete annotation metadata if no longer needed
annotationMetadata(x) <- NULL
## show empty metadata
annotationMetadata(x)
annotationCharset(x)
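As described above, the annotation character set is kept in the metadata list under the name annotationCharset and the annotation strings in the element metadata column "annotation". The following short sketch reads both through the lower-level accessors metadata and mcols. It assumes that the kebabs package is installed and uses two made-up 10-base sequences; assignment should still go through annotationMetadata<-, the lower-level accessors are used here for reading only.

```r
## Sketch under the assumption that kebabs (Bioconductor) is installed;
## the sequences and annotation strings are made up for illustration
library(kebabs)

y <- DNAStringSet(c(S1 = "AGACTTAAGG", S2 = "ATAAAGGTTG"))
yAnnot <- c("eeeeeiiiii", "iiiiieeeee")

## assign with the dedicated function ...
annotationMetadata(y, annCharset = "ei") <- yAnnot

## ... and read back through the lower-level accessors
metadata(y)$annotationCharset
mcols(y)$annotation

## the dedicated accessors return the same information
annotationCharset(y)
annotationMetadata(y)
```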