kebabs (version 1.6.2)

motifKernel: Motif Kernel

Description

Create a motif kernel object and the kernel matrix

Usage

motifKernel(motifs, r = 1, annSpec = FALSE, distWeight = numeric(0),
  normalized = TRUE, exact = TRUE, ignoreLower = TRUE, presence = FALSE)

## S4 method for signature 'MotifKernel'
getFeatureSpaceDimension(kernel, x)

Arguments

motifs
a set of motif patterns specified as character vector. The order in which the patterns are passed for creation of the kernel object also determines the order of the features in the explicit representation. Lowercase characters in motifs are always converted to uppercase. For details concerning the definition of motif patterns see below and in the examples section.
r
exponent which must be > 0 (see details section in spectrumKernel). Default=1
annSpec
boolean that indicates whether sequence annotation should be taken into account (for details see the help page for annotationMetadata). Default=FALSE
distWeight
a numeric distance weight vector or a distance weighting function (for details see the help page for gaussWeight). Default=numeric(0), i.e. no distance weighting
normalized
if set, data generated with this kernel will be normalized (for details see below). Default=TRUE
exact
use the exact character set for the evaluation (for details see below). Default=TRUE
ignoreLower
ignore lowercase characters in the sequence. If the parameter is not set, lowercase characters are treated like uppercase. Default=TRUE
presence
if this parameter is set, only the presence of a motif is considered; otherwise the number of occurrences of the motif is used. Default=FALSE
kernel
a sequence kernel object
x
one or multiple biological sequences in the form of a DNAStringSet, RNAStringSet, AAStringSet (or as BioVector)

Value

  • motifKernel: upon successful completion, the function returns a kernel object of class MotifKernel.
  • getFeatureSpaceDimension: dimension of the feature space as numeric value.

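Normalization: with normalized=TRUE the kernel value is scaled to the unit sphere. For the feature vectors x and y of two sequences the similarity s is computed as

$$s=\frac{\vec{x}^T\vec{y}}{\|\vec{x}\|\|\vec{y}\|}$$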
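The normalization formula above can be illustrated in base R on a small kernel matrix. This is only a sketch under stated assumptions: the helper name normalize_kernel and the toy feature vectors are hypothetical and not part of kebabs, and a sequence with an all-zero feature vector would produce NaN here.

```r
# Hypothetical helper (not a kebabs function): scale a square kernel matrix
# so that each entry becomes k(x, y) / sqrt(k(x, x) * k(y, y)).
normalize_kernel <- function(K) {
  d <- sqrt(diag(K))   # norms of the feature vectors
  K / outer(d, d)      # divide entry (i, j) by ||x_i|| * ||x_j||
}

# Toy feature vectors (one row per sequence) standing in for motif counts
X <- rbind(S1 = c(2, 1, 0), S2 = c(1, 1, 1))
K <- X %*% t(X)        # unnormalized linear kernel matrix
Kn <- normalize_kernel(K)
Kn
```

After normalization every diagonal entry is 1, so the off-diagonal values can be read as cosine similarities between sequences.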

Creation of kernel matrix: The kernel matrix is created with the function getKernelMatrix or via a direct call with the kernel object, as shown in the examples below.

Details

Creation of kernel object: The function motifKernel creates a kernel object for the motif kernel for a set of given DNA, RNA or AA motifs. This kernel object can then be used to generate a kernel matrix or an explicit representation for this kernel. The individual patterns in the set of motifs are built similarly to regular expressions through concatenation of the following elements in arbitrary order:
  • a specific character from the used character set, e.g. 'A' or 'G' in DNA patterns, for matching a specific character
  • the wildcard character '.', which matches any valid character of the character set except '-'
  • a substitution group specified by a collection of characters from the character set enclosed in square brackets, e.g. [AG], which matches any of the listed characters; with a leading '^' the character list is inverted and matching occurs for all characters of the character set which are not listed, except '-'
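Because the pattern elements above map closely onto ordinary regular expressions, the feature values for a few motifs can be approximated in base R. The following is a minimal sketch, not the kebabs implementation: the helper names count_motif and motif_features are hypothetical, overlapping matches are counted (an assumption that may differ from how the package counts), and plain regular expression semantics are used, so the special handling of '-' is not reproduced.

```r
# Hypothetical helper: count occurrences of one motif pattern in one sequence.
# The zero-width lookahead lets gregexpr() report overlapping matches too.
count_motif <- function(motif, s) {
  hits <- gregexpr(paste0("(?=", toupper(motif), ")"), s, perl = TRUE)[[1]]
  if (hits[1] == -1L) 0L else length(hits)
}

# Hypothetical helper: one feature per motif, in the order the motifs are given.
# With presence = TRUE only the presence (0/1) of each motif is recorded.
motif_features <- function(motifs, s, presence = FALSE) {
  counts <- vapply(motifs, count_motif, integer(1), s = s)
  if (presence) as.integer(counts > 0L) else unname(counts)
}

motifs <- c("A[CG]T", "C.G", "G[^A][AT]")
x <- motif_features(motifs, "ACTAGTCAG")
y <- motif_features(motifs, "GCACCGAGT")

# Unnormalized linear kernel value (r = 1): dot product of the feature vectors
sum(x * y)
```

The order of the entries in x and y follows the order of the motifs, mirroring the statement that the pattern order determines the feature order in the explicit representation.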

References

http://www.bioinf.jku.at/software/kebabs

(Ben-Hur, 2003) -- A. Ben-Hur and D. Brutlag. Remote homology detection: a motif based approach.

(Bodenhofer, 2009) -- U. Bodenhofer, K. Schwarzbauer, M. Ionescu and S. Hochreiter. Modelling position specificity in sequence kernels by fuzzy equivalence relations.

(Mahrenholz, 2011) -- C.C. Mahrenholz, I.G. Abfalter, U. Bodenhofer, R. Volkmer and S. Hochreiter. Complex networks govern coiled-coil oligomerizations - predicting and profiling by means of a machine learning approach.

J. Palme, S. Hochreiter and U. Bodenhofer (2015). KeBABS: an R package for kernel-based analysis of biological sequences. Bioinformatics, 31(15):2574-2576. DOI: 10.1093/bioinformatics/btv176.

See Also

kernelParameters-method, getKernelMatrix, getExRep, spectrumKernel, mismatchKernel, gappyPairKernel

Examples

## load the package (also makes DNAStringSet from Biostrings available)
library(kebabs)

## instead of user provided sequences in XStringSet format
## for this example a set of DNA sequences is created
## RNA- or AA-sequences can be used as well with the motif kernel
dnaseqs <- DNAStringSet(c("AGACTTAAGGGACCTGGTCACCACGCTCGGTGAGGGGGACGGGGTGT",
                          "ATAAAGGTTGCAGACATCATGTCCTTTTTGTCCCTAATTATTTCAGC",
                          "CAGGAATCAGCACAGGCAGGGGCACGGCATCCCAAGACATCTGGGCC",
                          "GGACATATACCCACCGTTACGTGTCATACAGGATAGTTCCACTGCCC",
                          "ATAAAGGTTGCAGACATCATGTCCTTTTTGTCCCTAATTATTTCAGC"))
names(dnaseqs) <- paste0("S", seq_along(dnaseqs))

## create the kernel object with the motif patterns
mot <- motifKernel(c("A[CG]T","C.G","G[^A][AT]"), normalized=FALSE)
## show details of kernel object
mot

## generate the kernel matrix with the kernel object
km <- mot(dnaseqs)
dim(km)
km

## alternative way to generate the kernel matrix
km <- getKernelMatrix(mot, dnaseqs)

## plot heatmap of the kernel matrix
heatmap(km, symm=TRUE)

## generate rectangular kernel matrix
km <- mot(x=dnaseqs, selx=1:3, y=dnaseqs, sely=4:5)
dim(km)
km
