`linWeight(d, sigma = 1)`expWeight(d, sigma = 1)

gaussWeight(d, sigma = 1)

swdWeight(d)

## S4 method for signature 'XStringSet'
## positionMetadata(x) <- value

## S4 method for signature 'BioVector'
## positionMetadata(x) <- value

## S3 method for class 'XStringSet':
positionMetadata(x)

## S3 method for class 'BioVector':
positionMetadata(x)

d

a numeric vector of distance values

sigma

a positive numeric value defining the peak width or in case of
gaussWeight the width of the bell function (details see below)

x

biological sequences in the form of a

DNAStringSet

, RNAStringSet

,
AAStringSet

(or as BioVector

)value

for assignment of position metadata the value is an integer
vector with gives the offset to the start position 1 for each sequence.
Positive and negative offset values are possible. Without position metadata
all sequences must be aligned and start at position 1. For deletion of
position metadata set value to NULL.

- The distance weighting functions return a numerical vector with distance weights.

- Distance Weighted Kernel
These kernels weight the contribution to the similarity value
based on the distance of their start positions in the two sequences. The
user can define the distance weights either through passing a distance
weighting function or a weight vector to the kernel. Through this weighting
the degree of locality in the similarity consideration between two sequences
can be adjusted flexibly. Such a position dependent kernel can be used in
the same way as the normal position independent kernel variant. Distance
weighting can be used for all kernels in this package except the mismatch
kernel. The package defines four predefined weighting functions (see also
examples):
- linWeigth: a weighting function with linear decrease

- expWeight: a weighting function with exponential decrease
- gaussWeigth: a bell-shaped weighting function with a decrease similar to a gaussian distribution
- swdWeight: the distance weighting function used in the Shifted Weighted Degree (SWD) kernel which is similar to an exponential decrease but it has a smaller peak and larger tails
- Position Specific Kernel
One variant of position dependent kernels is the position specific kernel.
This kernel takes patterns into account only if they are located at
identical positions in the two sequences. This kernel can be selected
through passing a distance weight value of 1 to the kernel indicating that
the neighborhood of a pattern in the other sequence is irrelevant for the
similarity consideration. This kernel is in fact one end of the
spectrum (sic!) where locality is reduced to the exact location and the
normal position independent kernel is at the other end - not caring about
position at all. Through adjustment of sigma in the predefined functions
a continous blending between these two extremes is possible for the degree
of locality. Evaluation of position information is controlled through
setting the parameter
`distWeight`

to 1 in the functions`spectrumKernel, gappyPairKernel, motifKernel`

. This parameter value is in fact interpreted as a numeric vector with 1 for zero distance and 0 for all other distances. Positive Definiteness The standard SVMs only support positive definite kernels / kernel matrices. This means that the distance weighting function must must be chosen such that the resulting kernel is positive definite. For positive definiteness also symmetry of the distance weighting function is important. Unlike usual distances the relative distance value here can have positive and negative values dependent on whether the pattern in the second sequence is located at higher or lower positions than the pattern in the first sequence. The predefined distance weighting functions except for swdWeight deliver a positive definite kernel for all parameter settings. According to Sonnenburg et al. 2005 the SWD kernel has empirically shown positive definiteness but it is not proved for this kernel. If a weight vector with predefined weights per distance is passed to the kernel instead of a distance weighting function positive definiteness of the kernel must also be ensured by adequate selection of the weight values. User-Defined Distance Function For user defined distance functions symmetry and positive definitness of the resulting kernel are important. Such a function gets a numeric distance vector 'x' as input (and possibly other parameters controlling the weighting behavior) and returns a weight vector of identical length. When called with a missing parameter x all other parameters must be supplied or have appropriate default values. In this case the function must return a new function with just the single parameter x which calls the original user defined function with x and all the other parameters set to the values passed in the call. This behavior is needed for assignment of the function with missing parameter x to the distWeight parameter in the kernel. At the time of kernel definition the actual distance values are not available. Later when sequence data is passed to this kernel for generation of a kernel matrix or an explicit representation this single argument function is called to get the distance dependent weights. The code for the predefined`expWeight`

function in the example section below shows how a user-specific function can be set up. Offset To allow flexible alignment of sequence positions without redefining the XStringSet or BioVector an additional metadata element named offset can be assigned to the sequence set via`positionMetadata<-`

(see example below). Position metadata is a numeric vector with the same number of elements as the sequence set and gives for each sequence an offset to position 1. When positions metadata is not assigned to a sequence set the position 1 is associated with the first character in each sequence of the sequence set., i.e. in this case the sequences should be aligned such that all have the same starting positions with respect to the learning task (e.g. all sequences start at a transcription start site). Offset information is only evaluated in position dependent kernel variants.

`sequenceKernel`

in three different ways representing the
full range of position dependency:
- Position independent kernel: ignores the position of patterns and
just takes the number of their occurances or their presence (see parameter
`presence`

in functions`spectrumKernel, gappyPairKernel, motifKernel`

) in the sequences into account for similarity determination.

`spectrumKernel`

, `gappyPairKernel`

,
`motifKernel`

, `annotationMetadata`

,
`metadata`

, `mcols`

## plot predefined weighting functions for sigma=10 curve(linWeight(x, sigma=10), from=-20, to=20, xlab="pattern distance", ylab="weight", main="Predefined Distance Weighting Functions", col="green") curve(expWeight(x, sigma=10), from=-20, to=20, col="blue", add=TRUE) curve(gaussWeight(x, sigma=10), from=-20, to=20, col="red", add=TRUE) curve(swdWeight(x), from=-20, to=20, col="orange", add=TRUE) legend('topright', inset=0.03, title="Weighting Functions", c("linWeight", "expWeight", "gaussWeight", "swdWeight"), fill=c("green", "blue", "red", "orange")) text(14, 0.70, "sigma = 10") ## instead of user provided sequences in XStringSet format ## for this example a set of DNA sequences is created ## RNA- or AA-sequences can be used as well with the motif kernel dnaseqs <- DNAStringSet(c("AGACTTAAGGGACCTGGTCACCACGCTCGGTGAGGGGGACGGGGTGT", "ATAAAGGTTGCAGACATCATGTCCTTTTTGTCCCTAATTATTTCAGC", "CAGGAATCAGCACAGGCAGGGGCACGGCATCCCAAGACATCTGGGCC", "GGACATATACCCACCGTTACGTGTCATACAGGATAGTTCCACTGCCC", "ATAAAGGTTGCAGACATCATGTCCTTTTTGTCCCTAATTATTTCAGC")) names(dnaseqs) <- paste("S", 1:length(dnaseqs), sep="") ## create a distance weighted spectrum kernel with linear decrease of ## weights in a range of 20 bases spec20 <- spectrumKernel(k=3, distWeight=linWeight(sigma=20)) ## show details of kernel object kernelParameters(spec20) ## this kernel can be now be used in a classification or regression task ## in the usual way or a kernel matrix can be generated for use with ## another learning method km <- spec20(x=dnaseqs, selx=1:5) km[1:5,1:5] ## instead of a distance weighting function also a weight vector can be ## passed in the distWeight parameter but the values must be chosen such ## that they lead to a positive definite kernel ## ## in this example only patterns within a 5 base range are considered with ## slightly decreasing weights specv <- spectrumKernel(k=3, distWeight=c(1,0.95,0.9,0.85,0.8)) km <- specv(dnaseqs) km[1:5,1:5] ## position specific spectrum kernel specps <- spectrumKernel(k=3, distWeight=1) km <- specps(dnaseqs) km[1:5,1:5] ## get position specific kernel matrix km <- specps(dnaseqs) km[1:5,1:5] ## example with offset to align sequence positions (e.g. the ## transcription start site), the value gives the offset to position 1 positionOne <- c(9,6,3,1,6) positionMetadata(dnaseqs) <- positionOne ## show position metadata positionMetadata(dnaseqs) ## generate kernel matrix with position-specific spectrum kernel km1 <- specps(dnaseqs) km1[1:5,1:5] ## example for a user defined weighting function ## please stick to the order as described in the comments below and ## make sure that the resulting kernel is positive definite expWeightUserDefined <- function(x, sigma=1) { ## check presence and validity of all parameters except for x if (!isSingleNumber(sigma)) stop("'sigma' must be a number") ## if x is missing the function returns a closure where all parameters ## except for x have a defined value if (missing(x)) return(function(x) expWeightUserDefined(x, sigma=sigma)) ## pattern distance vector x must be numeric if (!is.numeric(x)) stop("'x' must be a numeric vector") ## create vector of distance weights from the ## input vector of pattern distances x exp(-abs(x)/sigma) } ## define kernel object with user defined weighting function specud <- spectrumKernel(k=3, distWeight=expWeightUserDefined(sigma=5), normalized=FALSE)