odseq_unaligned: Outlier detection from a distance/similarity matrix of sequences.
Description
Given a similarity or distance matrix (such as the ones computed with string kernels in kebabs), computes a score for each sequence and bootstraps the scores to estimate their distribution, which is then used to distinguish outlier sequences.
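The score-then-bootstrap idea can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the toy data, the mean-distance score, the mean-plus-quantile cutoff, and every variable name below are hypothetical choices made for the example.

```r
# Illustrative sketch only; this is NOT the odseq implementation.
# All names here (pts, d, score, cutoff, ...) are hypothetical.
set.seed(42)

# Toy data: 19 evenly spaced points plus one distant point, standing in
# for sequences; dist() yields a symmetric distance matrix.
pts <- c((0:18) / 10, 10)
d <- as.matrix(dist(pts))

# Score each "sequence" by its mean distance to all the others.
score <- rowSums(d) / (nrow(d) - 1)

# Bootstrap the scores: resample with replacement B times and record the
# mean and standard deviation of each resample.
B <- 100
threshold <- 0.025
boot <- replicate(B, {
  resample <- sample(score, replace = TRUE)
  c(mean = mean(resample), sd = sd(resample))
})

# Assumed decision rule: flag scores in the upper tail, i.e. above the
# averaged bootstrap mean plus qnorm(1 - threshold) averaged bootstrap sds.
cutoff <- mean(boot["mean", ]) + qnorm(1 - threshold) * mean(boot["sd", ])
is_outlier <- unname(score > cutoff)

# Indices of the flagged elements.
print(which(is_outlier))
```

With distances, large scores are the suspicious ones, which is why the sketch looks at the upper tail. For similarities the scores would need to be inverted or the lower tail used instead.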
Usage
odseq_unaligned(distance_matrix, B = 100, threshold = 0.025, type = "similarity")
Arguments
distance_matrix
A numeric matrix of pairwise similarities or distances among unaligned sequences. The kebabs package can be used to compute such a matrix.
B
Integer indicating the number of bootstrap replicates to run. More replicates should make the detection more robust.
threshold
Numeric value giving the probability left in the right tail of the bootstrap score distribution when computing outliers. This parameter may need some tuning for each specific problem.
type
A string indicating whether distance_matrix contains similarities or distances. Either 'similarity' or 'distance'.
Value
Returns a logical vector, where TRUE indicates an outlier.
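A minimal usage sketch, assuming the odseq package is installed. The toy distance matrix is made up for illustration. With real data it would typically be replaced by a similarity matrix from a kebabs string kernel, together with type = "similarity".

```r
# Hypothetical toy input: 19 nearby points and one distant point stand in
# for sequences. The data are made up for illustration.
pts <- c((0:18) / 10, 10)
d <- as.matrix(dist(pts))  # symmetric numeric distance matrix

# Only call the package if it is available on this system.
if (requireNamespace("odseq", quietly = TRUE)) {
  outliers <- odseq::odseq_unaligned(d, B = 100, threshold = 0.025,
                                     type = "distance")
  print(which(outliers))   # indices of the sequences flagged as outliers
}
```

Because the detection relies on bootstrap resampling, calling set.seed() beforehand is a reasonable way to make runs reproducible.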
References
[1] OD-seq: outlier detection in multiple sequence alignments. Peter Jehl, Fabian Sievers and Desmond G. Higgins. BMC Bioinformatics. 2015.