odseq: Outlier detection in a multiple sequence alignment
Description
This function first computes a distance metric between every pair of sequences in the multiple alignment. It then bootstraps an average score of these distances to estimate the distribution of scores, which is used together with a threshold to distinguish outlier sequences [1].
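The distance step can be illustrated in base R. This is a minimal sketch, not the package's implementation. It assumes the alignment is a named character vector of equal-length strings with '-' as the gap character. The distances are hypothetical gap-pattern scores: a "linear" count of columns where exactly one of the two sequences has a gap, and an "affine" variant that charges an opening cost per run of such columns plus an extension cost per extra column. The helper names and the open/extend penalties are illustrative choices.

```r
# Sketch only: hypothetical gap-pattern distances, not odseq's internal code.

# TRUE where an aligned sequence has a gap character
gap_mask <- function(s) strsplit(s, "")[[1]] == "-"

# Linear: one unit per column where exactly one of the two sequences is gapped
linear_gap_distance <- function(a, b) {
  sum(xor(gap_mask(a), gap_mask(b)))
}

# Affine: each run of such columns costs `open`, plus `extend` per extra column
# (the penalty values are illustrative assumptions)
affine_gap_distance <- function(a, b, open = 3, extend = 1) {
  runs <- rle(xor(gap_mask(a), gap_mask(b)))
  run_lengths <- runs$lengths[runs$values]
  sum(open + extend * (run_lengths - 1))
}

# Symmetric matrix of distances between every pair of sequences
pairwise_distances <- function(aln, dist_fun) {
  n <- length(aln)
  d <- matrix(0, n, n, dimnames = list(names(aln), names(aln)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      d[i, j] <- d[j, i] <- dist_fun(aln[[i]], aln[[j]])
    }
  }
  d
}

# Average distance from each sequence to all the others (diagonal excluded)
average_distance <- function(d) rowSums(d) / (nrow(d) - 1)

aln <- c(s1 = "MKV-LLA", s2 = "MKV-LLA", s3 = "MKVALLA", s4 = "M----LA")
d_lin <- pairwise_distances(aln, linear_gap_distance)
d_aff <- pairwise_distances(aln, affine_gap_distance)
average_distance(d_lin)   # s4 has the largest average distance (10/3)
```

With either metric, the heavily gapped sequence s4 ends up farthest, on average, from the rest of the toy alignment.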
Usage
odseq(msa_object, distance_metric = "linear", B = 100, threshold = 0.025)
Arguments
msa_object
An object of formal class MsaAAMultipleAlignment, as provided by the msa package.
distance_metric
A string indicating the type of distance metric to be computed. Currently, 'linear' and 'affine' are supported.
B
Integer giving the number of bootstrap replicates to run. Higher values should make the detection more robust.
threshold
Numeric value giving the probability left in the right tail of the bootstrap score distribution when computing outliers. This parameter may need tuning for each specific problem.
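The roles of B and threshold can be sketched as follows. This is one possible reading of the description, not the package's code: it assumes that each of the B replicates is the mean of the per-sequence average distances resampled with replacement, and that a sequence is flagged when its average distance exceeds the (1 - threshold) quantile of those bootstrap means. The function name bootstrap_outliers and the input values are hypothetical.

```r
# Sketch only: a hypothetical bootstrap cutoff, not odseq's internal code.
# avg_dist: one average distance per sequence (e.g. from a distance matrix).
bootstrap_outliers <- function(avg_dist, B = 100, threshold = 0.025) {
  n <- length(avg_dist)
  # Each replicate is the mean of the averages resampled with replacement
  boot_means <- replicate(B, mean(avg_dist[sample.int(n, n, replace = TRUE)]))
  # Leave probability `threshold` in the right tail of the bootstrap distribution
  cutoff <- quantile(boot_means, probs = 1 - threshold, names = FALSE)
  # TRUE marks an outlier, mirroring the documented return value
  avg_dist > cutoff
}

set.seed(1)
avg_dist <- c(s1 = 1.2, s2 = 1.3, s3 = 1.1, s4 = 1.25, s5 = 1.15, s6 = 4.0)
flags <- bootstrap_outliers(avg_dist, B = 1000)
flags
```

Raising B reduces the Monte Carlo noise in the estimated cutoff, while a smaller threshold moves the cutoff further right, so fewer sequences tend to be flagged.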
Value
Returns a logical vector, where TRUE indicates an outlier.
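As a usage illustration, a logical vector of this kind can index the sequences directly, for example to list or drop the flagged ones. The sketch below assumes the flags are in the same order as the sequences; both vectors are hypothetical stand-ins for real data.

```r
# Hypothetical flags, one per sequence, TRUE = outlier
is_outlier <- c(s1 = FALSE, s2 = FALSE, s3 = TRUE, s4 = FALSE)
seqs <- c(s1 = "MKV-LLA", s2 = "MKV-LLA", s3 = "M----LA", s4 = "MKVALLA")

outlier_ids <- names(seqs)[is_outlier]   # sequences flagged as outliers
filtered <- seqs[!is_outlier]            # sequences kept for downstream use
```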
References
[1] OD-seq: outlier detection in multiple sequence alignments. Peter Jehl, Fabian Sievers and Desmond G. Higgins. BMC Bioinformatics. 2015.