".clust"
file or its apparent sequence naming convention.
import_RDP_cluster(RDP_cluster_file)
".clust"
file produced by the complete
linkage clustering step of the RDP pipeline.
otu_table object parsed from the ".clust" file.
http://pyro.cme.msu.edu/index.jsp
The cluster file itself contains the names of all
sequences contained in the input alignment. If the upstream
barcode and alignment processing steps are also done with
the RDP pipeline, then the sequence names follow a
predictable naming convention wherein each sequence is
named by its sample and sequence ID, separated by a
"_" as delimiter:
"sampleName_sequenceIDnumber"
This import function assumes that the sequence names in
the cluster file follow this convention, and that the
sample name does not contain any "_". It is
unlikely to work if this is not the case. It is likely to
work if you used the upstream steps in the RDP pipeline
to process your raw (barcoded, untrimmed) fasta/fastq
data.
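If you are unsure whether your sequence names meet this assumption, a quick check along the following lines may help before importing. It is a sketch in base R with invented names, not something the import function is documented to do: it flags every name that does not contain exactly one "_".

```r
# Hypothetical sequence names; the second and third break the convention.
seq_names <- c("soilA_0001", "soil_B_0002", "soilC0003")

# Count the "_" characters in each name.
n_delim <- lengths(regmatches(seq_names,
                              gregexpr("_", seq_names, fixed = TRUE)))

# Names with zero or several "_" cannot be split unambiguously
# into a sample name and a sequence ID.
bad <- seq_names[n_delim != 1]
bad
```

An empty result suggests the names are compatible with the "sampleName_sequenceIDnumber" convention.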
This function first loops through the ".clust"
file and collects all of the sample names that appear. It
then loops through each OTU ("cluster"; each
row of the cluster file) and sums the number of sequences
(reads) from each sample. The resulting OTU-by-sample
abundance table is coerced to an otu_table
object and returned.
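The two passes described above can be sketched in base R as follows. The sketch makes two assumptions: the clusters are already available as a named list of sequence-name vectors (parsing the ".clust" file layout is not shown), and the names `clusters`, `sample_of`, and `counts` are hypothetical. It illustrates the counting logic only and is not the function's actual source.

```r
# Hypothetical clusters: one element per OTU, holding its sequence names.
clusters <- list(
  OTU_1 = c("soilA_0001", "soilA_0002", "soilB_0001"),
  OTU_2 = c("soilB_0002"),
  OTU_3 = c("soilA_0003", "soilC_0001")
)

# Helper: the sample name is everything before the "_" delimiter.
sample_of <- function(seq_names) sub("_.*$", "", seq_names)

# Pass 1: collect every sample name that appears in any cluster.
samples <- sort(unique(unlist(lapply(clusters, sample_of))))

# Pass 2: for each OTU, count the reads contributed by each sample.
counts <- t(vapply(
  clusters,
  function(seqs) as.integer(table(factor(sample_of(seqs), levels = samples))),
  integer(length(samples))
))
colnames(counts) <- samples

counts
```

With phyloseq loaded, a matrix like `counts` (OTUs as rows, samples as columns) could then be coerced with `otu_table(counts, taxa_are_rows = TRUE)`.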