vs_alignment_classification assigns taxonomy by global
alignment and Last Common Ancestor (LCA) consensus of database hits using
VSEARCH.
vs_alignment_classification(
  fastx_input,
  database,
  lcaout = NULL,
  lca_cutoff = 1,
  top_hits_only = FALSE,
  gapopen = "20I/2E",
  gapext = "2I/1E",
  id = 0.7,
  strand = "plus",
  maxaccepts = 2,
  maxrejects = 32,
  threads = 1,
  vsearch_options = NULL,
  tmpdir = NULL
)
A tibble or NULL.
If lcaout is specified, the results are written to that file and NULL is returned.
If lcaout is NULL, a tibble is returned.
The tibble contains the classification results for each query sequence.
The Header and Sequence columns of fastx_input are copied into this table,
together with one column for each taxonomic rank.
The ranks depend on the database file used, but are typically domain, phylum,
class, order, family, genus and species.
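The rank columns come from the taxonomy annotations stored in the database headers. As a rough illustration, assuming the database follows the SINTAX header convention (as the name of the example file sintax_db.fasta suggests), where `;tax=` is followed by comma-separated `rank:name` pairs, a header can be split into named ranks in base R. The helper `parse_sintax_header()` is hypothetical and not part of Rsearch.

```r
# Hypothetical helper (not part of Rsearch): split a SINTAX-style header such
# as "seq1;tax=d:Bacteria,p:Firmicutes,g:Bacillus;" into a named vector of ranks.
parse_sintax_header <- function(header) {
  rank_names <- c(d = "domain", k = "kingdom", p = "phylum", c = "class",
                  o = "order", f = "family", g = "genus", s = "species")
  # Extract the text after ";tax=" up to the next ";" (or the end of the string)
  tax <- sub(".*;tax=([^;]*).*", "\\1", header)
  if (identical(tax, header)) return(character(0))  # no tax= annotation found
  fields <- strsplit(tax, ",", fixed = TRUE)[[1]]
  # Each field has the form "<rank letter>:<name>"
  keys <- sub(":.*", "", fields)
  values <- sub("^[^:]*:", "", fields)
  # Unknown rank letters are kept as-is
  labels <- ifelse(keys %in% names(rank_names), rank_names[keys], keys)
  stats::setNames(values, labels)
}

ranks <- parse_sintax_header("seq1;tax=d:Bacteria,p:Firmicutes,g:Bacillus;")
print(ranks)
```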
fastx_input: (Required). A FASTA/FASTQ file path or FASTA/FASTQ object containing the query sequences. See Details.
database: (Required). A FASTA/FASTQ file path or FASTA/FASTQ tibble object containing the target sequences. See Details.
lcaout: (Optional). A character string specifying the name of the
output file. If NULL (default), no output is
written to a file and the results are returned as a tibble with the columns
query_id and taxonomy.
lca_cutoff: (Optional). The fraction of hits that must agree at a taxonomic
rank for that rank to be included in the last common ancestor (LCA).
Defaults to 1.0, which requires all hits to agree at a rank for that rank to
be reported. With a lower cutoff, e.g. 0.95, a small fraction of disagreeing
hits is tolerated and the rank is still reported. Must be between 0.5 and 1.0.
top_hits_only: (Optional). If TRUE, only the top hits for each query, i.e.
the hits sharing the highest percentage of identity with the query, are
written to the output. Defaults to FALSE.
gapopen: (Optional). Penalties for gap opening. Defaults to
"20I/2E". See Details.
gapext: (Optional). Penalties for gap extension. Defaults to
"2I/1E". See Details.
id: (Optional). Pairwise identity threshold. Defines the minimum
identity required for matches. Defaults to 0.7. See Details.
strand: (Optional). Specifies which strand to consider when comparing
sequences. Can be either "plus" (default) or "both".
maxaccepts: (Optional). Maximum number of matching target sequences to
accept before stopping the search for a given query. Defaults to 2.
Should be larger than 1 for the LCA to be informative, since the LCA of a
single hit is simply that hit's taxonomy.
maxrejects: (Optional). Maximum number of non-matching target sequences
to consider before stopping the search for a given query. Defaults to 32. If
maxaccepts and maxrejects are both set to 0, the complete
database is searched.
threads: (Optional). Number of computational threads to be used by
VSEARCH. Defaults to 1.
vsearch_options: (Optional). Additional arguments to pass to
VSEARCH. Defaults to NULL. See Details.
tmpdir: (Optional). Path to the directory where temporary files should
be written when tables are used as input or output. Defaults to
NULL, which resolves to the session-specific temporary directory
(tempdir()).
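The hit list that feeds the LCA step for each query is shaped by id, maxaccepts, maxrejects and top_hits_only. The base-R sketch below mimics these rules for a single query under simplifying assumptions: the candidate targets are assumed to be already sorted in the order VSEARCH examines them (by decreasing number of shared k-mers), their identities are assumed to be known, and a limit of 0 is treated as "no limit". The function `select_hits()` is hypothetical and is not part of Rsearch or VSEARCH.

```r
# Hypothetical sketch (not Rsearch or VSEARCH code) of how the hit list for
# one query is formed. 'identities' holds the pairwise identity of each
# candidate target, in the (assumed) order the candidates are examined.
select_hits <- function(identities, id = 0.7, maxaccepts = 2,
                        maxrejects = 32, top_hits_only = FALSE) {
  accepted <- integer(0)
  rejects <- 0
  for (i in seq_along(identities)) {
    if (identities[i] >= id) {
      accepted <- c(accepted, i)
      # Stop once enough matching targets have been accepted (0 = no limit)
      if (maxaccepts > 0 && length(accepted) >= maxaccepts) break
    } else {
      rejects <- rejects + 1
      # Stop once enough non-matching targets have been seen (0 = no limit)
      if (maxrejects > 0 && rejects >= maxrejects) break
    }
  }
  if (top_hits_only && length(accepted) > 0) {
    # Keep only the accepted hits that share the highest identity
    best <- max(identities[accepted])
    accepted <- accepted[identities[accepted] == best]
  }
  accepted
}

identities <- c(0.99, 0.62, 0.99, 0.97, 0.95)
select_hits(identities, maxaccepts = 2)                  # targets 1 and 3
select_hits(identities, maxaccepts = 0, maxrejects = 0)  # targets 1, 3, 4 and 5
select_hits(identities, maxaccepts = 0, maxrejects = 0,
            top_hits_only = TRUE)                        # targets 1 and 3
```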
Performs global sequence alignment against a reference database and assigns taxonomy using the Last Common Ancestor (LCA) approach, reporting the deepest taxonomic rank supported by at least the fraction lca_cutoff of the hits (all hits by default).
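The consensus step can be illustrated with a simplified base-R sketch. It assumes the taxonomy of each hit has already been split into ranks (one row per hit, one column per rank, ordered from broadest to most specific) and walks down the ranks, keeping a rank only while its most common name is supported by at least a fraction lca_cutoff of all hits. The function `lca_consensus()` is hypothetical; VSEARCH's own implementation may differ in details, for example in how missing ranks are handled.

```r
# Simplified LCA sketch (an assumption, not VSEARCH's exact implementation).
# hits: character matrix with one row per database hit and one column per
# rank, ordered from the broadest rank to the most specific one.
lca_consensus <- function(hits, cutoff = 1) {
  stopifnot(cutoff >= 0.5, cutoff <= 1)
  consensus <- character(0)
  for (rank in colnames(hits)) {
    counts <- table(hits[, rank])     # NA (missing rank) is not counted
    if (length(counts) == 0) break    # no hit has a name at this rank
    # Stop at the first rank where the most common name is supported by
    # less than a fraction 'cutoff' of all hits
    if (max(counts) / nrow(hits) < cutoff) break
    consensus[rank] <- names(counts)[which.max(counts)]
  }
  consensus
}

hits <- rbind(c("Bacteria", "Firmicutes", "Bacillus"),
              c("Bacteria", "Firmicutes", "Bacillus"),
              c("Bacteria", "Firmicutes", "Listeria"))
colnames(hits) <- c("domain", "phylum", "genus")
lca_consensus(hits, cutoff = 1)    # genus dropped: only 2 of 3 hits agree
lca_consensus(hits, cutoff = 0.6)  # genus kept: 2/3 is at least 0.6
```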
fastx_input and database can be either file paths to
FASTA/FASTQ files or FASTA/FASTQ objects. FASTA objects are tibbles that
contain the columns Header and Sequence; see
readFasta. FASTQ objects are tibbles that contain the
columns Header, Sequence, and Quality; see
readFastq.
Pairwise identity (id) is calculated as the number of matching columns
divided by the alignment length minus the number of terminal gap columns.
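To make this definition concrete, the base-R sketch below computes it for an alignment given as two gapped strings of equal length, with `-` as the gap symbol. The function names are hypothetical and the code only illustrates the formula; it is not the code VSEARCH runs. In this sketch a terminal gap column is one that lies in a leading or trailing run of gaps in either sequence.

```r
# Hypothetical sketch: identity = matching columns /
# (alignment length - terminal gap columns).

# TRUE for columns in a leading or trailing run of gaps in one sequence
terminal_gaps <- function(chars) {
  is_gap <- chars == "-"
  leading <- cumprod(is_gap) == 1
  trailing <- rev(cumprod(rev(is_gap))) == 1
  leading | trailing
}

alignment_identity <- function(aln_query, aln_target) {
  qry <- strsplit(toupper(aln_query), "")[[1]]
  tgt <- strsplit(toupper(aln_target), "")[[1]]
  stopifnot(length(qry) == length(tgt))
  # Matching columns: identical residues, gap columns excluded
  matches <- sum(qry == tgt & qry != "-")
  # Terminal gap columns of either sequence are left out of the denominator
  terminal <- terminal_gaps(qry) | terminal_gaps(tgt)
  matches / (length(qry) - sum(terminal))
}

alignment_identity("--ACGTAC", "TTACGAAC")  # 5 matches / (8 - 2) columns
alignment_identity("ACG-TAC", "ACGGTAC")    # 6 matches / 7 columns
```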
vsearch_options allows users to pass additional command-line arguments
to VSEARCH that are not directly supported by this function. Refer to
the VSEARCH manual for more details.
See the VSEARCH documentation for how to define gapopen and gapext.
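The defaults "20I/2E" and "2I/1E" use the VSEARCH gap penalty notation: each '/'-separated token is a penalty followed by context letters, where Q and T denote the query and target sequence, L, I and R the left end, interior and right end, and E both ends. For example, "20I/2E" gives internal gap openings a penalty of 20 and terminal gap openings a penalty of 2. The base-R sketch below expands such a string into a table of penalties per context. The function `parse_gap_penalties()` is hypothetical, and treating a token with no location letter as applying to all locations, and letting later tokens override earlier ones, are assumptions of this sketch.

```r
# Hypothetical parser (not part of Rsearch or VSEARCH) for gap penalty
# strings such as "20I/2E". Returns a 2 x 3 matrix of penalties:
# rows = sequence (Q = query, T = target),
# columns = location (L = left end, I = interior, R = right end).
parse_gap_penalties <- function(spec) {
  pen <- matrix(NA_real_, nrow = 2, ncol = 3,
                dimnames = list(c("Q", "T"), c("L", "I", "R")))
  for (token in strsplit(spec, "/", fixed = TRUE)[[1]]) {
    if (!grepl("^[0-9]+[QTLIRE]*$", token))
      stop("cannot parse gap penalty token: ", token)
    value <- as.numeric(sub("^([0-9]+).*$", "\\1", token))
    code <- sub("^[0-9]+", "", token)
    syms <- if (nzchar(code)) strsplit(code, "")[[1]] else character(0)
    # No Q/T letter: the penalty applies to both query and target
    seqs <- intersect(syms, c("Q", "T"))
    if (length(seqs) == 0) seqs <- c("Q", "T")
    # E stands for both ends (L and R)
    locs <- intersect(syms, c("L", "I", "R"))
    if ("E" %in% syms) locs <- union(locs, c("L", "R"))
    # Assumption: no location letter means all locations
    if (length(locs) == 0) locs <- c("L", "I", "R")
    # Assumption: later tokens override earlier ones for the same context
    pen[seqs, locs] <- value
  }
  pen
}

parse_gap_penalties("20I/2E")  # default gapopen
parse_gap_penalties("2I/1E")   # default gapext
```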
if (FALSE) {
  library(Rsearch)

  # Example files shipped with the Rsearch package
  db.file <- system.file("extdata", "sintax_db.fasta", package = "Rsearch")
  fasta.file <- system.file("extdata", "small.fasta", package = "Rsearch")

  # Classify the query sequences; with lcaout = NULL a tibble is returned
  tax.tbl <- vs_alignment_classification(fastx_input = fasta.file,
                                         database = db.file)
  View(tax.tbl)
}