import_qiime is an import function in the phyloseq-package.
Originally, QIIME produced its own custom-format table
that contained both OTU-abundance and taxonomic identity
information. This function is still included in phyloseq
mainly to accommodate these now-outdated files. Recent
versions of QIIME store output in the biom-format, an
emerging file format standard for microbiome data. If
your data is in the biom-format (that is, the file name
ends with a .biom extension), then you should use the
import_biom function instead.
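The extension rule above can be sketched in a few lines of base R. The helper name choose_qiime_importer and the file names are hypothetical and not part of phyloseq; the function only inspects the file name and returns the name of the import function that the rule suggests.

```r
# Hypothetical helper (not part of phyloseq): report which phyloseq
# import function suits a file, based only on its file name extension.
choose_qiime_importer <- function(path) {
  # tools::file_ext() returns the text after the last dot, e.g. "biom"
  ext <- tolower(tools::file_ext(path))
  if (ext == "biom") "import_biom" else "import_qiime"
}

choose_qiime_importer("otu_table.biom")        # "import_biom"
choose_qiime_importer("otu_table_legacy.txt")  # "import_qiime"
```

This is only a naming check. A file with a misleading extension still needs to be inspected by hand.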
import_qiime(otufilename = NULL, mapfilename = NULL, treefilename = NULL, refseqfilename = NULL, refseqFunction = readDNAStringSet, refseqArgs = NULL, parseFunction = parse_taxonomy_qiime, verbose = TRUE, ...)
otufilename: (Optional). Default is NULL.

mapfilename: (Optional). Default is NULL.

treefilename: (Optional). Default is NULL. A file
representing a phylogenetic tree, or a phylo object.
Files can be in NEXUS or Newick format. See read_tree
for more details. If you are using a recent release of
the GreenGenes database tree, try the
read_tree_greengenes function, which should solve some
issues specific to importing that tree. If provided, the
tree should have the same OTUs/tip-labels as the OTUs in
the other files. Any taxa or samples missing from one of
the files are removed from all. As an example from the
QIIME pipeline, this tree would be a tree of the
representative 16S rRNA sequences from each OTU cluster,
with the number of leaves/tips equal to the number of
taxa/species/OTUs, or the complete reference database
tree that contains the OTU identifiers of every OTU in
your abundance table. Note that this argument can be a
tree object (phylo-class) for cases where the tree has
been, or needs to be, imported separately, as in the
case of the GreenGenes tree mentioned earlier
(read_tree_greengenes).

refseqfilename: (Optional). Default is NULL.
The file path of the biological sequence file that
contains at a minimum a sequence for each OTU in the
dataset. Alternatively, you may provide an
already-imported XStringSet object that satisfies this
condition. In either case, the names of each OTU need to
match exactly the taxa_names of the other components of
your data. If this is not the case, for example if the
data file is in FASTA format but contains additional
information after the OTU name in each sequence header,
then some additional parsing is necessary, which you can
either perform separately before calling this function,
or describe explicitly in a custom function provided in
the (next) argument, refseqFunction. Note that the
XStringSet class can represent any arbitrary sequence,
including user-defined subclasses, but is most often used
to represent RNA, DNA, or amino acid sequences. The only
constraint is that this special list of sequences has
exactly one named element for each OTU in the dataset.

refseqFunction: (Optional). Default is readDNAStringSet,
which expects to read a FASTA-formatted DNA sequence
file. If your reference sequences for each OTU are amino
acid, RNA, or something else, then you will need to
specify a different function here. This is the function
used to read the file connection provided as the previous
argument, refseqfilename. This argument is ignored if
refseqfilename is already an XStringSet object.

refseqArgs: (Optional). Default is NULL.
Additional arguments to refseqFunction. See XStringSet-io
for details about additional arguments to the standard
read functions in the Biostrings package.

parseFunction: Default is parse_taxonomy_qiime, which is
specialized for splitting the ";"-delimited strings and
also attempting to interpret greengenes prefixes, if any,
as that is a common format of the taxonomy string
produced by QIIME.

verbose: Default is TRUE.

...: Additional arguments, passed to read_tree.

Returns a phyloseq-class object.
The mapping file is converted to the sample_data-class
component data type in the phyloseq-package. QIIME may
also produce a phylogenetic tree with a tip for each OTU,
which can be specified here or imported separately using
read_tree.
See "http://www.qiime.org/" for details on using QIIME. While there are many complex dependencies, QIIME can be downloaded as a pre-installed Linux virtual machine that runs "off the shelf".
The different files useful for import to phyloseq are not collocated in a typical run of the QIIME pipeline. See the main phyloseq vignette for an example of where to find the relevant files in the output directory.
"QIIME allows analysis of high-throughput community sequencing data." J Gregory Caporaso, Justin Kuczynski, Jesse Stombaugh, Kyle Bittinger, Frederic D Bushman, Elizabeth K Costello, Noah Fierer, Antonio Gonzalez Pena, Julia K Goodrich, Jeffrey I Gordon, Gavin A Huttley, Scott T Kelley, Dan Knights, Jeremy E Koenig, Ruth E Ley, Catherine A Lozupone, Daniel McDonald, Brian D Muegge, Meg Pirrung, Jens Reeder, Joel R Sevinsky, Peter J Turnbaugh, William A Walters, Jeremy Widmann, Tanya Yatsunenko, Jesse Zaneveld and Rob Knight; Nature Methods, 2010; doi:10.1038/nmeth.f.303
See also: phyloseq
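library(phyloseq)
# Locate the example QIIME files shipped with the phyloseq package.
otufile <- system.file("extdata", "GP_otu_table_rand_short.txt.gz", package="phyloseq")
mapfile <- system.file("extdata", "master_map.txt", package="phyloseq")
trefile <- system.file("extdata", "GP_tree_rand_short.newick.gz", package="phyloseq")
# Import the OTU table, sample map, and tree into one phyloseq object.
import_qiime(otufile, mapfile, trefile)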
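The refseqfilename description mentions FASTA headers that carry extra information after the OTU name, and suggests handling this in a custom refseqFunction. The following is a minimal sketch of one way to do that, not the package's own method. The names trim_fasta_names and read_trimmed_dna are hypothetical, the file names in the usage comment are placeholders, and the sketch assumes the OTU name is everything before the first whitespace in each header.

```r
# Keep only the text before the first whitespace in each FASTA header,
# e.g. "OTU_1 some annotation" becomes "OTU_1". Base R only.
trim_fasta_names <- function(x) {
  sub("[[:space:]].*$", "", x)
}

# Hypothetical reader to pass as refseqFunction. Assumes the Biostrings
# package is installed; it is only needed when the function is called.
read_trimmed_dna <- function(filepath, ...) {
  seqs <- Biostrings::readDNAStringSet(filepath, ...)
  names(seqs) <- trim_fasta_names(names(seqs))
  seqs
}

# Quick check of the name cleaning on two example headers.
trim_fasta_names(c("OTU_1 k__Bacteria; p__Firmicutes", "OTU_2"))

# Usage sketch (file names are placeholders, not real files):
# import_qiime(otufilename = "otu_table.txt",
#              refseqfilename = "rep_set.fasta",
#              refseqFunction = read_trimmed_dna)
```

If your headers separate the OTU name from the annotation with a different delimiter, adjust the pattern in trim_fasta_names accordingly.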