import_biom(BIOMfilename, treefilename=NULL, refseqfilename=NULL, refseqFunction=readDNAStringSet, refseqArgs=NULL, parseFunction=parse_taxonomy_default, parallel=FALSE, version=1.0, ...)
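BIOMfilename: The BIOM-format file to import. If some of your data is
stored outside that file, first import the BIOM file with import_biom,
and then "merge" the remaining data, once you have
imported it with other tools, using the relatively
general-purpose data merging function merge_phyloseq.
treefilename: Default is NULL. A file representing a phylogenetic tree or a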
phylo
object. Files can be NEXUS or Newick
format. See read_tree
for more details.
Also, if using a recent release of the GreenGenes
database tree, try the read_tree_greengenes
function -- this should solve some issues specific to
importing that tree. If provided, the tree should have
the same OTUs/tip-labels as the OTUs in the other files.
Any taxa or samples missing from one of the files are
removed from all. As an example from the QIIME pipeline,
this tree would be a tree of the representative 16S rRNA
sequences from each OTU cluster, with the number of
leaves/tips equal to the number of taxa/species/OTUs, or
the complete reference database tree that contains the
OTU identifiers of every OTU in your abundance table.
Note that this argument can be a tree object
(phylo-class) for cases where the tree
has been, or needs to be, imported separately, as
in the case of the GreenGenes tree mentioned earlier
(read_tree_greengenes).
refseqfilename: Default is NULL.
The file path of the biological sequence file that
contains at a minimum a sequence for each OTU in the
dataset. Alternatively, you may provide an
already-imported XStringSet
object that satisfies this condition. In either case, the
names
of each OTU need to match exactly the
taxa_names
of the other components of your
data. If this is not the case, for example if the data
file is a FASTA format but contains additional
information after the OTU name in each sequence header,
then some additional parsing is necessary, which you can
either perform separately before calling this function,
or describe explicitly in a custom function provided in
the (next) argument, refseqFunction. Note that the
XStringSet
class can represent
any arbitrary sequence, including user-defined
subclasses, but is most-often used to represent RNA, DNA,
or amino acid sequences. The only constraint is that this
special list of sequences has exactly one named element
for each OTU in the dataset.
refseqFunction: Default is readDNAStringSet, which expects
to read a FASTA-formatted DNA sequence file. If your
reference sequences for each OTU are amino acid, RNA, or
something else, then you will need to specify a different
function here. This is the function used to read the file
connection provided as the previous argument,
refseqfilename. This argument is ignored if
refseqfilename is already an XStringSet object.
refseqArgs: Default is NULL.
Additional arguments to refseqFunction. See
XStringSet-io for details about
additional arguments to the standard read functions in
the Biostrings package.
parseFunction: Default is parse_taxonomy_default. There are many
variations on taxonomic nomenclature, and naming
conventions used to store that information in various
taxonomic databases and phylogenetic assignment
algorithms. A popular database, greengenes
(http://greengenes.lbl.gov/cgi-bin/nph-index.cgi),
has its own custom parsing function provided in the
phyloseq package, parse_taxonomy_greengenes, and more can be
contributed or posted as code snippets as needed. Parsing functions
can be custom-defined by a user immediately prior to the
call to import_biom, and this is a
suggested first step to take when troubleshooting
taxonomy-related errors during file import.
parallel: Default is FALSE. Wrapper option for the .parallel
parameter in plyr-package functions. If TRUE, apply parsing functions in
parallel, using the parallel backend provided by
foreach and its supporting backend
packages. One caveat: plyr parallelization currently
works most cleanly with multicore-like backends
(Mac OS X, Unix?), and may throw warnings for SNOW-like
backends. See the example below for code invoking a
multicore-style backend within the doParallel
package. Finally, for many datasets a parallel import should not
be necessary, because a serial import will be just as fast
and the import is often performed only once, after
which the data should be saved as an RData file using the
save function.
version: Default is 1.0. Not yet
implemented. Parsing of the biom-format is done mostly by
the biom package, now available on CRAN.
...: Additional parameters passed on to read_tree.
Value: A phyloseq-class object.
See also: import
# An included example of a rich dense biom file
rich_dense_biom <- system.file("extdata", "rich_dense_otu_table.biom", package="phyloseq")
import_biom(rich_dense_biom, parseFunction=parse_taxonomy_greengenes)
# An included example of a rich sparse biom file
rich_sparse_biom <- system.file("extdata", "rich_sparse_otu_table.biom", package="phyloseq")
import_biom(rich_sparse_biom, parseFunction=parse_taxonomy_greengenes)
# # # Example code for importing large file with parallel backend
# library("doParallel")
# registerDoParallel(cores=6)
# import_biom("my/file/path/file.biom", parseFunction=parse_taxonomy_greengenes, parallel=TRUE)
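The refseqfilename and refseqFunction descriptions note that FASTA headers carrying extra information after the OTU name need additional parsing, either beforehand or in a custom reading function. The following is a minimal sketch of that idea, not part of phyloseq: trim_header and read_trimmed_dna are hypothetical names, and the sketch assumes that the OTU name is everything before the first whitespace in the header and that the reading function receives the file path as its first argument.

```r
# Hypothetical helper: keep only the text before the first whitespace in
# each FASTA header, e.g. "OTU_12 some description" becomes "OTU_12".
# Assumption: OTU names themselves contain no whitespace.
trim_header <- function(headers) {
  sub("[[:space:]].*$", "", headers)
}

# Hypothetical custom reader that could be supplied as refseqFunction.
# Biostrings is only needed when this function is actually called.
read_trimmed_dna <- function(filepath, ...) {
  seqs <- Biostrings::readDNAStringSet(filepath, ...)
  names(seqs) <- trim_header(names(seqs))
  seqs
}

# Illustrative headers (made up for this sketch).
headers <- c("OTU_12 Bacteria; Firmicutes", "OTU_7\tsize=40", "OTU_3")
print(trim_header(headers))
```

With a reader like this, a call might look like import_biom(BIOMfilename, refseqfilename="my/seqs.fasta", refseqFunction=read_trimmed_dna), where the FASTA path is a placeholder.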
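The parseFunction description suggests defining a custom parsing function before calling import_biom when taxonomy-related errors occur. The following is a minimal sketch of such a function, not one shipped with phyloseq: parse_taxonomy_prefixed is a hypothetical name, and it assumes that the function receives the taxonomy of a single OTU as a character vector and should return a named character vector, and that entries carry a one-letter rank prefix such as "k__".

```r
# Hypothetical custom parser that could be supplied as parseFunction.
# Assumes entries look like "k__Bacteria", "p__Firmicutes", and so on.
parse_taxonomy_prefixed <- function(char.vec) {
  # Map the one-letter prefixes to rank names (an assumed convention).
  rank_key <- c(k = "Kingdom", p = "Phylum", c = "Class", o = "Order",
                f = "Family", g = "Genus", s = "Species")
  # Trim leading and trailing whitespace from every entry.
  char.vec <- trimws(char.vec)
  # Keep only entries of the form "<letter>__<value>" with a known prefix.
  has_prefix <- grepl("^[kpcofgs]__", char.vec)
  entries <- char.vec[has_prefix]
  prefixes <- substr(entries, 1, 1)
  values <- sub("^[kpcofgs]__", "", entries)
  names(values) <- unname(rank_key[prefixes])
  # Drop ranks whose value is empty, e.g. "s__".
  values[nzchar(values)]
}

# Illustrative input (made up for this sketch).
parsed <- parse_taxonomy_prefixed(
  c("k__Bacteria", " p__Firmicutes", "g__", "unlabelled")
)
print(parsed)
```

Running the parser on a few taxonomy vectors by hand, before passing it as parseFunction, makes it easier to see whether an import error comes from the file or from the parsing step.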