This function takes the path to a project directory generated by SqueezeMeta (whose name is specified in the -p parameter of the SqueezeMeta.pl script) and parses the results into a SQM object. Alternatively, it can load the project data from a zip file produced by sqm2zip.py.
loadSQM(
project_path,
tax_mode = "prokfilter",
trusted_functions_only = FALSE,
single_copy_genes = "MGOGs",
load_sequences = TRUE,
engine = "data.table"
)
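A minimal usage sketch, assuming SQMtools is installed and that `/path/to/project_dir` (a placeholder, not a real path) points to a finished SqueezeMeta project. The call is guarded so the snippet does nothing when either assumption fails.

```r
# Placeholder path: replace with your own SqueezeMeta project directory.
project_dir <- "/path/to/project_dir"

if (requireNamespace("SQMtools", quietly = TRUE) && dir.exists(project_dir)) {
  # All arguments are the defaults from the usage above, except load_sequences.
  proj <- SQMtools::loadSQM(
    project_dir,
    tax_mode = "prokfilter",
    trusted_functions_only = FALSE,
    single_copy_genes = "MGOGs",
    load_sequences = FALSE,  # skip contig and ORF sequences to reduce memory usage
    engine = "data.table"
  )
  print(class(proj))
}
```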
A SQM object containing the parsed project. If more than one path is provided in project_path, this function returns a SQMbunch object instead. The structure of this object is similar to that of a SQMlite object (see loadSQMlite), but with an extra entry named projects that contains one SQM object per input project. SQM and SQMbunch objects otherwise behave similarly when used with the subset and plot functions from this package.
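As a hedged illustration of the multi-project case, the sketch below passes two placeholder paths (hypothetical, to be replaced with real project directories) and inspects the projects entry described above. It only runs when SQMtools is installed and both directories exist.

```r
# Placeholder paths: replace with real SqueezeMeta project directories or zip files.
project_paths <- c("/path/to/project1", "/path/to/project2")

if (requireNamespace("SQMtools", quietly = TRUE) && all(dir.exists(project_paths))) {
  bunch <- SQMtools::loadSQM(project_paths)
  print(class(bunch))            # expected to be a SQMbunch, per the description above
  print(length(bunch$projects))  # one SQM object per input project
}
```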
project_path: character, a vector of project directories generated by SqueezeMeta and/or zip files generated by sqm2zip.py.
tax_mode: character, which taxonomic classification should be loaded? SqueezeMeta applies the identity thresholds described in Luo et al., 2014. Use allfilter to apply the minimum identity threshold to all taxa, prokfilter to apply the threshold to Bacteria and Archaea but not to Eukaryotes, and nofilter to apply no thresholds at all (default prokfilter).
trusted_functions_only: logical. If TRUE, only highly trusted functional annotations (best hit + best average) will be considered when generating aggregated function tables. If FALSE, best hit annotations will be used (default FALSE). This only has an effect if project_path is not a zip file and project_path/results/tables is not already present.
single_copy_genes: character, source of single copy genes for copy number normalization: either RecA (COG0468, RecA/RadA), MGOGs (COGs for 10 single copy and housekeeping genes, Salazar, G et al., 2019), MGKOs (KOs for 10 single copy and housekeeping genes, Salazar, G et al., 2019) or USiCGs (KOs for 15 single copy genes, Carr et al., 2013, Table S1). For MGOGs, MGKOs and USiCGs, the median coverage of the set of single copy genes will be used for normalization. Default MGOGs.
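The normalization idea described above (dividing by the median coverage of a set of single copy genes) can be sketched on toy data. This is an illustration of the arithmetic only, not SQMtools' internal code; the coverage values and the COG_A to COG_D identifiers are made up.

```r
# Toy coverage matrix: functions (rows) x samples (columns). All values are invented.
cov_mat <- matrix(
  c(10, 20, 30,
     4,  8, 12,
     5,  5,  5,
    20, 40, 60),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("COG_A", "COG_B", "COG_C", "COG_D"), c("S1", "S2", "S3"))
)

# Hypothetical set of single copy genes used as the normalization reference.
scg <- c("COG_A", "COG_B", "COG_C")

# Per-sample median coverage of the single copy genes.
scg_median <- apply(cov_mat[scg, , drop = FALSE], 2, median)

# Divide each column (sample) by its single copy gene median coverage.
copy_number <- sweep(cov_mat, 2, scg_median, "/")
print(copy_number)
```

Using the median rather than a single marker makes the reference less sensitive to one gene with atypical coverage, which is presumably why the multi-gene sets are offered alongside RecA.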
load_sequences: logical. If TRUE, contig and ORF sequences will be loaded into the SQM object. Setting it to FALSE will reduce memory usage. Default TRUE.
engine: character. Engine used to load the ORFs and contigs tables. Either data.frame or data.table (significantly faster if your project is large). Default data.table.
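To illustrate the difference between the two engines, the sketch below reads the same small tab-separated file with base R and, if the data.table package is installed, with `data.table::fread`. The file and its columns are toy data, and this is not necessarily how loadSQM reads tables internally.

```r
# Write a small toy table to a temporary tab-separated file.
tsv <- tempfile(fileext = ".tsv")
toy <- data.frame(orf = c("orf_1", "orf_2", "orf_3"),
                  length = c(300L, 450L, 1200L))
write.table(toy, tsv, sep = "\t", quote = FALSE, row.names = FALSE)

# Base R reader: always available, slower on very large tables.
df_base <- read.table(tsv, header = TRUE, sep = "\t", stringsAsFactors = FALSE)

# data.table reader: optional dependency, usually much faster on large files.
if (requireNamespace("data.table", quietly = TRUE)) {
  df_fast <- data.table::fread(tsv, sep = "\t", header = TRUE, data.table = FALSE)
  stopifnot(identical(dim(df_base), dim(df_fast)))
}

unlink(tsv)
```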
Run SqueezeMeta! An example call for running it would be:
/path/to/SqueezeMeta/scripts/SqueezeMeta.pl -m coassembly -f fastq_dir -s samples_file -p project_dir
The SQM object is a nested list which contains the following information:
| lvl1 | lvl2 | lvl3 | type | rows/names | columns | data |
| $orfs | $table | | dataframe | orfs | misc. data | misc. data |
| | $abund | | numeric matrix | orfs | samples | abundances (reads) |
| | $bases | | numeric matrix | orfs | samples | abundances (bases) |
| | $cov | | numeric matrix | orfs | samples | coverages |
| | $cpm | | numeric matrix | orfs | samples | covs. / 10^6 reads |
| | $tpm | | numeric matrix | orfs | samples | tpm |
| | $seqs | | character vector | orfs | (n/a) | sequences |
| | $tax | | character matrix | orfs | tax. ranks | taxonomy |
| | $tax16S | | character vector | orfs | (n/a) | 16S rRNA taxonomy |
| | $markers | | list | orfs | (n/a) | CheckM1 markers |
| $contigs | $table | | dataframe | contigs | misc. data | misc. data |
| | $abund | | numeric matrix | contigs | samples | abundances (reads) |
| | $bases | | numeric matrix | contigs | samples | abundances (bases) |
| | $cov | | numeric matrix | contigs | samples | coverages |
| | $cpm | | numeric matrix | contigs | samples | covs. / 10^6 reads |
| | $tpm | | numeric matrix | contigs | samples | tpm |
| | $seqs | | character vector | contigs | (n/a) | sequences |
| | $tax | | character matrix | contigs | tax. ranks | taxonomy |
| | $bins | | character matrix | contigs | bin. methods | bins |
| $bins | $table | | dataframe | bins | misc. data | misc. data |
| | $length | | numeric vector | bins | (n/a) | length |
| | $abund | | numeric matrix | bins | samples | abundances (reads) |
| | $percent | | numeric matrix | bins | samples | percentages |
| | $bases | | numeric matrix | bins | samples | abundances (bases) |
| | $cov | | numeric matrix | bins | samples | coverages |
| | $cpm | | numeric matrix | bins | samples | covs. / 10^6 reads |
| | $tax | | character matrix | bins | tax. ranks | taxonomy |
| | $tax_gtdb | | character matrix | bins | tax. ranks | GTDB taxonomy |
| $taxa | $superkingdom | $abund | numeric matrix | superkingdoms | samples | abundances (reads) |
| | | $percent | numeric matrix | superkingdoms | samples | percentages |
| | $phylum | $abund | numeric matrix | phyla | samples | abundances (reads) |
| | | $percent | numeric matrix | phyla | samples | percentages |
| | $class | $abund | numeric matrix | classes | samples | abundances (reads) |
| | | $percent | numeric matrix | classes | samples | percentages |
| | $order | $abund | numeric matrix | orders | samples | abundances (reads) |
| | | $percent | numeric matrix | orders | samples | percentages |
| | $family | $abund | numeric matrix | families | samples | abundances (reads) |
| | | $percent | numeric matrix | families | samples | percentages |
| | $genus | $abund | numeric matrix | genera | samples | abundances (reads) |
| | | $percent | numeric matrix | genera | samples | percentages |
| | $species | $abund | numeric matrix | species | samples | abundances (reads) |
| | | $percent | numeric matrix | species | samples | percentages |
| $functions | $KEGG | $abund | numeric matrix | KEGG ids | samples | abundances (reads) |
| | | $bases | numeric matrix | KEGG ids | samples | abundances (bases) |
| | | $cov | numeric matrix | KEGG ids | samples | coverages |
| | | $cpm | numeric matrix | KEGG ids | samples | covs. / 10^6 reads |
| | | $tpm | numeric matrix | KEGG ids | samples | tpm |
| | | $copy_number | numeric matrix | KEGG ids | samples | avg. copies |
| | $COG | $abund | numeric matrix | COG ids | samples | abundances (reads) |
| | | $bases | numeric matrix | COG ids | samples | abundances (bases) |
| | | $cov | numeric matrix | COG ids | samples | coverages |
| | | $cpm | numeric matrix | COG ids | samples | covs. / 10^6 reads |
| | | $tpm | numeric matrix | COG ids | samples | tpm |
| | | $copy_number | numeric matrix | COG ids | samples | avg. copies |
| | $PFAM | $abund | numeric matrix | PFAM ids | samples | abundances (reads) |
| | | $bases | numeric matrix | PFAM ids | samples | abundances (bases) |
| | | $cov | numeric matrix | PFAM ids | samples | coverages |
| | | $cpm | numeric matrix | PFAM ids | samples | covs. / 10^6 reads |
| | | $tpm | numeric matrix | PFAM ids | samples | tpm |
| | | $copy_number | numeric matrix | PFAM ids | samples | avg. copies |
| $total_reads | | | numeric vector | samples | (n/a) | total reads |
| $misc | $project_name | | character vector | (empty) | (n/a) | project name |
| | $samples | | character vector | (empty) | (n/a) | samples |
| | $tax_names_long | $superkingdom | character vector | short names | (n/a) | full names |
| | | $phylum | character vector | short names | (n/a) | full names |
| | | $class | character vector | short names | (n/a) | full names |
| | | $order | character vector | short names | (n/a) | full names |
| | | $family | character vector | short names | (n/a) | full names |
| | | $genus | character vector | short names | (n/a) | full names |
| | | $species | character vector | short names | (n/a) | full names |
| | $tax_names_short | | character vector | full names | (n/a) | short names |
| | $KEGG_names | | character vector | KEGG ids | (n/a) | KEGG names |
| | $KEGG_paths | | character vector | KEGG ids | (n/a) | KEGG hierarchy |
| | $COG_names | | character vector | COG ids | (n/a) | COG names |
| | $COG_paths | | character vector | COG ids | (n/a) | COG hierarchy |
| | $ext_annot_sources | | character vector | COG ids | (n/a) | external databases |
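Because the SQM object is a nested list, its entries are reached with chained `$` accessors. The sketch below builds a small mock list that mimics a few of the entries in the table (the phylum names and read counts are invented) and shows how to navigate it; a real object returned by loadSQM is accessed the same way.

```r
samples <- c("S1", "S2")

# Mock object with a tiny subset of the entries listed in the table. Toy values only.
mock <- list(
  taxa = list(
    phylum = list(
      abund = matrix(c(60, 40, 30, 70), nrow = 2,
                     dimnames = list(c("Bacteroidota", "Bacillota"), samples))
    )
  ),
  total_reads = c(S1 = 100, S2 = 100),
  misc = list(project_name = "toy_project", samples = samples)
)

# lvl1 -> lvl2 -> lvl3: a numeric matrix with taxa in rows and samples in columns.
phylum_abund <- mock$taxa$phylum$abund
print(phylum_abund["Bacillota", "S2"])

# Most abundant phylum in each sample.
top_phylum <- rownames(phylum_abund)[apply(phylum_abund, 2, which.max)]
print(top_phylum)
```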
If external databases for functional classification were provided to SqueezeMeta via the -extdb argument, the corresponding abundance (reads and bases), coverage, tpm and copy number profiles will be present in SQM$functions (e.g. results for the CAZy database would be present in SQM$functions$CAZy). Additionally, the extended names of the features present in the external database will be present in SQM$misc (e.g. SQM$misc$CAZy_names).
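A hedged sketch of how such external database results could be retrieved. It uses a mock list with invented counts and placeholder feature names, and it assumes that SQM$misc$ext_annot_sources holds the names of the external databases; check your own object before relying on that.

```r
# Mock object mimicking the entries described above. All values are toy data.
mock <- list(
  functions = list(
    CAZy = list(
      abund = matrix(c(12, 3, 7, 9), nrow = 2,
                     dimnames = list(c("GH13", "GT2"), c("S1", "S2")))
    )
  ),
  misc = list(
    ext_annot_sources = "CAZy",  # assumed to list the external database names
    CAZy_names = c(GH13 = "toy name for GH13", GT2 = "toy name for GT2")
  )
)

for (db in mock$misc$ext_annot_sources) {
  abund <- mock$functions[[db]]$abund                  # e.g. SQM$functions$CAZy$abund
  long_names <- mock$misc[[paste0(db, "_names")]]      # e.g. SQM$misc$CAZy_names
  report <- data.frame(
    id = rownames(abund),
    name = unname(long_names[rownames(abund)]),
    total_reads = unname(rowSums(abund))
  )
  print(report)
}
```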