This function takes the path to a project directory generated by SqueezeMeta (whose name is specified in the -p parameter of the SqueezeMeta.pl script) and parses the results into a SQM object. Alternatively, it can load the project data from a zip file produced by sqm2zip.py.
loadSQM(
project_path,
tax_mode = "prokfilter",
trusted_functions_only = FALSE,
engine = "data.table"
)
SQM object containing the parsed project.
project_path: character, project directory generated by SqueezeMeta, or zip file generated by sqm2zip.py.
tax_mode: character, which taxonomic classification should be loaded? SqueezeMeta applies the identity thresholds described in Luo et al., 2014. Use allfilter to apply the minimum identity threshold to all taxa, prokfilter to apply the threshold to Bacteria and Archaea but not to Eukaryotes, and nofilter to apply no thresholds at all (default prokfilter).
trusted_functions_only: logical. If TRUE, only highly trusted functional annotations (best hit + best average) will be considered when generating aggregated function tables. If FALSE, best hit annotations will be used (default FALSE). This only has an effect if the project_dir/results/tables directory is not already present.
engine: character. Engine used to load the ORFs and contigs tables. Either data.frame or data.table (significantly faster if your project is large). Default data.table.
Run SqueezeMeta! An example call for running it would be: /path/to/SqueezeMeta/scripts/SqueezeMeta.pl -m coassembly -f fastq_dir -s samples_file -p project_dir
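Once SqueezeMeta has finished, loading the project might look like the sketch below. The path `/path/to/project_dir` and the wrapper `load_project` are placeholders introduced here for illustration; the arguments passed to `loadSQM` are simply the defaults from the usage above, written out explicitly.

```r
# Hypothetical location of a finished SqueezeMeta run (or a zip file
# produced by sqm2zip.py); replace it with your own path.
project_path <- "/path/to/project_dir"

# Illustrative wrapper (not part of SQMtools): returns NULL instead of
# failing when the package or the project is not available.
load_project <- function(path) {
  if (!requireNamespace("SQMtools", quietly = TRUE)) return(NULL)
  if (!file.exists(path)) return(NULL)
  SQMtools::loadSQM(
    path,
    tax_mode = "prokfilter",          # threshold for Bacteria and Archaea only
    trusted_functions_only = FALSE,   # use best hit annotations
    engine = "data.table"             # faster for large projects
  )
}

SQM <- load_project(project_path)
if (!is.null(SQM)) {
  print(SQM$misc$project_name)
  print(SQM$misc$samples)
}
```

Writing the defaults out is optional; `loadSQM(project_path)` alone requests the same settings.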
The SQM object is a nested list which contains the following information:
| lvl1 | lvl2 | lvl3 | type | rows/names | columns | data |
| $orfs | $table | | dataframe | orfs | misc. data | misc. data |
| | $abund | | numeric matrix | orfs | samples | abundances (reads) |
| | $bases | | numeric matrix | orfs | samples | abundances (bases) |
| | $cov | | numeric matrix | orfs | samples | coverages |
| | $cpm | | numeric matrix | orfs | samples | covs. / 10^6 reads |
| | $tpm | | numeric matrix | orfs | samples | tpm |
| | $seqs | | character vector | orfs | (n/a) | sequences |
| | $tax | | character matrix | orfs | tax. ranks | taxonomy |
| $contigs | $table | | dataframe | contigs | misc. data | misc. data |
| | $abund | | numeric matrix | contigs | samples | abundances (reads) |
| | $bases | | numeric matrix | contigs | samples | abundances (bases) |
| | $cov | | numeric matrix | contigs | samples | coverages |
| | $cpm | | numeric matrix | contigs | samples | covs. / 10^6 reads |
| | $tpm | | numeric matrix | contigs | samples | tpm |
| | $seqs | | character vector | contigs | (n/a) | sequences |
| | $tax | | character matrix | contigs | tax. ranks | taxonomies |
| | $bins | | character matrix | contigs | bin. methods | bins |
| $bins | $table | | dataframe | bins | misc. data | misc. data |
| | $length | | numeric vector | bins | (n/a) | length |
| | $abund | | numeric matrix | bins | samples | abundances (reads) |
| | $percent | | numeric matrix | bins | samples | percentages |
| | $bases | | numeric matrix | bins | samples | abundances (bases) |
| | $cov | | numeric matrix | bins | samples | coverages |
| | $cpm | | numeric matrix | bins | samples | covs. / 10^6 reads |
| | $tax | | character matrix | bins | tax. ranks | taxonomy |
| $taxa | $superkingdom | $abund | numeric matrix | superkingdoms | samples | abundances (reads) |
| | | $percent | numeric matrix | superkingdoms | samples | percentages |
| | $phylum | $abund | numeric matrix | phyla | samples | abundances (reads) |
| | | $percent | numeric matrix | phyla | samples | percentages |
| | $class | $abund | numeric matrix | classes | samples | abundances (reads) |
| | | $percent | numeric matrix | classes | samples | percentages |
| | $order | $abund | numeric matrix | orders | samples | abundances (reads) |
| | | $percent | numeric matrix | orders | samples | percentages |
| | $family | $abund | numeric matrix | families | samples | abundances (reads) |
| | | $percent | numeric matrix | families | samples | percentages |
| | $genus | $abund | numeric matrix | genera | samples | abundances (reads) |
| | | $percent | numeric matrix | genera | samples | percentages |
| | $species | $abund | numeric matrix | species | samples | abundances (reads) |
| | | $percent | numeric matrix | species | samples | percentages |
| $functions | $KEGG | $abund | numeric matrix | KEGG ids | samples | abundances (reads) |
| | | $bases | numeric matrix | KEGG ids | samples | abundances (bases) |
| | | $cov | numeric matrix | KEGG ids | samples | coverages |
| | | $cpm | numeric matrix | KEGG ids | samples | covs. / 10^6 reads |
| | | $tpm | numeric matrix | KEGG ids | samples | tpm |
| | | $copy_number | numeric matrix | KEGG ids | samples | avg. copies |
| | $COG | $abund | numeric matrix | COG ids | samples | abundances (reads) |
| | | $bases | numeric matrix | COG ids | samples | abundances (bases) |
| | | $cov | numeric matrix | COG ids | samples | coverages |
| | | $cpm | numeric matrix | COG ids | samples | covs. / 10^6 reads |
| | | $tpm | numeric matrix | COG ids | samples | tpm |
| | | $copy_number | numeric matrix | COG ids | samples | avg. copies |
| | $PFAM | $abund | numeric matrix | PFAM ids | samples | abundances (reads) |
| | | $bases | numeric matrix | PFAM ids | samples | abundances (bases) |
| | | $cov | numeric matrix | PFAM ids | samples | coverages |
| | | $cpm | numeric matrix | PFAM ids | samples | covs. / 10^6 reads |
| | | $tpm | numeric matrix | PFAM ids | samples | tpm |
| | | $copy_number | numeric matrix | PFAM ids | samples | avg. copies |
| $total_reads | | | numeric vector | samples | (n/a) | total reads |
| $misc | $project_name | | character vector | (empty) | (n/a) | project name |
| | $samples | | character vector | (empty) | (n/a) | samples |
| | $tax_names_long | $superkingdom | character vector | short names | (n/a) | full names |
| | | $phylum | character vector | short names | (n/a) | full names |
| | | $class | character vector | short names | (n/a) | full names |
| | | $order | character vector | short names | (n/a) | full names |
| | | $family | character vector | short names | (n/a) | full names |
| | | $genus | character vector | short names | (n/a) | full names |
| | | $species | character vector | short names | (n/a) | full names |
| | $tax_names_short | | character vector | full names | (n/a) | short names |
| | $KEGG_names | | character vector | KEGG ids | (n/a) | KEGG names |
| | $KEGG_paths | | character vector | KEGG ids | (n/a) | KEGG hierarchy |
| | $COG_names | | character vector | COG ids | (n/a) | COG names |
| | $COG_paths | | character vector | COG ids | (n/a) | COG hierarchy |
| | $ext_annot_sources | | character vector | COG ids | (n/a) | external databases |
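Because the SQM object is a plain nested list, its parts can be reached with ordinary `$` indexing. The sketch below builds a tiny stand-in list that mimics a few branches of the structure in the table, so it runs without a real project. All sample names, values and feature names in it are invented for illustration.

```r
# Toy stand-in for a loaded project: two samples, three KEGG ids, two phyla.
samples <- c("S1", "S2")
toy_SQM <- list(
  functions = list(KEGG = list(tpm = matrix(
    c(10, 200, 35,    # sample S1
      12, 180, 50),   # sample S2
    nrow = 3,
    dimnames = list(c("K00001", "K00002", "K00003"), samples)
  ))),
  taxa = list(phylum = list(percent = matrix(
    c(60, 40,         # sample S1
      45, 55),        # sample S2
    nrow = 2,
    dimnames = list(c("Bacteroidetes", "Firmicutes"), samples)
  ))),
  misc = list(
    project_name = "toy",
    samples = samples,
    KEGG_names = c(K00001 = "invented function A",
                   K00002 = "invented function B",
                   K00003 = "invented function C")
  )
)

# Rank KEGG ids by their total TPM across samples and keep the top two.
kegg_totals <- rowSums(toy_SQM$functions$KEGG$tpm)
top_kegg <- names(sort(kegg_totals, decreasing = TRUE))[1:2]

# Translate the ids into readable names through the $misc lookup vector.
top_kegg_names <- toy_SQM$misc$KEGG_names[top_kegg]

# Find the most abundant phylum in each sample.
phylum_pct <- toy_SQM$taxa$phylum$percent
dominant_phylum <- rownames(phylum_pct)[apply(phylum_pct, 2, which.max)]
```

The same indexing should carry over to a real object returned by `loadSQM`, since rows are features and columns are samples throughout the matrices listed above.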
If external databases for functional classification were provided to SqueezeMeta via the -extdb argument, the corresponding abundance (reads and bases), coverages, tpm and copy number profiles will be present in SQM$functions (e.g. results for the CAZy database would be present in SQM$functions$CAZy). Additionally, the extended names of the features present in the external database will be present in SQM$misc (e.g. SQM$misc$CAZy_names).
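Since external databases are optional, it can help to check for one before using it. The sketch below does so on a toy list; the helpers `has_ext_db` and `ext_db_names` are hypothetical names introduced here (they are not part of SQMtools), and the CAZy entries are invented.

```r
# Hypothetical helper: TRUE when profiles for the database `db` are present.
has_ext_db <- function(SQM, db) {
  db %in% names(SQM$functions)
}

# Hypothetical helper: extended feature names for `db`, or NULL if absent.
ext_db_names <- function(SQM, db) {
  SQM$misc[[paste0(db, "_names")]]
}

# Toy object with one external database; all values are invented.
toy_SQM <- list(
  functions = list(
    KEGG = list(),
    CAZy = list(abund = matrix(
      c(5, 7), nrow = 1,
      dimnames = list("GH13", c("S1", "S2"))
    ))
  ),
  misc = list(CAZy_names = c(GH13 = "invented family description"))
)

if (has_ext_db(toy_SQM, "CAZy")) {
  print(toy_SQM$functions$CAZy$abund)
  print(ext_db_names(toy_SQM, "CAZy"))
}
```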