SQMtools (version 1.6.3)

loadSQM: Load a SqueezeMeta project into R

Description

This function takes the path to a project directory generated by SqueezeMeta (whose name is specified in the -p parameter of the SqueezeMeta.pl script) and parses the results into a SQM object. Alternatively, it can load the project data from a zip file produced by sqm2zip.py.

Usage

loadSQM(
  project_path,
  tax_mode = "prokfilter",
  trusted_functions_only = FALSE,
  engine = "data.table"
)

Value

SQM object containing the parsed project.

Arguments

project_path

character, project directory generated by SqueezeMeta, or zip file generated by sqm2zip.py.

tax_mode

character, which taxonomic classification should be loaded? SqueezeMeta applies the identity thresholds described in Luo et al., 2014. Use allfilter for applying the minimum identity threshold to all taxa, prokfilter for applying the threshold to Bacteria and Archaea, but not to Eukaryotes, and nofilter for applying no thresholds at all (default prokfilter).

trusted_functions_only

logical. If TRUE, only highly trusted functional annotations (best hit + best average) will be considered when generating aggregated function tables. If FALSE, best hit annotations will be used (default FALSE). This only has an effect if the project_dir/results/tables directory is not already present.

engine

character. Engine used to load the ORFs and contigs tables. Either data.frame or data.table (significantly faster if your project is large). Default data.table.
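
As an illustration of how these arguments combine, here is a minimal sketch of a non-default call. The path is a hypothetical placeholder, and the guard around the call is only there so the snippet does nothing harmful when SQMtools or the project is missing; it is not part of the package's documented usage.

```r
# Hypothetical location of a SqueezeMeta project directory
# (or of a zip file produced by sqm2zip.py); replace with your own.
project_path <- "/path/to/project_dir"

if (requireNamespace("SQMtools", quietly = TRUE) && file.exists(project_path)) {
  library(SQMtools)
  # Apply the identity thresholds to all taxa, keep only highly trusted
  # functional annotations, and read the ORF and contig tables with data.table.
  project <- loadSQM(
    project_path,
    tax_mode = "allfilter",
    trusted_functions_only = TRUE,
    engine = "data.table"
  )
} else {
  message("SQMtools or the project path is not available; nothing was loaded.")
}
```

Leaving all optional arguments out is equivalent to tax_mode = "prokfilter", trusted_functions_only = FALSE and engine = "data.table", per the Usage section above.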

Prerequisites

Run SqueezeMeta! An example call for running it would be:

/path/to/SqueezeMeta/scripts/SqueezeMeta.pl -m coassembly -f fastq_dir -s samples_file -p project_dir

The SQM object structure

The SQM object is a nested list which contains the following information:

lvl1          lvl2                lvl3           type              rows/names     columns       data
$orfs         $table                             dataframe         orfs           misc. data    misc. data
              $abund                             numeric matrix    orfs           samples       abundances (reads)
              $bases                             numeric matrix    orfs           samples       abundances (bases)
              $cov                               numeric matrix    orfs           samples       coverages
              $cpm                               numeric matrix    orfs           samples       covs. / 10^6 reads
              $tpm                               numeric matrix    orfs           samples       tpm
              $seqs                              character vector  orfs           (n/a)         sequences
              $tax                               character matrix  orfs           tax. ranks    taxonomy
$contigs      $table                             dataframe         contigs        misc. data    misc. data
              $abund                             numeric matrix    contigs        samples       abundances (reads)
              $bases                             numeric matrix    contigs        samples       abundances (bases)
              $cov                               numeric matrix    contigs        samples       coverages
              $cpm                               numeric matrix    contigs        samples       covs. / 10^6 reads
              $tpm                               numeric matrix    contigs        samples       tpm
              $seqs                              character vector  contigs        (n/a)         sequences
              $tax                               character matrix  contigs        tax. ranks    taxonomies
              $bins                              character matrix  contigs        bin. methods  bins
$bins         $table                             dataframe         bins           misc. data    misc. data
              $length                            numeric vector    bins           (n/a)         length
              $abund                             numeric matrix    bins           samples       abundances (reads)
              $percent                           numeric matrix    bins           samples       percentages
              $bases                             numeric matrix    bins           samples       abundances (bases)
              $cov                               numeric matrix    bins           samples       coverages
              $cpm                               numeric matrix    bins           samples       covs. / 10^6 reads
              $tax                               character matrix  bins           tax. ranks    taxonomy
$taxa         $superkingdom       $abund         numeric matrix    superkingdoms  samples       abundances (reads)
                                  $percent       numeric matrix    superkingdoms  samples       percentages
              $phylum             $abund         numeric matrix    phyla          samples       abundances (reads)
                                  $percent       numeric matrix    phyla          samples       percentages
              $class              $abund         numeric matrix    classes        samples       abundances (reads)
                                  $percent       numeric matrix    classes        samples       percentages
              $order              $abund         numeric matrix    orders         samples       abundances (reads)
                                  $percent       numeric matrix    orders         samples       percentages
              $family             $abund         numeric matrix    families       samples       abundances (reads)
                                  $percent       numeric matrix    families       samples       percentages
              $genus              $abund         numeric matrix    genera         samples       abundances (reads)
                                  $percent       numeric matrix    genera         samples       percentages
              $species            $abund         numeric matrix    species        samples       abundances (reads)
                                  $percent       numeric matrix    species        samples       percentages
$functions    $KEGG               $abund         numeric matrix    KEGG ids       samples       abundances (reads)
                                  $bases         numeric matrix    KEGG ids       samples       abundances (bases)
                                  $cov           numeric matrix    KEGG ids       samples       coverages
                                  $cpm           numeric matrix    KEGG ids       samples       covs. / 10^6 reads
                                  $tpm           numeric matrix    KEGG ids       samples       tpm
                                  $copy_number   numeric matrix    KEGG ids       samples       avg. copies
              $COG                $abund         numeric matrix    COG ids        samples       abundances (reads)
                                  $bases         numeric matrix    COG ids        samples       abundances (bases)
                                  $cov           numeric matrix    COG ids        samples       coverages
                                  $cpm           numeric matrix    COG ids        samples       covs. / 10^6 reads
                                  $tpm           numeric matrix    COG ids        samples       tpm
                                  $copy_number   numeric matrix    COG ids        samples       avg. copies
              $PFAM               $abund         numeric matrix    PFAM ids       samples       abundances (reads)
                                  $bases         numeric matrix    PFAM ids       samples       abundances (bases)
                                  $cov           numeric matrix    PFAM ids       samples       coverages
                                  $cpm           numeric matrix    PFAM ids       samples       covs. / 10^6 reads
                                  $tpm           numeric matrix    PFAM ids       samples       tpm
                                  $copy_number   numeric matrix    PFAM ids       samples       avg. copies
$total_reads                                     numeric vector    samples        (n/a)         total reads
$misc         $project_name                      character vector  (empty)        (n/a)         project name
              $samples                           character vector  (empty)        (n/a)         samples
              $tax_names_long     $superkingdom  character vector  short names    (n/a)         full names
                                  $phylum        character vector  short names    (n/a)         full names
                                  $class         character vector  short names    (n/a)         full names
                                  $order         character vector  short names    (n/a)         full names
                                  $family        character vector  short names    (n/a)         full names
                                  $genus         character vector  short names    (n/a)         full names
                                  $species       character vector  short names    (n/a)         full names
              $tax_names_short                   character vector  full names     (n/a)         short names
              $KEGG_names                        character vector  KEGG ids       (n/a)         KEGG names
              $KEGG_paths                        character vector  KEGG ids       (n/a)         KEGG hierarchy
              $COG_names                         character vector  COG ids        (n/a)         COG names
              $COG_paths                         character vector  COG ids        (n/a)         COG hierarchy
              $ext_annot_sources                 character vector  COG ids        (n/a)         external databases

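
Because the SQM object is a nested list, its elements can be reached by chaining the $ operator through the levels listed above. The sketch below uses a tiny hand-made list that only mimics a few of those slots; the sample names, taxa, KEGG ids and all values are made up for illustration and do not come from a real project.

```r
# Toy stand-in for a loaded SQM object (all names and numbers are invented).
samples <- c("S1", "S2")
toy <- list(
  taxa = list(
    phylum = list(
      abund = matrix(c(80, 20, 30, 70), nrow = 2,
                     dimnames = list(c("Firmicutes", "Bacteroidetes"), samples))
    )
  ),
  functions = list(
    KEGG = list(
      tpm = matrix(c(5, 1, 9, 2, 4, 8), nrow = 3,
                   dimnames = list(c("K00001", "K00002", "K00003"), samples))
    )
  ),
  misc = list(
    KEGG_names = c(K00001 = "toy function A",
                   K00002 = "toy function B",
                   K00003 = "toy function C")
  )
)

# lvl1 $taxa, lvl2 $phylum, lvl3 $abund: reads per phylum (rows) and sample (columns).
firmicutes_reads <- toy$taxa$phylum$abund["Firmicutes", ]

# Rank KEGG ids by their summed tpm across samples and keep the top two.
top_kegg <- sort(rowSums(toy$functions$KEGG$tpm), decreasing = TRUE)[1:2]

# Translate those ids into readable names through the $misc lookup vector.
top_kegg_names <- toy$misc$KEGG_names[names(top_kegg)]
```

With a real project, the same expressions should work after replacing toy with the object returned by loadSQM.
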
If external databases for functional classification were provided to SqueezeMeta via the -extdb argument, the corresponding abundance (reads and bases), coverages, tpm and copy number profiles will be present in SQM$functions (e.g. results for the CAZy database would be present in SQM$functions$CAZy). Additionally, the extended names of the features present in the external database will be present in SQM$misc (e.g. SQM$misc$CAZy_names).
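
One way to find out which external databases ended up in a loaded object is to compare the names under $functions with the three built-in classifications listed in the table. This is a sketch on a hand-made list, assuming an external database called CAZy as in the example above; it is not a function provided by SQMtools.

```r
# Toy stand-in for the $functions slot of an SQM object from a project that
# was run with an external database named "CAZy" (assumed for illustration).
toy_functions <- list(KEGG = list(), COG = list(), PFAM = list(), CAZy = list())

# Classifications that appear in the structure table regardless of -extdb.
built_in <- c("KEGG", "COG", "PFAM")

# Anything else under $functions should correspond to an external database.
external <- setdiff(names(toy_functions), built_in)
has_cazy <- "CAZy" %in% external
```

On a real object the first argument of setdiff would be names(SQM$functions); the $misc$ext_annot_sources element listed in the table is described as holding the external databases as well.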