hiReadsProcessor (version 1.8.2)

findIntegrations: Find the integration sites and add results to SampleInfo object.

Description

Given a SampleInfo object, the function finds integration sites for each sample using their respective settings and adds the results back to the object. This is an all-in-one function which aligns reads, finds the best hit per read per sample, clusters sites, and assigns ISU IDs. It calls blatSeqs, read.psl, getIntegrationSites, clusterSites, otuSites. There must be linkered reads within the sampleInfo object in order to use this function with the default parameters. If you are planning on BLATing non-linkered reads, then change seqType to one of the accepted options for the 'feature' parameter of extractSeqs, except for '!' based features.

Usage

findIntegrations(sampleInfo, seqType = NULL, genomeIndices = NULL,
  samplenames = NULL, parallel = TRUE, autoOptimize = FALSE,
  doSonic = FALSE, doISU = FALSE, ...)

Arguments

sampleInfo
sample information SimpleList object returned by findLinkers, which holds decoded, primed, LTRed, and Linkered sequences for samples per sector/quadrant along with metadata.
seqType
the type of sequence to align when finding integration sites. Default is NULL, in which case the type is determined automatically from the restriction enzyme or isolation method used. If the restriction enzyme is Fragmentase, MuA, Sonication, or Sheared, then this parameter is set to genomicLinkered; otherwise it is set to genomic. Any one of the following options is valid: genomic, genomicLinkered, decoded, primed, LTRed, linkered.
genomeIndices
a named character vector mapping each freeze to the full or relative path of its BLAT-indexed genome (.nib or .2bit files). For example: c("hg18"="/usr/local/blatSuite34/hg18.2bit", "mm8"="/usr/local/blatSuite34/mm8.2bit"). Be sure to supply an index for every freeze present in the sampleInfo object. Default is NULL.
samplenames
a vector of samplenames to process. Default is NULL, which processes all samples in the sampleInfo object.
parallel
use a parallel backend to perform the calculation with BiocParallel. Defaults to TRUE. If no parallel backend is registered, then a serial version is run using SerialParam.
autoOptimize
if aligner='BLAT', should the blatParameters be automatically optimized based on the reads? Default is FALSE. When TRUE, the following parameters are adjusted within the supplied blatParameters vector: stepSize, tileSize, minScore, minIdentity. This parameter is useful when aligning reads of various lengths to the genome. Optimization is done using only read lengths. In beta phase!
doSonic
calculate integration site abundance using breakpoints. See getSonicAbund for more details. Default is FALSE.
doISU
calculate integration site units (ISU) for multihits. See isuSites for more details. Default is FALSE.
...
additional parameters to be passed to blatSeqs.
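
The defaulting rule described for seqType and the one-index-per-freeze requirement for genomeIndices can be sketched in base R as follows. This is only an illustration of the documented behaviour, not the package's own code: default_seq_type and missing_indices are hypothetical helpers, the matching is assumed to be exact and case-sensitive, and "MseI" and "mm9" are made-up inputs for the demonstration.

```r
# Hypothetical helper (not part of hiReadsProcessor): mimics the documented
# default for seqType, assuming exact, case-sensitive matching on the
# restriction enzyme / isolation method name.
default_seq_type <- function(restriction_enzyme) {
  shearing_methods <- c("Fragmentase", "MuA", "Sonication", "Sheared")
  ifelse(restriction_enzyme %in% shearing_methods,
         "genomicLinkered", "genomic")
}

# Hypothetical helper (not part of hiReadsProcessor): returns the freezes
# that have no entry in the named genomeIndices vector.
missing_indices <- function(freezes, genomeIndices) {
  setdiff(unique(freezes), names(genomeIndices))
}

# Index paths copied from the genomeIndices argument description.
genomeIndices <- c("hg18" = "/usr/local/blatSuite34/hg18.2bit",
                   "mm8"  = "/usr/local/blatSuite34/mm8.2bit")

# "MseI" and "mm9" are illustrative values only.
seq_types <- default_seq_type(c("Sonication", "MseI"))
unindexed <- missing_indices(c("hg18", "hg18", "mm9"), genomeIndices)
```

A check along the lines of missing_indices, run before calling findIntegrations, would flag a freeze without an index early rather than partway through the alignment step.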

Value

  • a SimpleList object similar to the supplied sampleInfo parameter, with new data added under each sector and sample. New data attributes include psl and sites. The psl attribute holds the genomic hits per read along with QC information. The sites attribute holds the condensed integration sites, where genomic hits have been clustered by the Position column and cherry-picked so that each site passes all the QC steps.

See Also

findPrimers, findLTRs, findLinkers, startgfServer, read.psl, blatSeqs, blatListedSet, pslToRangedObject, clusterSites, isuSites, crossOverCheck, getIntegrationSites, getSonicAbund, annotateSites

Examples

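# Load the example data shipped with the package; the call below uses the
# seqProps object.
load(file.path(system.file("data", package = "hiReadsProcessor"),
               "FLX_seqProps.RData"))
# Find integration sites against an hg18 index; numServers is forwarded
# through '...' to blatSeqs.
findIntegrations(seqProps,
  genomeIndices = c("hg18" = "/usr/local/genomeIndexes/hg18.noRandom.2bit"),
  numServers = 2)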
