diffHic (version 1.4.2)

cutGenome: Cut up the genome

Description

Perform an in silico restriction digest of a target genome.

Usage

cutGenome(bs, pattern, overhang=4L)

Arguments

bs
a BSgenome object or a character string pointing to a FASTA file
pattern
character string describing the recognition site
overhang
integer scalar specifying the length of the 5' overhang

Value

A GRanges object containing the boundaries of each restriction fragment in the genome.

Warning

If bs is a FASTA file, the chromosome names in the FASTA headers will be loaded faithfully by cutGenome. However, many mapping pipelines drop everything after the first whitespace in each name when constructing the alignment index. To be safe, users should ensure that the chromosome names in the FASTA headers consist of a single word. Otherwise, the chromosome names in the output GRanges will not match those in the BAM files after alignment.
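One way to guard against this mismatch is to trim each header to its first word before both alignment and digestion. The following is a minimal base R sketch, not part of diffHic; `first_word_headers` is a hypothetical helper and the example headers are made up for illustration.

```r
# Hypothetical helper (not part of diffHic): keep only the first
# whitespace-delimited word of every FASTA header line.
first_word_headers <- function(lines) {
  is_header <- startsWith(lines, ">")
  lines[is_header] <- sub("^(>[^[:space:]]+).*$", "\\1", lines[is_header])
  lines
}

# Made-up FASTA content; in practice this would come from readLines().
fasta <- c(">chr1 Homo sapiens chromosome 1", "ACGTAAGCTT",
           ">chr2", "GGAAGCTTCC")
cleaned <- first_word_headers(fasta)
print(cleaned)
```

The cleaned lines could then be written out with writeLines() and the resulting file used both for building the alignment index and as the bs argument.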

Details

This function simulates a restriction digestion of a specified genome, given the recognition site and 5' overhang of the cutter. The total sequence spanned by each fragment is recorded, including the two sticky ends. No support is currently provided for searching the reverse strand, so the recognition site should be an inverse palindrome.
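To illustrate how fragment boundaries that include both sticky ends might be derived, here is a minimal base R sketch on a plain character string. It is not the diffHic implementation: `digest_sketch` is a hypothetical helper, and it assumes that the forward-strand cut sits symmetrically within the recognition site, (nchar(pattern) - overhang)/2 bases after the start of the site.

```r
# Illustrative only: digest_sketch() is a hypothetical helper, not diffHic code.
digest_sketch <- function(seq, pattern, overhang = 4L) {
  overhang <- as.integer(overhang)
  site_len <- nchar(pattern)
  stopifnot(overhang >= 0L, overhang <= site_len,
            (site_len - overhang) %% 2L == 0L)
  # Bases between the site start and the forward-strand cut (assumed symmetric).
  offset <- (site_len - overhang) %/% 2L
  # A zero-width lookahead reports overlapping sites as well.
  hits <- gregexpr(paste0("(?=", pattern, ")"), seq, perl = TRUE)[[1]]
  site_start <- if (hits[1] == -1L) integer(0) else as.integer(hits)
  data.frame(
    start = c(1L, site_start + offset),
    end   = c(site_start + offset - 1L + overhang, nchar(seq))
  )
}

# Two HindIII-like sites (AAGCTT) at positions 3 and 11 of an 18 bp toy sequence.
frags <- digest_sketch("GGAAGCTTCCAAGCTTGG", "AAGCTT", overhang = 4L)
print(frags)  # fragments span 1-7, 4-15 and 12-18
```

Under these assumptions, consecutive fragments overlap by the length of the overhang, because each sticky end is counted in both neighbouring fragments.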

The genome should be specified as a BSgenome object. However, a character string can also be provided, specifying a FASTA file containing all the reference sequences in a genome. The latter may be necessary to synchronise the fragments with the genome used for alignment.

Note that some of the reported fragments may be physically impossible to form, e.g., for overlapping sites or consecutive sites when overhang==nchar(pattern). Nonetheless, they are still reported to maintain the correspondence between fragments and cut sites. Cleavage sites on the forward strand can be obtained as the start locations of all fragments (excepting the first fragment on each chromosome).
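The recovery of cleavage sites from fragment starts can be sketched as follows. To keep the example self-contained, a made-up data frame stands in for the GRanges output; with a real GRanges, the `chr` and `start` columns would correspond to seqnames() and start(). The sketch assumes fragments are ordered by position within each chromosome.

```r
# Made-up stand-in for cutGenome() output: one row per fragment,
# ordered by position within each chromosome.
frags <- data.frame(
  chr   = c("chrA", "chrA", "chrA", "chrB", "chrB"),
  start = c(1L, 4L, 12L, 1L, 9L),
  end   = c(7L, 15L, 18L, 12L, 20L)
)

# The first fragment on each chromosome begins at position 1,
# which is the chromosome end rather than a cut site, so drop it.
is_first <- !duplicated(frags$chr)
cut_sites <- frags[!is_first, c("chr", "start")]
print(cut_sites)
```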

See Also

matchPattern

Examples

require(BSgenome.Ecoli.NCBI.20080805)

cutGenome(Ecoli, "AAGCTT", overhang=4L) # HindIII
cutGenome(Ecoli, "CCGCGG", overhang=2L) # SacII
cutGenome(Ecoli, "AGCT", overhang=0L) # AluI

# Trying with FASTA files.
x <- system.file("extdata", "fastaEx.fa", package="Biostrings")
cutGenome(x, "AGCT", overhang=2)
cutGenome(x, "AGCT", overhang=4)