
podkat (version 1.4.2)

unmaskedRegions: Extract Unmasked Regions from MaskedBSgenome Object

Description

Create a GRangesList of unmasked regions from a MaskedBSgenome object

Usage

unmaskedRegions(x, chrs=character(), pseudoautosomal=NULL,
                ignoreGaps=250, activeMasks=active(masks(x[[1]])))

Arguments

x
a MaskedBSgenome object from which the unmasked regions are extracted.
chrs
a character vector of chromosome names to restrict to; if empty (default), all chromosomes in x are considered.
pseudoautosomal
if NULL (default), the chromosomes are considered as they are; otherwise, pseudoautosomal must be a data frame complying with the format of the data frames pseudoautosomal.hg18, pseudoautosomal.hg19, and pseudoautosomal.hg38 from the GWASTools package (see details below).
ignoreGaps
skip assembly gaps only if they are larger than this threshold; in turn, if two unmasked regions are separated by an assembly gap not larger than ignoreGaps, they are joined in the resulting GRanges object.
activeMasks
masks to apply for determining unmasked regions; defaults to the masks that are active by default in the MaskedBSgenome object x. Therefore, this argument only needs to be set if a masking other than the default is necessary.
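
The joining rule described for ignoreGaps can be illustrated with a small base-R sketch. This is not podkat's implementation: joinAcrossSmallGaps is a hypothetical helper, the coordinates are invented, and the sketch treats every gap between consecutive regions alike, whereas the package applies the threshold to assembly gaps. It assumes at least one region and that the regions are sorted and non-overlapping.

```r
## Hypothetical helper (not part of podkat): join sorted, non-overlapping
## regions whenever the gap between neighbours is not larger than ignoreGaps
joinAcrossSmallGaps <- function(starts, ends, ignoreGaps = 250) {
    ## number of bases between each region and its predecessor
    gapWidths <- starts[-1] - ends[-length(ends)] - 1
    ## a new joined region begins wherever the preceding gap exceeds the threshold
    groupId <- cumsum(c(TRUE, gapWidths > ignoreGaps))
    data.frame(start = as.vector(tapply(starts, groupId, min)),
               end   = as.vector(tapply(ends, groupId, max)))
}

## invented coordinates: a 100-base gap (joined) and a 999-base gap (kept)
joined <- joinAcrossSmallGaps(starts = c(1, 1101, 3000),
                              ends   = c(1000, 2000, 4000))
joined
```

With the default threshold of 250, the first two regions are merged because only 100 bases separate them, while the third stays separate.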

Value

  • a GRangesList object (see details below)

Details

This function takes a MaskedBSgenome object x and extracts the genomic regions that are unmasked in this genome, where the set of masks to apply can be specified using the activeMasks argument. The result is returned as a GRangesList object each component of which corresponds to one chromosome of the genome x - or a subset thereof if the chrs argument has been specified.

The pseudoautosomal argument allows for a special treatment of pseudoautosomal regions. If not NULL, this argument must be a data frame that contains columns with names chrom, start.base, and end.base. The chrom column must contain chromosome names as they appear in the MaskedBSgenome object x. The columns start.base and end.base must contain numeric values that specify the starts and ends of pseudoautosomal regions, respectively. The function is implemented such that the data frames pseudoautosomal.hg18, pseudoautosomal.hg19, and pseudoautosomal.hg38 provided by the GWASTools package can be used (except for the chromosome names that need to be adapted to hg18/hg19/hg38). If the pseudoautosomal argument is specified correctly, the unmaskedRegions function produces separate components in the resulting GRangesList object - one for each pseudoautosomal region. These components are named as the corresponding row names in the data frame pseudoautosomal. Moreover, these regions are omitted from the list of unmasked regions of the chromosomes they are on.
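
As a rough sketch of the expected layout, a custom data frame for the pseudoautosomal argument could be built and checked as follows. The coordinates are placeholders chosen for illustration only, not real pseudoautosomal boundaries, and the row names merely mimic the naming style used in the example below (e.g. X.PAR1). The checks cover only the column requirements stated above.

```r
## Placeholder table: coordinates are NOT real pseudoautosomal boundaries
psautCustom <- data.frame(
    chrom      = c("chrX", "chrX", "chrY", "chrY"),
    start.base = c(10001, 2000001, 10001, 2000001),
    end.base   = c(20000, 2100000, 20000, 2100000),
    row.names  = c("X.PAR1", "X.PAR2", "Y.PAR1", "Y.PAR2"),
    stringsAsFactors = FALSE
)

## check the columns that the Details section requires
requiredCols <- c("chrom", "start.base", "end.base")
stopifnot(all(requiredCols %in% colnames(psautCustom)),
          is.numeric(psautCustom$start.base),
          is.numeric(psautCustom$end.base),
          all(psautCustom$start.base <= psautCustom$end.base))

## the row names would become the names of the extra list components
rownames(psautCustom)
```

Such a data frame would then be passed as pseudoautosomal=psautCustom, provided the chrom values match the chromosome names of the MaskedBSgenome object.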

References

http://www.bioinf.jku.at/software/podkat

See Also

GRangesList, pseudoautosomal

Examples

## load packages to obtain the masked hg38 genome and
##  pseudoautosomal.hg38 from the GWASTools package
if (require(BSgenome.Hsapiens.UCSC.hg38.masked) && require(GWASTools))
{
    ## extract unmasked regions of all autosomal chromosomes
    regions <- unmaskedRegions(BSgenome.Hsapiens.UCSC.hg38.masked,
                               chrs=paste0("chr", 1:22))
    names(regions)
    regions$chr1

    ## adjust chromosome names
    pseudoautosomal.hg38
    psaut <- pseudoautosomal.hg38
    psaut$chrom <- paste0("chr", psaut$chrom)
    psaut

    ## extract unmasked regions of sex chromosomes taking pseudoautosomal
    ## regions into account
    regions <- unmaskedRegions(BSgenome.Hsapiens.UCSC.hg38.masked,
                               chrs=c("chrX", "chrY"), pseudoautosomal=psaut)
    names(regions)
    regions$chrX
    regions$X.PAR1

    ## check overlap between X chromosome and a pseudoautosomal region
    intersect(regions$chrX, regions$X.PAR1)
}
