GenomeInfoDb (version 1.8.3)

seqinfo: Accessing/modifying sequence information

Description

A set of generic functions for getting/setting/modifying the sequence information stored in an object.

Usage

seqinfo(x) seqinfo(x, new2old=NULL, force=FALSE) <- value
seqnames(x) seqnames(x) <- value
seqlevels(x) seqlevels(x, force=FALSE) <- value sortSeqlevels(x, X.is.sexchrom=NA) seqlevelsInUse(x) seqlevels0(x)
seqlengths(x) seqlengths(x) <- value
isCircular(x) isCircular(x) <- value
genome(x) genome(x) <- value

Arguments

x
The object from/on which to get/set the sequence information.
new2old
The new2old argument allows the user to rename, drop, add and/or reorder the "sequence levels" in x.

new2old can be NULL or an integer vector with one element per row in Seqinfo object value (i.e. new2old and value must have the same length) describing how the "new" sequence levels should be mapped to the "old" sequence levels, that is, how the rows in value should be mapped to the rows in seqinfo(x). The values in new2old must be >= 1 and <= length(seqinfo(x)). NAs are allowed and indicate sequence levels that are being added. Old sequence levels that are not represented in new2old will be dropped, but this will fail if those levels are in use (e.g. if x is a GRanges object with ranges defined on those sequence levels) unless force=TRUE is used (see below).

If new2old=NULL, then sequence levels can only be added to the existing ones, that is, value must have at least as many rows as seqinfo(x) (i.e. length(values) >= length(seqinfo(x))) and also seqlevels(values)[seq_len(length(seqlevels(x)))] must be identical to seqlevels(x).

force
Force dropping sequence levels currently in use. This is achieved by dropping the elements in x where those levels are used (hence typically reducing the length of x).

Note that if x is a list-like object (e.g. GRangesList, GAlignmentPairs, or GAlignmentsList), then any list element in x where at least one of the sequence levels to drop is used is fully dropped. In other words, the seqlevels setter always keeps or drops full list elements and never tries to change their content. This guarantees that the geometry of the list elements is preserved, which is a desirable property when they represent compound features (e.g. exons grouped by transcript or paired-end reads). See below for an example.

value
Typically a Seqinfo object for the seqinfo setter.

Either a named or unnamed character vector for the seqlevels setter.

A vector containing the sequence information to store for the other setters.

X.is.sexchrom
A logical indicating whether X refers to the sexual chromosome or to chromosome with Roman Numeral X. If NA, sortSeqlevels does its best to "guess".

Details

The Seqinfo class plays a central role for the functions described in this man page because:

  • All these functions (except seqinfo, seqlevelsInUse, and seqlevels0) work on a Seqinfo object.

  • For classes that implement it, the seqinfo getter should return a Seqinfo object.
  • Default seqlevels, seqlengths, isCircular, and genome getters and setters are provided. By default, seqlevels(x) does seqlevels(seqinfo(x)), seqlengths(x) does seqlengths(seqinfo(x)), isCircular(x) does isCircular(seqinfo(x)), and genome(x) does genome(seqinfo(x)). So any class with a seqinfo getter will have all the above getters work out-of-the-box. If, in addition, the class defines a seqinfo setter, then all the corresponding setters will also work out-of-the-box.
  • Examples of containers that have a seqinfo getter and setter: the GRanges and GRangesList classes in the GenomicRanges package; the SummarizedExperiment class in the SummarizedExperiment package; the GAlignments, GAlignmentPairs, and GAlignmentsList classes in the GenomicAlignments package; the TxDb class in the GenomicFeatures package; the BSgenome class in the BSgenome package; etc...

    The GenomicRanges package defines seqinfo and seqinfo<- methods for these low-level data types: List, RangesList and RangedData. Those objects do not have the means to formally store sequence information. Thus, the wrappers simply store the Seqinfo object within metadata(x). Initially, the metadata is empty, so there is some effort to generate a reasonable default Seqinfo. The names of any List are taken as the seqnames, and the universe of RangesList or RangedData is taken as the genome.

    See Also

    Examples

    Run this code
    ## ---------------------------------------------------------------------
    ## A. MODIFY THE SEQLEVELS OF AN OBJECT
    ## ---------------------------------------------------------------------
    ## Overlap and matching operations between objects require matching
    ## seqlevels. Often the seqlevels in one must be modified to match 
    ## the other. The seqlevels() function can rename, drop, add and reorder 
    ## seqlevels of an object. Examples below are shown on TxDb 
    ## and GRanges but the approach is the same for all objects that have
    ## a 'Seqinfo' class.
    
    library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
    txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
    seqlevels(txdb)
    
    ## Rename:
    seqlevels(txdb) <- sub("chr", "", seqlevels(txdb))
    seqlevels(txdb)
    
    seqlevels(txdb) <- paste0("CH", seqlevels(txdb))
    seqlevels(txdb)
    
    seqlevels(txdb)[seqlevels(txdb) == "CHM"] <- "M"
    seqlevels(txdb)
    
    gr <- GRanges(rep(c("chr2", "chr3", "chrM"), 2), IRanges(1:6, 10))
    
    ## Add:
    seqlevels(gr) <- c("chr1", seqlevels(gr), "chr4")
    seqlevels(gr)
    seqlevelsInUse(gr)
    
    ## Reorder:
    seqlevels(gr) <- rev(seqlevels(gr))
    seqlevels(gr)
    
    ## Drop all unused seqlevels:
    seqlevels(gr) <- seqlevelsInUse(gr)
    
    ## Drop some seqlevels in use:
    seqlevels(gr, force=TRUE) <- setdiff(seqlevels(gr), "chr3")
    gr
    
    ## Rename/Add/Reorder:
    seqlevels(gr) <- c("chr1", chr2="chr2", chrM="Mitochondrion")
    seqlevels(gr)
    
    ## ---------------------------------------------------------------------
    ## B. DROP SEQLEVELS OF A LIST-LIKE OBJECT
    ## ---------------------------------------------------------------------
    
    grl0 <- GRangesList(GRanges("chr2", IRanges(3:2, 5)),
                        GRanges("chr5", IRanges(11, 18)),
                        GRanges(c("chr4", "chr2"), IRanges(7:6, 15)))
    grl0
    
    grl1 <- grl0
    seqlevels(grl1, force=TRUE) <- c("chr2", "chr5")
    grl1  # grl0[[3]] was fully dropped!
    
    ## If what is desired is to drop the first range in grl0[[3]] only, or,
    ## more generally speaking, to drop the ranges within each list element
    ## that are located on one of the seqlevels to drop, then do:
    grl2 <- grl0[seqnames(grl0) %in% c("chr2", "chr5")]
    grl2
    
    ## Note that the above subsetting doesn't drop any seqlevel:
    seqlevels(grl2)
    
    ## To drop them (no need to use 'force=TRUE' anymore):
    seqlevels(grl2) <- c("chr2", "chr5")
    seqlevels(grl2)
    
    ## ---------------------------------------------------------------------
    ## C. SORT SEQLEVELS IN "NATURAL" ORDER
    ## ---------------------------------------------------------------------
    
    sortSeqlevels(c("11", "Y", "1", "10", "9", "M", "2"))
    
    seqlevels <- c("chrXI", "chrY", "chrI", "chrX", "chrIX", "chrM", "chrII")
    sortSeqlevels(seqlevels)
    sortSeqlevels(seqlevels, X.is.sexchrom=TRUE)
    sortSeqlevels(seqlevels, X.is.sexchrom=FALSE)
    
    seqlevels <- c("chr2RHet", "chr4", "chrUextra", "chrYHet",
                   "chrM", "chrXHet", "chr2LHet", "chrU",
                   "chr3L", "chr3R", "chr2R", "chrX")
    sortSeqlevels(seqlevels)
    
    gr <- GRanges()
    seqlevels(gr) <- seqlevels
    sortSeqlevels(gr)
    
    ## ---------------------------------------------------------------------
    ## D. SUBSET OBJECTS BY SEQLEVELS
    ## ---------------------------------------------------------------------
    
    tx <- transcripts(txdb)
    seqlevels(tx)
    
    ## Drop 'M', keep all others.
    seqlevels(tx, force=TRUE) <- seqlevels(tx)[seqlevels(tx) != "M"]
    seqlevels(tx)
    
    ## Drop all except 'ch3L' and 'ch3R'.
    seqlevels(tx, force=TRUE) <- c("ch3L", "ch3R")
    seqlevels(tx)
    
    ## ---------------------------------------------------------------------
    ## E. RESTORE ORIGINAL SEQLEVELS OF A TxDb OBJECT
    ## ---------------------------------------------------------------------
    
    ## Applicable to TxDb objects only.
    ## Not run: 
    # seqlevels(txdb) <- seqlevels0(txdb)
    # seqlevels(txdb)
    # ## End(Not run)
    
    ## ---------------------------------------------------------------------
    ## F. FINDING METHODS
    ## ---------------------------------------------------------------------
    
    showMethods("seqinfo")
    showMethods("seqinfo<-")
    
    showMethods("seqnames")
    showMethods("seqnames<-")
    
    showMethods("seqlevels")
    showMethods("seqlevels<-")
    
    if (interactive()) {
      library(GenomicRanges)
      ?`GRanges-class`
    }
    

    Run the code above in your browser using DataCamp Workspace