GenomeInfoDb (version 1.2.4)

seqinfo: Accessing/modifying sequence information

Description

A set of generic functions for getting/setting/modifying the sequence information stored in an object.

Usage

seqinfo(x) seqinfo(x, new2old=NULL, force=FALSE) <- value
seqnames(x) seqnames(x) <- value
seqlevels(x) seqlevels(x, force=FALSE) <- value sortSeqlevels(x, X.is.sexchrom=NA) seqlevelsInUse(x) seqlevels0(x)
seqlengths(x) seqlengths(x) <- value
isCircular(x) isCircular(x) <- value
genome(x) genome(x) <- value

Arguments

x
The object from/on which to get/set the sequence information.
new2old
The new2old argument allows the user to rename, drop, add and/or reorder the "sequence levels" in x.

new2old can be NULL or an integer vector with one element per row in Seqinfo object value (i.e. new2old and value must have the same length) describing how the "new" sequence levels should be mapped to the "old" sequence levels, that is, how the rows in value should be mapped to the rows in seqinfo(x). The values in new2old must be >= 1 and <= length(seqinfo(x)). NAs are allowed and indicate sequence levels that are being added. Old sequence levels that are not represented in new2old will be dropped, but this will fail if those levels are in use (e.g. if x is a GRanges object with ranges defined on those sequence levels) unless force=TRUE is used (see below).

If new2old=NULL, then sequence levels can only be added to the existing ones, that is, value must have at least as many rows as seqinfo(x) (i.e. length(values) >= length(seqinfo(x))) and also seqlevels(values)[seq_len(length(seqlevels(x)))] must be identical to seqlevels(x).

force
Force dropping sequence levels currently in use. This is achieved by dropping the elements in x where those levels are used (hence typically reducing the length of x).
value
Typically a Seqinfo object for the seqinfo setter.

Either a named or unnamed character vector for the seqlevels setter.

A vector containing the sequence information to store for the other setters.

X.is.sexchrom
A logical indicating whether X refers to the sexual chromosome or to chromosome with Roman Numeral X. If NA, sortSeqlevels does its best to "guess".

Details

The Seqinfo class plays a central role for the functions described in this man page because:

  • All these functions (except seqinfo, seqlevelsInUse, and seqlevels0) work on a Seqinfo object.

  • For classes that implement it, the seqinfo getter should return a Seqinfo object.
  • Default seqlevels, seqlengths, isCircular, and genome getters and setters are provided. By default, seqlevels(x) does seqlevels(seqinfo(x)), seqlengths(x) does seqlengths(seqinfo(x)), isCircular(x) does isCircular(seqinfo(x)), and genome(x) does genome(seqinfo(x)). So any class with a seqinfo getter will have all the above getters work out-of-the-box. If, in addition, the class defines a seqinfo setter, then all the corresponding setters will also work out-of-the-box.
  • Examples of containers that have a seqinfo getter and setter: the GRanges, GRangesList, and SummarizedExperiment classes in the GenomicRanges package; the GAlignments, GAlignmentPairs, and GAlignmentsList classes in the GenomicAlignments package; the TxDb class in the GenomicFeatures package; the BSgenome class in the BSgenome package; etc...

    The GenomicRanges package defines seqinfo and seqinfo<- methods for these low-level data types: List, RangesList and RangedData. Those objects do not have the means to formally store sequence information. Thus, the wrappers simply store the Seqinfo object within metadata(x). Initially, the metadata is empty, so there is some effort to generate a reasonable default Seqinfo. The names of any List are taken as the seqnames, and the universe of RangesList or RangedData is taken as the genome.

    See Also

    Examples

    Run this code
    ## ---------------------------------------------------------------------
    ## Finding methods.
    ## ---------------------------------------------------------------------
    
    showMethods("seqinfo")
    showMethods("seqinfo<-")
    
    showMethods("seqnames")
    showMethods("seqnames<-")
    
    showMethods("seqlevels")
    showMethods("seqlevels<-")
    
    if (interactive()) {
      library(GenomicRanges)
      ?`GRanges-class`
    }
    
    ## ---------------------------------------------------------------------
    ## Modify seqlevels of an object.
    ## ---------------------------------------------------------------------
    
    ## Overlap and matching operations between objects require matching
    ## seqlevels. Often the seqlevels in one must be modified to match 
    ## the other. The seqlevels() function can rename, drop, add and reorder 
    ## seqlevels of an object. Examples below are shown on TxDb 
    ## and GRanges but the approach is the same for all objects that have
    ## a 'Seqinfo' class.
    
    library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
    txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
    seqlevels(txdb)
    
    ## Rename:
    seqlevels(txdb) <- sub("chr", "", seqlevels(txdb))
    seqlevels(txdb)
    
    seqlevels(txdb) <- paste0("CH", seqlevels(txdb))
    seqlevels(txdb)
    
    seqlevels(txdb)[seqlevels(txdb) == "CHM"] <- "M"
    seqlevels(txdb)
    
    gr <- GRanges(rep(c("chr2", "chr3", "chrM"), 2), IRanges(1:6, 10))
    
    ## Add:
    seqlevels(gr) <- c("chr1", seqlevels(gr), "chr4")
    seqlevels(gr)
    seqlevelsInUse(gr)
    
    ## Reorder:
    seqlevels(gr) <- rev(seqlevels(gr))
    seqlevels(gr)
    
    ## Drop all unused seqlevels:
    seqlevels(gr) <- seqlevelsInUse(gr)
    
    ## Drop some seqlevels in use:
    seqlevels(gr, force=TRUE) <- setdiff(seqlevels(gr), "chr3")
    gr
    
    ## Rename/Add/Reorder:
    seqlevels(gr) <- c("chr1", chr2="chr2", chrM="Mitochondrion")
    seqlevels(gr)
    
    ## ---------------------------------------------------------------------
    ## Sort seqlevels in "natural" order
    ## ---------------------------------------------------------------------
    
    sortSeqlevels(c("11", "Y", "1", "10", "9", "M", "2"))
    
    seqlevels <- c("chrXI", "chrY", "chrI", "chrX", "chrIX", "chrM", "chrII")
    sortSeqlevels(seqlevels)
    sortSeqlevels(seqlevels, X.is.sexchrom=TRUE)
    sortSeqlevels(seqlevels, X.is.sexchrom=FALSE)
    
    seqlevels <- c("chr2RHet", "chr4", "chrUextra", "chrYHet",
                   "chrM", "chrXHet", "chr2LHet", "chrU",
                   "chr3L", "chr3R", "chr2R", "chrX")
    sortSeqlevels(seqlevels)
    
    gr <- GRanges()
    seqlevels(gr) <- seqlevels
    sortSeqlevels(gr)
    
    ## ---------------------------------------------------------------------
    ## Subset objects by seqlevels. 
    ## ---------------------------------------------------------------------
    
    tx <- transcripts(txdb)
    seqlevels(tx)
    
    ## Drop 'M', keep all others.
    seqlevels(tx, force=TRUE) <- seqlevels(tx)[seqlevels(tx) != "M"]
    seqlevels(tx)
    
    ## Drop all except 'ch3L' and 'ch3R'.
    seqlevels(tx, force=TRUE) <- c("ch3L", "ch3R")
    seqlevels(tx)
    
    ## ---------------------------------------------------------------------
    ## Restore original seqlevels. 
    ## ---------------------------------------------------------------------
    
    ## Applicable to TxDb objects only.
    ## Not run: 
    # seqlevels0(txdb)
    # seqlevels(txdb)
    # ## End(Not run)
    

    Run the code above in your browser using DataLab