BSgenome (version 1.40.1)

SNPlocs-class: SNPlocs objects

Description

The SNPlocs class is a container for storing known SNP locations for a given organism. SNPlocs objects are usually made in advance by a volunteer and made available to the Bioconductor community as "SNPlocs data packages". See ?available.SNPs for how to get the list of "SNPlocs data packages" currently available.

This man page's main focus is on how to extract information from a SNPlocs object.

Usage

snpcount(x)

snpsBySeqname(x, seqnames, ...)
## Method for SNPlocs objects:
snpsBySeqname(x, seqnames, drop.rs.prefix=FALSE)

snpsByOverlaps(x, ranges, maxgap=0L, minoverlap=0L,
               type=c("any", "start", "end", "within", "equal"), ...)
## Method for SNPlocs objects:
snpsByOverlaps(x, ranges, maxgap=0L, minoverlap=0L,
               type=c("any", "start", "end", "within", "equal"),
               drop.rs.prefix=FALSE, ...)

snpsById(x, ids, ...)
## Method for SNPlocs objects:
snpsById(x, ids, ifnotfound=c("error", "warning", "drop"))

## Old API
## ------------------------------------

snplocs(x, seqname, ...)
## Method for SNPlocs objects:
snplocs(x, seqname, as.GRanges=FALSE, caching=TRUE)

snpid2loc(x, snpid, ...)
## Method for SNPlocs objects:
snpid2loc(x, snpid, caching=TRUE)

snpid2alleles(x, snpid, ...)
## Method for SNPlocs objects:
snpid2alleles(x, snpid, caching=TRUE)

snpid2grange(x, snpid, ...)
## Method for SNPlocs objects:
snpid2grange(x, snpid, caching=TRUE)

Arguments

x
A SNPlocs object.
seqnames
The names of the sequences for which to get SNPs. Must be a subset of seqlevels(x). NAs and duplicates are not allowed.
...
Additional arguments, for use in specific methods.

Arguments passed through ... to the snpsByOverlaps method for SNPlocs objects are forwarded to the internal call to subsetByOverlaps().

drop.rs.prefix
Should the rs prefix be dropped from the returned RefSNP ids? (RefSNP ids are stored in the RefSNP_id metadata column of the returned object.)
ranges
One or more regions of interest specified as a GRanges object. A single region of interest can be specified as a character string of the form "ch14:5201-5300".
maxgap, minoverlap, type
These arguments are passed to subsetByOverlaps() which is used internally by snpsByOverlaps. See ?IRanges::subsetByOverlaps in the IRanges package and ?GenomicRanges::subsetByOverlaps in the GenomicRanges package for more information about the subsetByOverlaps() generic and its method for GenomicRanges objects.
ids, snpid
The RefSNP ids to look up (a.k.a. rs ids). Can be integer or character vector, with or without the "rs" prefix. NAs are not allowed.
ifnotfound
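What to do if some of the supplied SNP ids are not found. Must be one of "error" (the default), "warning", or "drop". With "error", the call fails as soon as an id cannot be found; with "drop", the ids that are not found are left out of the result.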
seqname
The name of the sequence for which to get the SNP locations and alleles.

If as.GRanges is FALSE, only one sequence can be specified (i.e. seqname must be a single string). If as.GRanges is TRUE, an arbitrary number of sequences can be specified (i.e. seqname can be a character vector of arbitrary length).

as.GRanges
TRUE or FALSE. If TRUE, then the SNP locations and alleles are returned in a GRanges object. Otherwise (the default), they are returned in a data frame.
caching
Should the loaded SNPs be cached in memory for faster further retrieval but at the cost of increased memory usage?
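
The ids argument accepts RefSNP ids as integers or as strings, with or without the "rs" prefix. As a rough illustration of what that equivalence means, the base R sketch below brings all three forms to the prefixed character form. The helper normalize_rsids() is hypothetical and is not part of BSgenome; it is not how the package handles ids internally.

```r
## Hypothetical helper (not part of BSgenome): bring RefSNP ids given as
## integers, bare digit strings, or "rs"-prefixed strings to one form.
normalize_rsids <- function(ids) {
    if (anyNA(ids))
        stop("NAs are not allowed in 'ids'")
    if (is.numeric(ids))
        ids <- as.integer(ids)                # avoids "1e+05"-style strings
    ids <- sub("^rs", "", as.character(ids))  # drop the prefix if present
    if (!all(grepl("^[0-9]+$", ids)))
        stop("'ids' must be RefSNP ids such as 73 or \"rs73\"")
    paste0("rs", ids)
}

normalize_rsids(c("rs10458597", "12565286"))
normalize_rsids(7553394L)
```

Both calls return character vectors in which every id carries the "rs" prefix, so mixed input can be compared against the RefSNP_id metadata column when drop.rs.prefix=FALSE.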

Value

snpcount returns a named integer vector containing the number of SNPs for each sequence in the reference genome.

snpsBySeqname, snpsByOverlaps, and snpsById return a GPos object with 1 element (genomic position) per SNP and the following metadata columns:
  • RefSNP_id: RefSNP ID (aka "rs id"). Character vector with no NAs and no duplicates.
  • alleles_as_ambig: A character vector with no NAs containing the alleles for each SNP represented by an IUPAC nucleotide ambiguity code. See ?IUPAC_CODE_MAP in the Biostrings package for more information.
Note that all the elements (genomic positions) in this GPos object have their strand set to "+".

If ifnotfound="error", the object returned by snpsById is guaranteed to be parallel to ids, that is, the i-th element in the GPos object corresponds to the i-th element in ids.

Old API

Note that snplocs is superseded by snpsBySeqname, and snpid2loc, snpid2alleles, and snpid2grange are superseded by snpsById.

By default (i.e. when as.GRanges=FALSE), snplocs returns a data frame with 1 row per SNP and the following columns:
  1. RefSNP_id: Same as above but with "rs" prefix always removed.
  2. alleles_as_ambig: Same as above.
  3. loc: The 1-based location of the SNP relative to the first base at the 5' end of the plus strand of the reference sequence.
Otherwise (i.e. when as.GRanges=TRUE), it returns a GRanges object with metadata columns "RefSNP_id" and "alleles_as_ambig".

snpid2loc and snpid2alleles both return a named vector (integer vector for the former, character vector for the latter) where each (name, value) pair corresponds to a supplied SNP id. For both functions the name in (name, value) is the chromosome of the SNP id. The value in (name, value) is the position of the SNP id on the chromosome for snpid2loc, and a single IUPAC code representing the associated alleles for snpid2alleles.

snpid2grange returns a GRanges object similar to the one returned by snplocs (when used with as.GRanges=TRUE) and where each element corresponds to a supplied SNP id.
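
The alleles_as_ambig column described above stores one IUPAC ambiguity code per SNP. The base R sketch below shows how such codes expand into the alleles they stand for. The table is the standard IUPAC nucleotide code map (the mapping Biostrings provides as IUPAC_CODE_MAP), written out by hand so the example has no dependencies; decode_alleles() is a hypothetical helper, not a BSgenome function.

```r
## Standard IUPAC nucleotide ambiguity codes, written out in base R.
iupac_map <- c(A="A", C="C", G="G", T="T",
               M="AC", R="AG", W="AT", S="CG", Y="CT", K="GT",
               V="ACG", H="ACT", D="AGT", B="CGT", N="ACGT")

## Hypothetical helper: expand ambiguity codes into single-letter alleles.
decode_alleles <- function(codes) {
    unknown <- setdiff(codes, names(iupac_map))
    if (length(unknown) != 0L)
        stop("not IUPAC ambiguity codes: ", paste(unknown, collapse=", "))
    strsplit(unname(iupac_map[codes]), "")
}

## One list element per code, each holding the possible alleles.
decode_alleles(c("Y", "R", "N"))
```

With a real SNPlocs result, the same lookup could be applied to the alleles_as_ambig metadata column of the returned object.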

See Also

Examples

library(SNPlocs.Hsapiens.dbSNP141.GRCh38)
snps <- SNPlocs.Hsapiens.dbSNP141.GRCh38
snpcount(snps)

## ---------------------------------------------------------------------
## snpsBySeqname()
## ---------------------------------------------------------------------
## Get all SNPs located on chromosome 22 and MT:
snpsBySeqname(snps, c("ch22", "chMT"))

## ---------------------------------------------------------------------
## snpsByOverlaps()
## ---------------------------------------------------------------------
## Get all SNPs overlapping some regions of interest:
snpsByOverlaps(snps, "ch22:33.63e6-33.64e6")

## With the regions of interest being all the known CDS for hg38
## located on chr22 or chrMT (except for the chromosome naming
## convention, hg38 is the same as GRCh38):
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
my_cds <- cds(txdb)
seqlevels(my_cds, force=TRUE) <- c("chr22", "chrMT")
seqlevelsStyle(my_cds)  # UCSC
seqlevelsStyle(snps)  # dbSNP
seqlevelsStyle(my_cds) <- seqlevelsStyle(snps)
genome(my_cds) <- genome(snps)
snpsByOverlaps(snps, my_cds)

## ---------------------------------------------------------------------
## snpsById()
## ---------------------------------------------------------------------
## Lookup some RefSNP ids:
my_rsids <- c("rs10458597", "rs12565286", "rs7553394")
## Not run: 
#   snpsById(snps, my_rsids)  # error, rs7553394 not found
# ## End(Not run)
snpsById(snps, my_rsids, ifnotfound="drop")
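
The effect of ifnotfound can also be pictured without loading a SNPlocs data package. The sketch below runs a toy lookup in base R: toy_table, its ids and positions, and lookup_ids() are all made up for illustration, and the behavior shown for "warning" (warn, then drop) is an assumption rather than something this page states. It is not the package's implementation.

```r
## Made-up lookup table standing in for a SNPlocs object.
toy_table <- data.frame(RefSNP_id = c("rs1", "rs2"),
                        seqnames  = c("ch1", "ch2"),
                        pos       = c(100L, 200L),
                        stringsAsFactors = FALSE)

## Hypothetical helper mimicking the three 'ifnotfound' modes.
lookup_ids <- function(table, ids, ifnotfound=c("error", "warning", "drop")) {
    ifnotfound <- match.arg(ifnotfound)
    idx <- match(ids, table$RefSNP_id)
    not_found <- ids[is.na(idx)]
    if (length(not_found) != 0L) {
        msg <- paste("SNP ids not found:", paste(not_found, collapse=", "))
        if (ifnotfound == "error")
            stop(msg)
        if (ifnotfound == "warning")
            warning(msg)           # assumption: warn, then drop
        idx <- idx[!is.na(idx)]    # result is no longer parallel to 'ids'
    }
    table[idx, , drop=FALSE]
}

## "rs9" is not in the table, so it is dropped; the remaining rows keep
## the order of the query.
lookup_ids(toy_table, c("rs2", "rs9", "rs1"), ifnotfound="drop")
```

When every id is found, the rows come back in the order of the query, which is the "parallel to ids" guarantee described for ifnotfound="error".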
