bedr (version 1.0.0)

get.fasta: Query fasta sequence

Description

Query a fasta sequence using bedtools getfasta.

Usage

get.fasta(
	x,
	fasta = NULL,
	bed12 = FALSE,
	strand = FALSE,
	output.fasta = FALSE,
	use.name.field = FALSE,
	check.zero.based = TRUE,
	check.chr = TRUE,
	check.valid = TRUE,
	check.sort = TRUE,
	check.merge = TRUE,
	verbose = TRUE
	)

Arguments

x
the regions to query, given as a region object or an index
fasta
a fasta file; defaults to the hg19 human reference
bed12
should bed12 format be used
strand
strand specific, i.e. reverse complement sequences on the negative strand
output.fasta
output a fasta; defaults to FALSE, which returns a data.frame for easier parsing
use.name.field
should the name field be used for the sequence header (compare the -name option of bedtools getfasta)
check.zero.based
check for zero based region
check.chr
check for "chr" prefix
check.valid
check for valid regions i.e. start < end
check.sort
check if region is sorted
check.merge
check if region is merged
verbose
print more detailed messages
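
To illustrate what strand = TRUE implies for a region on the negative strand, here is a minimal base R sketch of reverse complementing. The helper name reverse.complement is hypothetical and not part of bedr; bedtools does this step itself, so the code only shows the expected transformation.

```r
# hypothetical helper, for illustration only (not a bedr function)
reverse.complement <- function(sequence) {
	# swap each base for its complement, preserving case
	complemented <- chartr("ACGTNacgtn", "TGCANtgcan", sequence);
	# reverse the order of the bases in every sequence
	vapply(
		strsplit(complemented, ""),
		function(bases) paste(rev(bases), collapse = ""),
		character(1)
		);
	}

# a region on the negative strand reading AACG on the forward strand
negative.strand.sequence <- reverse.complement("AACG");
print(negative.strand.sequence);
```

With strand = FALSE the forward strand sequence is reported as is, whatever the strand of the region.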

Value

  • A data.frame or fasta. The data.frame has two columns corresponding to the region and the sequence.
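
The sketch below shows one way the two-column data.frame could be turned into fasta-style lines by hand. The data.frame is mocked up rather than produced by get.fasta, and the column names index and sequence are assumptions for illustration; check names() on the real return value before relying on them.

```r
# mock result; column names and sequences are assumed, not taken from bedr
regions <- data.frame(
	index = c("chr1:10-13", "chr2:100-104"),
	sequence = c("ACG", "TTGA"),
	stringsAsFactors = FALSE
	);

# interleave ">region" header lines with the sequence lines
fasta.lines <- as.vector(rbind(paste0(">", regions$index), regions$sequence));

# write to a temporary file and read it back
fasta.file <- tempfile(fileext = ".fasta");
writeLines(fasta.lines, con = fasta.file);
print(readLines(fasta.file));
```

Keeping the result as a data.frame makes it easy to join sequences back onto other per-region annotation before writing anything to disk.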

Details

Uses bedtools getfasta to query a fasta file and load the result into R as a data.frame for easy parsing.

Note that the hg19 reference genome fasta is large and requires on the order of 4 GB of RAM to avoid a segfault.

References

http://bedtools.readthedocs.org/en/latest/content/tools/getfasta.html

Examples

if (check.binary("bedtools")) {

# get the sequence for a set of regions as a data.frame
index <- get.example.regions();
a <- index[[1]];
b <- get.fasta(bedr.sort.region(a));

# this time output a fasta
d <- get.fasta(a, output.fasta = TRUE);
writeLines(d[[1]], con = "test.fasta")

# get the region adjacent to a set of mutations in a vcf
clinvar.vcf.example <- system.file("extdata/clinvar_dbSNP138_example.vcf.gz", package = "bedr")
clinvar <- read.vcf(clinvar.vcf.example)
# note that clinvar uses ncbi fasta which does not use "chr" and codes chrM as MT
clinvar.bed <- data.frame(
	chr = paste0("chr", gsub("MT", "M", clinvar$vcf$CHROM)),
	start = clinvar$vcf$POS - 2,
	end = clinvar$vcf$POS + 1
	)

mutation.triplet <- get.fasta(clinvar.bed, check.chr = FALSE);
}
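
The POS - 2 and POS + 1 offsets in the last example follow from the coordinate systems: VCF positions are 1-based, while bed regions are zero-based with an exclusive end, so the interval [POS - 2, POS + 1) covers the mutated base and one base on each side. A small base R sketch with a made-up reference string (not real genome data) checks the arithmetic:

```r
# toy reference; R string positions are 1-based, like VCF POS
reference <- "ACGTACGTAC";
pos <- 5;

# zero-based, half-open bed coordinates of the triplet centred on pos
start <- pos - 2;
end <- pos + 1;

# convert back to 1-based inclusive positions for substr()
triplet <- substr(reference, start + 1, end);

print(triplet);
print(end - start);
```

The middle base of the triplet is the base at POS, and end - start gives the region length of 3.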