Learn R Programming

bedr (version 1.0.2)

get.fasta: Query fasta sequence

Description

Query fasta sequence using bedtools get.fasta

Usage

get.fasta( x, fasta = NULL, bed12 = FALSE, strand = FALSE, output.fasta = FALSE, use.name.field = FALSE, check.zero.based = TRUE, check.chr = TRUE, check.valid = TRUE, check.sort = TRUE, check.merge = TRUE, verbose = TRUE )

Arguments

x
region or index
fasta
a fasta file defaults to hg19 human
bed12
should bed12 format be used
strand
strand specific i.e. reverse complement negative.
output.fasta
output a fasta defaults to a data.frame for easier parsing.
use.name.field
should the name field be used for
check.zero.based
check for zero based region
check.chr
check for "chr" prefix
check.valid
check for valid regions i.e. start < end
check.sort
check if region is sorted
check.merge
check if region is merged
verbose
more words

Value

A data.frame or fasta. The data.frame has is two columns corresponding to the region and the sequence.

Details

Uses bedtoos getFasta to query a fasta file and load into R as a data.frame for easy parsing.

Note that the hg19 reference genome fasta is large and requires on the order of 4 GB RAM to avoid a segfault happens.

References

http://bedtools.readthedocs.org/en/latest/content/tools/getfasta.html

Examples

Run this code


if (check.binary("bedtools")) {

## Not run: 
# # get the sequence for a set of regions as a data.frame
# index <- get.example.regions();
# a <- index[[1]];
# b <- get.fasta(bedr.sort.region(a));
# 
# # this time output a fasta
# d <- get.fasta(a, output.fasta = TRUE);
# writeLines(d[[1]], con = "test.fasta")
# 
# # get the region adjacent to a set of mutations in a vcf
# clinvar.vcf.example      <- system.file("extdata/clinvar_dbSNP138_example.vcf.gz", package = "bedr")
# clinvar <- read.vcf(clinvar.vcf.example)
# # note that clinvar uses ncbi fasta which does not use "chr" and codes chrM as MT
# clinvar.bed <- data.frame(
# 	chr = paste0("chr", gsub("MT", "M", clinvar$vcf$CHROM)),
# 	start = clinvar$vcf$POS - 2,
# 	end = clinvar$vcf$POS + 1
# 	)
# 
# mutation.triplet <- get.fasta(clinvar.bed, check.chr = FALSE);
# ## End(Not run)
}

Run the code above in your browser using DataLab