Learn R Programming

bedr (version 1.0.2)

test.region.similarity: Compare sets of regions via jaccard and relative distance using permutation

Description

Compare sets of regions via jaccard and relative distance using permutation to get an empirical p-value.

Usage

test.region.similarity( x, y, n = 1000, stratify.by.chr = FALSE, species = "human", build = "hg19", mask.gaps = FALSE, mask.repeats = FALSE, check.zero.based = TRUE, check.chr = TRUE, check.valid = TRUE, verbose = TRUE )

Arguments

x
first region to be compared. this is the region that is permuted
y
second region to be compared
n
the number of iterations to permute
stratify.by.chr
Should the permutation be happen separetely for each chromosome. That is are chromosomes exchangeable.
species
species
build
the build of the reference
mask.gaps
should the gaps (Ns) in the human reference be ignored as potential start points. This drammatically increases memory and run time but is more appropriate in almost all settings. By default it's off.
mask.repeats
should the repeats from repeatMasker be ignored as potential start points. This drammatically increases memory and run time but is more appropriate in almost all settings. By default it's off.
check.zero.based
should 0 based coordinates be checked
check.chr
should chr prefix be checked
check.valid
should the region be checkded for integerity
verbose
should log messages and checking take place

Value

A list of results

Details

Iteratively permutes intervals and recalculates jaccard and reldist statistics.

Examples

Run this code

if (check.binary("bedtools")) {

index <- get.example.regions();

a <- index[[1]];
b <- index[[2]];
a <- bedr(engine = "bedtools", input = list(i = a), method = "sort", params = "");
b <- bedr(engine = "bedtools", input = list(i = b), method = "sort", params = "");

# a simple example
test.region.similarity(a, b, n = 8)

# note you can set the cores available to parallelize
options(cores = 4);
system.time(test.region.similarity(a, b, n = 8));

# a real example comparing the distribution of mRNA vs ncRNA genes in RefSeq
## Not run: 
# 
# # more core
# options(cores = 8);
# 
# # load refgene
# refgene <- query.ucsc("refGene")
# refgene <- refgene[,c("chrom","txStart","txEnd","name","name2","strand")]
# 
# # only include canonical chr
# chr <- paste0("chr", c(1:22,"X","Y")); 
# refgene <- refgene[refgene$chrom 
# 
# # remove genes with multiple positions
# duplicated.gene <- duplicated(refgene$name2) | duplicated(rev(refgene$name2));
# refgene <- refgene[!duplicated.gene,];
# 
# # only select pr coding genes
# refgene.nm <- refgene[grepl("^NM",refgene$name),];
# # only select non protein coding rna genes	
# refgene.nr <- refgene[grepl("^NR",refgene$name),];
# 
# # sort and merge
# refgene.nm <- snm(refgene.nm,check.chr = FALSE);
# refgene.nr <- snm(refgene.nr,check.chr = FALSE);
# 
# test.region.similarity(refgene.nm, refgene.nr, mask.unmapped = TRUE );
# 
# option(core = 1)
# ## End(Not run)

}

Run the code above in your browser using DataLab