matchAnn2Ann: Genomic location matching of two sets of features

Description

Genomic location matching of two sets of features

Usage

matchAnn2Ann(chr1, bpstart1, bpend1, chr2, bpstart2, bpend2,
	method = "distance", maxDist = 10000, minPerc = 0,
	reference = 1, ncpus = 1, verbose = TRUE)

Arguments

chr1

Object of class numeric containing chromosome information of features from set 1.

bpstart1

Object of class numeric containing start base pair information of features from set 1. Of same length as chr1.

bpend1

Object of class numeric containing end base pair information of features from set 1. Of same length as chr1.

chr2

Object of class numeric containing chromosome information of features from set 2.

bpstart2

Object of class numeric containing start base pair information of features from set 2. Of same length as chr2.

bpend2

Object of class numeric containing end base pair information of features from set 2. Of same length as chr2.

method

Matching method to be applied, either "distance" or "overlap". See below for details.

maxDist

Maximum number of bases by which two features may be separated and still be matched. Only used in combination with method="distance".

minPerc

Minimum percentage of overlap between two features required for a match. Only used in combination with method="overlap".

reference

Platform that is taken as the reference in the calculation of the percentage of overlap. Should equal 1 or 2, referring to set 1 or set 2, respectively.

ncpus

Number of CPUs to be used in the computation.

verbose

Logical indicator: should intermediate output be printed on the screen?

Value

An object of class list. Each list item is a three-column matrix with the matched feature information. The first column contains the feature numbers of set 1, in the order supplied. The second column contains the feature numbers of set 2, in the order supplied. Each row thus pairs a feature of set 1 (first entry) with the feature of set 2 (second entry) it has been matched to. The third column contains either the percentage of overlap (method="overlap") or the distance between the midpoints of the two features (method="distance").

Warning

Base pair information of features from both sets should be on the same scale! Features with incomplete annotation information are removed before matching. For clarity, they are not included in the object with matched features.
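Before matching, it can be useful to see which features have incomplete annotation. The following is a minimal sketch in base R, not part of the package; the data frame ann and its values are made up for illustration.

```r
# Toy annotation for one feature set (values are invented for illustration)
ann <- data.frame(
  chr     = c(1, 1, NA, 2),
  bpstart = c(100, NA, 300, 400),
  bpend   = c(200, 250, 350, 500)
)

# A feature is incomplete if chromosome, start or end is missing
incomplete <- !complete.cases(ann)

# Positions (in the order supplied) of the features with missing annotation
droppedIdx <- which(incomplete)
print(droppedIdx)
```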

Details

The features of set 1 (chr1, bpstart1, bpend1) are matched to the features of set 2 (chr2, bpstart2, bpend2). That is, for every feature in set 2, features in set 1 are sought.

If method="distance", the midpoints of the set 1 and set 2 features are calculated, and for each feature of set 2 all features of set 1 with midpoints no further away than maxDist are selected. If no feature in set 1 satisfies this criterion, the unmatched feature of set 2 is discarded.
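The distance criterion can be sketched in a few lines of base R. This is an illustration only, not the package's implementation: the helper matchByMidpoint is hypothetical, the toy coordinates are made up, and the restriction to features on the same chromosome is an assumption.

```r
# Hypothetical helper illustrating midpoint-distance matching
matchByMidpoint <- function(chr1, bpstart1, bpend1,
                            chr2, bpstart2, bpend2, maxDist = 10000) {
  mid1 <- (bpstart1 + bpend1) / 2
  mid2 <- (bpstart2 + bpend2) / 2
  matches <- lapply(seq_along(chr2), function(j) {
    d <- abs(mid1 - mid2[j])
    # assumption: only features on the same chromosome can match
    hits <- which(chr1 == chr2[j] & d <= maxDist)
    if (length(hits) == 0) return(NULL)  # unmatched set 2 feature
    cbind(set1 = hits, set2 = j, distance = d[hits])
  })
  # discard set 2 features without any match
  Filter(Negate(is.null), matches)
}

# Toy data, invented for illustration
chr1 <- c(1, 1, 2); bpstart1 <- c(100, 5000, 100); bpend1 <- c(200, 5200, 300)
chr2 <- c(1, 2, 3); bpstart2 <- c(150, 50000, 100); bpend2 <- c(250, 50100, 200)

res <- matchByMidpoint(chr1, bpstart1, bpend1, chr2, bpstart2, bpend2)
print(res)
```

In this toy example only the first feature of set 2 finds partners in set 1, so the result holds a single three-column matrix with two rows.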

If method="overlap", features of set 1 are matched to each feature of set 2 on the basis of the percentage of overlap. All features of set 1 with a percentage exceeding minPerc are selected. If no feature in set 1 overlaps with a feature of set 2, that feature of set 2 is discarded.
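The role of the reference argument in the overlap percentage can be illustrated with a small sketch. The helper percOverlap is hypothetical and not part of the package. It assumes that the percentage is expressed on a 0 to 100 scale, that feature length is taken as end minus start, and that the overlap is divided by the length of the reference feature; the package may define these details differently.

```r
# Hypothetical helper: overlap as a percentage of the reference feature's length
percOverlap <- function(start1, end1, start2, end2, reference = 1) {
  # number of shared bases (0 if the features do not overlap)
  ovl <- pmax(0, pmin(end1, end2) - pmax(start1, start2))
  # length of the feature from the reference set
  refLen <- if (reference == 1) end1 - start1 else end2 - start2
  100 * ovl / refLen
}

# Set 1 feature spans 100-200, set 2 feature spans 150-400 (toy values)
p1 <- percOverlap(100, 200, 150, 400, reference = 1)
p2 <- percOverlap(100, 200, 150, 400, reference = 2)
print(c(p1, p2))
```

The same pair of features yields a different percentage depending on which set is the reference, which is why the choice of reference matters when setting minPerc.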

References

Van Wieringen, W.N., Unger, K., Leday, G.G.R., Krijgsman, O., De Menezes, R.X., Ylstra, B., Van de Wiel, M.A. (2012), "Matching of array CGH and gene expression microarray features for the purpose of integrative analysis", BMC Bioinformatics, 13:80.

Examples


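# load the package (matchAnn2Ann and the example data are assumed to come from sigaR)
library(sigaR)

# load data
data(pollackCN16)
data(pollackGE16)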

# extract genomic information from cghCall-object
chr1 <- fData(pollackCN16)[,1]
bpstart1 <- fData(pollackCN16)[,2]
bpend1 <- fData(pollackCN16)[,3]

# extract genomic information from ExpressionSet-object
chr2 <- fData(pollackGE16)[,1]
bpstart2 <- fData(pollackGE16)[,2]
bpend2 <- fData(pollackGE16)[,3]

# match features from both platforms
matchedFeatures <- matchAnn2Ann(chr1, bpstart1, bpend1, chr2, 
	bpstart2, bpend2, method = "distance", maxDist = 10000)
