similaRpeak (version 1.4.2)

similarity: Calculate metrics which estimate the level of similarity between two ChIP-Seq profiles

Description

Returns a list containing information about both ChIP-Seq profiles and a list of all similarity metrics: the ratio of the maximum values, the ratio of the areas, the ratio between the intersection area and the total area (for normalized and non-normalized profiles), the difference between the positions of the maximal peaks of the two profiles, and Spearman's rho statistic.

Usage

similarity(
        profile1, 
        profile2, 
        ratioAreaThreshold=1, 
        ratioMaxMaxThreshold=1, 
        ratioIntersectThreshold=1,
        ratioNormalizedIntersectThreshold=1,
        diffPosMaxThresholdMinValue=1, 
        diffPosMaxThresholdMaxDiff=100, 
        diffPosMaxTolerance=0.01,
        spearmanCorrSDThreashold=1e-8)

Arguments

profile1
Vector containing the RPM values of the first ChIP-Seq profile for each position of the selected region.
profile2
Vector containing the RPM values of the second ChIP-Seq profile for each position of the selected region.
ratioAreaThreshold
The minimum denominator accepted to calculate the ratio of the area between both profiles. The value has to be positive. Default = 1.
ratioMaxMaxThreshold
The minimum denominator accepted to calculate the ratio of the maximal peaks values between both profiles. The value has to be positive. Default = 1.
ratioIntersectThreshold
The minimum denominator accepted to calculate the ratio of the intersection area of both profiles over the total area. The value has to be positive. Default = 1.
ratioNormalizedIntersectThreshold
The minimum denominator accepted to calculate the ratio of the intersection area of both normalized profiles over the total area. The value has to be positive. Default = 1.
diffPosMaxThresholdMinValue
The minimum peak value accepted to calculate the difference between the maximal peak positions. The value has to be positive. Default = 1.
diffPosMaxThresholdMaxDiff
The maximum distance accepted between two peak positions in one profile to calculate the difference between the maximal peak positions. The value has to be positive. Default = 100.
diffPosMaxTolerance
The maximum variation accepted on the maximum value to consider a position as a peak position. The value can be between 0 and 1. Default = 0.01.
spearmanCorrSDThreashold
The minimum standard deviation accepted on both profiles to calculate the Spearman's rho statistic. Default = 1e-8.
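
The three diffPosMax arguments interact, so a small base R sketch may help to see how they could fit together. This is not the package's implementation: peak_position and diff_pos_max are hypothetical helpers, and the exact comparison rules (strict or non-strict, how the tolerance is applied) are assumptions.

```r
# Hypothetical helper, not part of similaRpeak: median position of the
# near-maximal values of one profile, or NA when a check fails.
peak_position <- function(profile, minValue = 1, maxDiff = 100,
                          tolerance = 0.01) {
    maxValue <- max(profile, na.rm = TRUE)
    # Peak too low: the metric is not calculated (assumed rule)
    if (maxValue < minValue) {
        return(NA_real_)
    }
    # Positions whose value lies within `tolerance` of the maximum
    positions <- which(profile >= maxValue * (1 - tolerance))
    # Candidate positions too far apart: assume more than one peak
    if (max(positions) - min(positions) > maxDiff) {
        return(NA_real_)
    }
    median(positions)
}

# Hypothetical helper: first profile position minus second profile position
diff_pos_max <- function(profile1, profile2, ...) {
    peak_position(profile1, ...) - peak_position(profile2, ...)
}

profile1 <- c(3, 59, 6, 24, 65, 34, 15, 4, 53, 22, 21, 12, 11)
profile2 <- c(15, 9, 46, 44, 9, 39, 27, 34, 34, 4, 3, 4, 2)
diff_pos_max(profile1, profile2)
```

With these two vectors the maxima sit at positions 5 and 3. Lowering maxDiff or raising minValue in the sketch makes it return NA instead.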

Value

  • similarity returns a list which contains the information about both ChIP-Seq profiles and a list of all metrics. The data structure is a list of lists. The first level contains the following items:
    • nbrPosition: The number of positions included in each profile.
    • areaProfile1: The area of the first profile.
    • areaProfile2: The area of the second profile.
    • maxProfile1: The maximum value in the first profile.
    • maxProfile2: The maximum value in the second profile.
    • maxPositionProfile1: The list of positions of the maximum value in the first profile.
    • maxPositionProfile2: The list of positions of the maximum value in the second profile.
    • metrics: A list with the following items:
      • RATIO_AREA: The ratio between the areas. The larger value is always divided by the smaller value. NA if minimal threshold is not respected.
      • DIFF_POS_MAX: The difference between the maximal peaks positions. The difference is always the first profile value minus the second profile value. NA is returned if minimal peak value is not respected. A profile can have more than one position with the maximum value. In that case, the median position is used. A threshold argument can be set to consider all positions within a certain range of the maximum value. A threshold argument can also be set to ensure that the distance between two maximum values is not too wide. When this distance is not respected, it is assumed that more than one peak is present in the profile and NA is returned.
      • RATIO_MAX_MAX: The ratio between the maximal peaks values. The first profile is always divided by the second profile. NA if minimal threshold is not respected.
      • RATIO_INTERSECT: The ratio between the intersection area and the total area. NA if minimal threshold is not respected.
      • RATIO_NORMALIZED_INTERSECT: The ratio between the intersection area and the total area of normalized profiles. NA if minimal threshold is not respected.
      • SPEARMAN_CORRELATION: The Spearman's rho statistic between profiles. NA if minimal threshold is not respected or when no complete element pair is present between both profiles.
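
As an illustration of two of the metrics listed above, the following base R sketch computes an intersection ratio and a Spearman correlation by hand. The helper names ratio_intersect and spearman_corr are hypothetical. The sketch assumes that the total area is the sum of both areas minus the intersection, and that thresholds are compared with `<`; the package may define either differently.

```r
# Hypothetical helper, not the package's code.
ratio_intersect <- function(profile1, profile2, threshold = 1) {
    # Intersection area: position-wise minimum of both profiles
    intersectArea <- sum(pmin(profile1, profile2), na.rm = TRUE)
    # Assumption: total area is the union of the two areas
    totalArea <- sum(profile1, na.rm = TRUE) +
        sum(profile2, na.rm = TRUE) - intersectArea
    if (totalArea < threshold) {
        return(NA_real_)
    }
    intersectArea / totalArea
}

# Hypothetical helper, not the package's code.
spearman_corr <- function(profile1, profile2, sdThreshold = 1e-8) {
    # Keep only positions where both profiles have a value
    complete <- complete.cases(profile1, profile2)
    if (sum(complete) < 2) {
        return(NA_real_)
    }
    x <- profile1[complete]
    y <- profile2[complete]
    # A (nearly) constant profile has no meaningful rank correlation
    if (sd(x) < sdThreshold || sd(y) < sdThreshold) {
        return(NA_real_)
    }
    cor(x, y, method = "spearman")
}

profile1 <- c(3, 59, 6, 24, 65, 34, 15, 4, 53, 22, 21, 12, 11)
profile2 <- c(15, 9, 46, 44, 9, 39, 27, 34, 34, 4, 3, 4, 2)
ratio_intersect(profile1, profile2)
spearman_corr(profile1, profile2)
```

Values returned by similarity itself should be taken as the reference; the sketch is only meant to make the definitions concrete.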

Details

similarity uses the two vectors passed as arguments to calculate the metrics. When the metric is a ratio, it always verifies that the threshold for the denominator is respected. If the threshold is not respected, the metric is assigned the NA value.
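
The denominator check described above can be sketched in base R as follows. safe_ratio is a hypothetical helper introduced only for illustration, and whether the package compares with `<` or `<=` is an assumption here.

```r
# Hypothetical helper: divide only when the denominator reaches the threshold
safe_ratio <- function(numerator, denominator, threshold = 1) {
    if (is.na(denominator) || denominator < threshold) {
        return(NA_real_)
    }
    numerator / denominator
}

profile1 <- c(3, 59, 6, 24, 65, 34, 15, 4, 53, 22, 21, 12, 11)
profile2 <- c(15, 9, 46, 44, 9, 39, 27, 34, 34, 4, 3, 4, 2)

area1 <- sum(profile1)
area2 <- sum(profile2)

# Ratio of areas: the larger area over the smaller one
ratioArea <- safe_ratio(max(area1, area2), min(area1, area2), threshold = 1)

# Ratio of maxima: first profile over second profile
ratioMaxMax <- safe_ratio(max(profile1), max(profile2), threshold = 1)

# A threshold above the denominator yields NA instead of a ratio
ratioAreaStrict <- safe_ratio(max(area1, area2), min(area1, area2),
                              threshold = 500)
```

Returning NA rather than stopping with an error lets the remaining metrics still be reported for the same pair of profiles.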

See Also

  • MetricFactory for using an interface to calculate all available metrics separately.
  • demoProfiles for more information about the ChIP-Seq profiles present in the demoProfiles data.

Examples

## Defining two ChIP-Seq profiles 
profile1<-c(3,59,6,24,65,34,15,4,53,22,21,12,11)
profile2<-c(15,9,46,44,9,39,27,34,34,4,3,4,2)

## Example using default thresholds
similarity(profile1, profile2)

## Example using customised thresholds
similarity(profile1, profile2, 
                        ratioAreaThreshold=5, 
                        ratioMaxMaxThreshold=5, 
                        ratioIntersectThreshold=12,
                        ratioNormalizedIntersectThreshold=2.2,
                        diffPosMaxThresholdMinValue=2, 
                        diffPosMaxThresholdMaxDiff=130, 
                        diffPosMaxTolerance=0.03,
                        spearmanCorrSDThreashold=1e-3)

## Example using ChIP-Seq profiles of H3K27ac (DCC accession: ENCFF000ASG) 
## and H3K4me1 (DCC accession: ENCFF000ARY) from the Encyclopedia of DNA  
## Elements (ENCODE) for the region 
data(demoProfiles)

## Visualize ChIP-Seq profiles 
plot(demoProfiles$chr2.70360770.70361098$H3K27ac,
        type="l", col="blue", xlab="", ylab="", ylim=c(0, 25),
        main="chr2:70360770-70361098")
par(new=TRUE)
plot(demoProfiles$chr2.70360770.70361098$H3K4me1,
        type="l", col="darkgreen", xlab="Position", 
        ylab="Coverage in reads per million (RPM)",  ylim=c(0, 25))
legend("topright", c("H3K27ac","H3K4me1"), cex=1.2, 
        col=c("blue","darkgreen"), lty=1)

# Calculate metrics
similarity(demoProfiles$chr2.70360770.70361098$H3K4me1, 
            demoProfiles$chr2.70360770.70361098$H3K27ac, 
            ratioAreaThreshold=15, 
            ratioMaxMaxThreshold=5, 
            ratioIntersectThreshold=12,
            ratioNormalizedIntersectThreshold=2.2,
            diffPosMaxThresholdMinValue=2, 
            diffPosMaxThresholdMaxDiff=130, 
            diffPosMaxTolerance=0.03,
            spearmanCorrSDThreashold=0.1)


## You can refer to the vignette to see more examples using ChIP-Seq profiles
## extracted from the Encyclopedia of DNA Elements (ENCODE) data.
