This function calculates the frequency of methylation differences between pairs of cherry tips in a phylogenetic tree. A cherry is a pair of leaf nodes that share a direct common ancestor. The function quantifies full and half methylation differences for each genomic structure (e.g., island/non-island) across all sites and normalizes these counts by the number of sites per structure to obtain frequencies.
freqSites_cherryMethDiff(
tree,
data,
categorized_data = FALSE,
input_control = TRUE
)
A data frame with one row per cherry, containing the following columns:
A character string representing the names of the two tips in the cherry, concatenated with a hyphen.
A character string representing the indices of the two tips in the cherry, concatenated with a hyphen.
A numeric value representing the sum of the branch distances between the cherry tips.
A numeric value representing the frequency of sites with a full methylation difference (where one tip is methylated and the other is unmethylated) for the given structure.
A numeric value representing the frequency of sites with a half methylation difference (where one tip is partially methylated and the other is either fully methylated or unmethylated) for the given structure.
A phylogenetic tree object. The function assumes it follows an appropriate format for downstream processing.
A list containing methylation states at tree tips for each genomic structure (e.g., island/non-island).
The data should be structured as data[[tip]][[structure]], where each structure has the same number of sites across tips.
The input data must be prefiltered to ensure CpG sites are represented consistently across different tips.
Each element contains the methylation states at the sites of a given tip and structure, represented as 0, 0.5 or 1 (unmethylated, partially methylated and methylated, respectively).
If methylation states are not represented as 0, 0.5 or 1, they are categorized as 0 when the value is equal to or under 0.2, as 0.5 when the value is between 0.2 and 0.8, and as 1 when the value is over 0.8.
For customized categorization thresholds, use categorize_siteMethSt.
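The default categorization described above can be illustrated with a minimal base R sketch. The helper name categorize_sketch and its arguments are hypothetical and are not part of the package; treating a value of exactly 0.8 as partially methylated is an assumption drawn from the wording "over 0.8".

```r
# Hypothetical helper, not the package's implementation: maps methylation
# proportions to 0, 0.5 or 1 using the default cut-offs described above.
categorize_sketch <- function(x, u_threshold = 0.2, m_threshold = 0.8) {
  out <- rep(0.5, length(x))   # start by treating every site as partially methylated
  out[x <= u_threshold] <- 0   # equal to or under 0.2: unmethylated
  out[x > m_threshold] <- 1    # over 0.8: methylated (assumes 0.8 itself stays 0.5)
  out
}

categorize_sketch(c(0.05, 0.2, 0.5, 0.8, 0.95))
```

For data that needs different cut-offs, the documented route is categorize_siteMethSt rather than a helper like this one.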
A logical value, FALSE by default. Set to TRUE to skip redundant categorization when methylation states are already represented as 0, 0.5 and 1.
A logical value indicating whether to validate the input data.
If TRUE (default), the function checks that the data has the required structure: it ensures that the number of tips is sufficient and that the data structure is consistent across tips and structures.
If FALSE, the function assumes the tree is already valid and skips the validation step.
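As a rough illustration of the kind of consistency checks described for input_control, the following base R sketch verifies the shape of a data list. The function check_data_sketch is hypothetical, and the exact checks and error messages used by the package may differ; requiring at least two tips is an assumption based on a cherry needing two tips.

```r
# Hypothetical check, not the package's own validation code.
check_data_sketch <- function(data) {
  # Assumption: at least two tips are needed to form a cherry
  if (!is.list(data) || length(data) < 2) {
    stop("data must be a list with at least two tips")
  }
  # Every tip must hold the same number of structures
  n_str <- lengths(data)
  if (length(unique(n_str)) != 1) {
    stop("all tips must have the same number of structures")
  }
  # Site counts: one row per structure, one column per tip
  n_sites <- matrix(unlist(lapply(data, lengths)), nrow = n_str[1])
  same <- apply(n_sites, 1, function(r) length(unique(r)) == 1)
  if (!all(same)) {
    stop("each structure must have the same number of sites across tips")
  }
  invisible(TRUE)
}

good <- list(list(rep(1, 10), rep(0, 5)), list(rep(1, 10), rep(0.5, 5)))
check_data_sketch(good)
```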
The function first validates the tree structure and extracts pairwise distances between cherry tips.
It then counts methylation differences using countSites_cherryMethDiff and normalizes these counts by the number of sites per structure to compute frequencies.
The resulting data frame provides a per-cherry frequency of methylation differences (half or full difference) across different genomic structures.
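The counting and normalization steps can be sketched in base R for a single cherry. This is an illustration under stated assumptions, not the package's code: cherry_freq_sketch is a hypothetical helper, the input is assumed to be already categorized as 0, 0.5 or 1, a full difference is taken as an absolute difference of 1, and a half difference as an absolute difference of 0.5.

```r
# Hypothetical helper: per-structure frequencies of full and half methylation
# differences between the two tips of one cherry.
cherry_freq_sketch <- function(tip_a, tip_b) {
  res <- lapply(seq_along(tip_a), function(s) {
    d <- abs(tip_a[[s]] - tip_b[[s]])       # per-site difference in this structure
    c(full = sum(d == 1) / length(d),       # one tip methylated, the other unmethylated
      half = sum(d == 0.5) / length(d))     # one tip partially methylated, the other 0 or 1
  })
  do.call(rbind, res)                       # one row per structure
}

# The first two tips of the example data in this page
tip_a <- list(rep(1, 10), rep(0, 5), rep(1, 8))
tip_b <- list(rep(1, 10), rep(0.5, 5), rep(0, 8))
cherry_freq_sketch(tip_a, tip_b)
```

For these two tips the first structure shows no differences, the second differs by a half step at every site, and the third differs fully at every site, so the rows of the result are (0, 0), (0, 1) and (1, 0) for the full and half columns.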
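# Example data setup: one list per tip, each holding three structures
data <- list(
  list(rep(1, 10), rep(0, 5), rep(1, 8)),
  list(rep(1, 10), rep(0.5, 5), rep(0, 8)),
  list(rep(1, 10), rep(0.5, 5), rep(0, 8)),
  list(c(rep(0, 5), rep(0.5, 5)), c(0, 0, 1, 1, 1), c(0.5, 1, rep(0, 6))))
# Newick tree with four tips forming two cherries: (a, b) and (c, d)
tree <- "((a:1.5,b:1.5):2,(c:2,d:2):1.5);"
# The states are already 0, 0.5 or 1, so categorization can be skipped
freqSites_cherryMethDiff(tree, data, categorized_data = TRUE)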