Prepares and indexes gene and lesion data for downstream GRIN (Genomic Random Interval) analysis. The function merges and orders gene and lesion coordinates so that overlaps between genes and every type of genomic lesion (structural or sequence) can be computed efficiently.
prep.gene.lsn.data(lsn.data, gene.data, mess.freq = 10)
A list with the following components:
Original lesion data.
Original gene annotation data.
Combined and ordered data.frame of gene and lesion intervals. The cty column encodes position type: 1 = gene start, 2 = lesion start, 3 = lesion end, 4 = gene end.
Index data.frame giving the first and last rows of each chromosome's gene entries within gene.lsn.data.
Index data.frame indicating the start and end rows for each lesion (grouped by type, chromosome, and subject) within gene.lsn.data.
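The two index components above can be pictured with a small base R sketch. It is not the package's implementation: the toy table and the column names row.start and row.end are assumptions chosen for illustration. It shows how an index of first and last rows per chromosome lets a block be pulled out of an ordered table without scanning it.

```r
# Toy ordered table (hypothetical data); rows are assumed to be sorted by
# chromosome already, so each chromosome occupies one contiguous block.
ordered.tbl <- data.frame(
  chrom = c("1", "1", "1", "2", "2", "X"),
  pos   = c(10, 50, 90, 5, 70, 30),
  stringsAsFactors = FALSE
)

# Run-length encoding gives the size of each contiguous chromosome block.
runs      <- rle(ordered.tbl$chrom)
row.end   <- cumsum(runs$lengths)
row.start <- row.end - runs$lengths + 1

# Index table: one row per chromosome with its first and last row number.
# The column names row.start and row.end are illustrative assumptions.
chrom.index <- data.frame(
  chrom     = runs$values,
  row.start = row.start,
  row.end   = row.end,
  stringsAsFactors = FALSE
)

# Use the index to extract the block for chromosome 2 directly.
k     <- which(chrom.index$chrom == "2")
block <- ordered.tbl[chrom.index$row.start[k]:chrom.index$row.end[k], ]
```

The same idea extends to the lesion index by grouping on lesion type, chromosome, and subject instead of chromosome alone.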
lsn.data: A data.frame containing lesion data in GRIN-compatible format. Must include the following five columns:
Unique patient identifier.
Chromosome on which the lesion is located.
Start position of the lesion in base pairs.
End position of the lesion in base pairs.
Type of lesion (e.g., gain, loss, mutation, fusion).
gene.data: A data.frame containing gene annotation data with the following four required columns:
Ensembl gene ID.
Chromosome on which the gene is located.
Start position of the gene in base pairs.
End position of the gene in base pairs.
mess.freq: Integer specifying how often progress messages are displayed. A message is printed every mess.freq-th lesion block processed (default: 10).
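As a rough illustration of the two input tables described above, the sketch below builds toy data frames and checks them before they would be passed on. The column names (ID, chrom, loc.start, loc.end, lsn.type, gene), the toy values, and the helper check.cols are all assumptions made for this example; the text above describes the columns but does not name them.

```r
# Toy lesion table: one row per lesion (hypothetical values and column names).
lsn.toy <- data.frame(
  ID        = c("P1", "P1", "P2"),            # patient identifier
  chrom     = c("1", "2", "1"),               # chromosome
  loc.start = c(1000, 5000, 1500),            # start position (bp)
  loc.end   = c(2000, 5000, 4000),            # end position (bp)
  lsn.type  = c("gain", "mutation", "loss"),  # lesion type
  stringsAsFactors = FALSE
)

# Toy gene annotation table (hypothetical values and column names).
gene.toy <- data.frame(
  gene      = c("ENSG_A", "ENSG_B"),          # placeholder gene IDs
  chrom     = c("1", "2"),
  loc.start = c(1200, 4000),
  loc.end   = c(1800, 6000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stops if a required column is missing or if any
# interval starts after it ends; returns TRUE invisibly otherwise.
check.cols <- function(dat, required) {
  missing.cols <- setdiff(required, names(dat))
  if (length(missing.cols) > 0) {
    stop("Missing columns: ", paste(missing.cols, collapse = ", "))
  }
  if (any(dat$loc.start > dat$loc.end)) {
    stop("loc.start exceeds loc.end in at least one row")
  }
  invisible(TRUE)
}

check.cols(lsn.toy,  c("ID", "chrom", "loc.start", "loc.end", "lsn.type"))
check.cols(gene.toy, c("gene", "chrom", "loc.start", "loc.end"))
```

A check of this kind is optional, but catching a missing column or a reversed interval early is usually easier than tracing it through the merged output.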
Abdelrahman Elsayed <abdelrahman.elsayed@stjude.org> and Stanley Pounds <stanley.pounds@stjude.org>
This function pre-processes the data by ordering and indexing both the gene and the lesion tables. It combines gene and lesion coordinates into a single structure in which every row is one interval boundary, tagged with a code (cty) that identifies whether the row is the start or end of a gene or of a lesion. The merged data is then passed to find.gene.lsn.overlaps() to detect gene-lesion overlaps.
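The merge-and-order step can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's code: the toy data, the column names, and the choice to sort by chromosome, then position, then cty are assumptions. The cty codes follow the encoding listed for gene.lsn.data (1 = gene start, 2 = lesion start, 3 = lesion end, 4 = gene end).

```r
# Toy inputs (hypothetical values and column names).
genes <- data.frame(gene = c("G1", "G2"), chrom = c("1", "1"),
                    loc.start = c(100, 400), loc.end = c(300, 600),
                    stringsAsFactors = FALSE)
lesions <- data.frame(ID = c("P1", "P2"), chrom = c("1", "1"),
                      loc.start = c(250, 500), loc.end = c(450, 500),
                      lsn.type = c("loss", "mutation"),
                      stringsAsFactors = FALSE)

# One row per interval boundary, tagged with its cty code.
boundaries <- rbind(
  data.frame(chrom = genes$chrom, pos = genes$loc.start, cty = 1L,
             label = genes$gene, stringsAsFactors = FALSE),
  data.frame(chrom = lesions$chrom, pos = lesions$loc.start, cty = 2L,
             label = lesions$ID, stringsAsFactors = FALSE),
  data.frame(chrom = lesions$chrom, pos = lesions$loc.end, cty = 3L,
             label = lesions$ID, stringsAsFactors = FALSE),
  data.frame(chrom = genes$chrom, pos = genes$loc.end, cty = 4L,
             label = genes$gene, stringsAsFactors = FALSE)
)

# Order by chromosome, then position, then cty (assumed tie-break rule).
merged <- boundaries[order(boundaries$chrom, boundaries$pos, boundaries$cty), ]
rownames(merged) <- NULL
```

With cty as the tie-breaker, a start at a given position sorts before an end at the same position, so intervals that merely touch at a shared coordinate still appear as overlapping when the rows are scanned in order. That is one plausible reading of why the codes are numbered this way.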
Pounds, S., et al. (2013). A genomic random interval model for statistical analysis of genomic lesion data.
Cao, X., Elsayed, A. H., & Pounds, S. B. (2023). Statistical Methods Inspired by Challenges in Pediatric Cancer Multi-omics.
order.index.gene.data, order.index.lsn.data, find.gene.lsn.overlaps
data(lesion_data)
data(hg38_gene_annotation)
# Prepare gene and lesion data for GRIN analysis:
prep.gene.lsn <- prep.gene.lsn.data(lesion_data, hg38_gene_annotation)