prep.gene.lsn.data: Prepare Gene and Lesion Data for GRIN Analysis

Description

Prepares and indexes gene and lesion data for downstream GRIN (Genomic Random Interval) analysis. The function merges and orders gene and lesion coordinates so that overlaps between genes and every type of genomic lesion (structural or sequence lesions) can be computed efficiently.

Usage

prep.gene.lsn.data(lsn.data, gene.data, mess.freq = 10)

Value

A list with the following components:

lsn.data: Original lesion data.
gene.data: Original gene annotation data.
gene.lsn.data: Combined and ordered data.frame of gene and lesion intervals. The cty column encodes position type: 1 = gene start, 2 = lesion start, 3 = lesion end, 4 = gene end.
gene.index: Index data.frame giving the start and end rows of the gene entries for each chromosome within gene.lsn.data.
lsn.index: Index data.frame indicating the start and end rows for each lesion (grouped by type, chromosome, and subject) within gene.lsn.data.
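
The cty codes listed above can be made easier to read when inspecting gene.lsn.data. The following is a minimal sketch in base R; decode.cty and cty.labels are hypothetical helper names introduced here for illustration and are not part of the package.

```r
# Labels for the four cty codes documented above (1 to 4, in order)
cty.labels <- c("gene start", "lesion start", "lesion end", "gene end")

# Hypothetical helper: turn numeric cty codes into a labelled factor
decode.cty <- function(cty) {
  factor(cty, levels = 1:4, labels = cty.labels)
}

# Illustrative codes only; real values would come from gene.lsn.data$cty
example.cty <- c(1, 2, 4, 1, 3, 4)
decoded <- decode.cty(example.cty)
print(table(decoded))
```

Assuming the returned list is stored as prep.gene.lsn (as in the Examples section), the same helper could be applied to prep.gene.lsn$gene.lsn.data$cty.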

Arguments

lsn.data

A data.frame containing lesion data in GRIN-compatible format. Must include the following five columns:

ID: Unique patient identifier.
chrom: Chromosome on which the lesion is located.
loc.start: Start position of the lesion in base pairs.
loc.end: End position of the lesion in base pairs.
lsn.type: Type of lesion (e.g., gain, loss, mutation, fusion).

gene.data

A data.frame containing gene annotation data with the following four required columns:

gene: Ensembl gene ID.
chrom: Chromosome on which the gene is located.
loc.start: Start position of the gene in base pairs.
loc.end: End position of the gene in base pairs.

mess.freq

Integer specifying the frequency at which progress messages are displayed. Messages are printed every mess.freq-th lesion block processed (default is 10).
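
To make the expected input layout concrete, the sketch below builds two small data.frames with the required columns and checks that the columns are present. All values are invented toy data (the gene labels are placeholders, not real Ensembl IDs), and check.columns is a hypothetical helper, not a function from the package.

```r
# Toy lesion data with the five required columns (values are made up)
lsn.toy <- data.frame(
  ID = c("P1", "P1", "P2"),
  chrom = c("1", "7", "1"),
  loc.start = c(150, 3000, 700),
  loc.end = c(550, 3000, 800),
  lsn.type = c("loss", "mutation", "gain"),
  stringsAsFactors = FALSE
)

# Toy gene annotation with the four required columns
# ("geneA"/"geneB" stand in for Ensembl gene IDs)
gene.toy <- data.frame(
  gene = c("geneA", "geneB"),
  chrom = c("1", "1"),
  loc.start = c(100, 500),
  loc.end = c(200, 600),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stop with an informative error if a column is missing
check.columns <- function(x, required) {
  missing.cols <- setdiff(required, names(x))
  if (length(missing.cols) > 0) {
    stop("Missing required column(s): ", paste(missing.cols, collapse = ", "))
  }
  invisible(TRUE)
}

lsn.cols <- c("ID", "chrom", "loc.start", "loc.end", "lsn.type")
gene.cols <- c("gene", "chrom", "loc.start", "loc.end")

check.columns(lsn.toy, lsn.cols)
check.columns(gene.toy, gene.cols)
```

Running a check like this before calling prep.gene.lsn.data() can catch misnamed columns early; whether the function itself performs similar validation is not stated here.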

Author

Abdelrahman Elsayed <abdelrahman.elsayed@stjude.org> and Stanley Pounds <stanley.pounds@stjude.org>

Details

This function pre-processes the inputs by ordering and indexing both the gene and the lesion data. It then combines the gene and lesion coordinates into a single structure in which each row carries a code (cty) identifying it as a gene start, lesion start, lesion end, or gene end. The merged data is subsequently passed to find.gene.lsn.overlaps() to detect gene-lesion overlaps.
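
The idea of merging boundaries and ordering them can be illustrated with a small base R sketch. This is not the package's implementation: the toy data, the pos and label column names, and the simple sweep used to list overlaps are all assumptions made for illustration, and the sweep is limited to a single chromosome with one lesion per subject.

```r
# Toy inputs (invented values; one chromosome, one lesion per subject)
gene.data <- data.frame(gene = c("geneA", "geneB"), chrom = c("1", "1"),
                        loc.start = c(100, 500), loc.end = c(200, 600),
                        stringsAsFactors = FALSE)
lsn.data <- data.frame(ID = c("P1", "P2"), chrom = c("1", "1"),
                       loc.start = c(150, 700), loc.end = c(550, 800),
                       lsn.type = c("loss", "gain"),
                       stringsAsFactors = FALSE)

# One row per interval boundary, tagged with the documented cty code:
# 1 = gene start, 2 = lesion start, 3 = lesion end, 4 = gene end
pos.data <- rbind(
  data.frame(chrom = gene.data$chrom, pos = gene.data$loc.start, cty = 1L,
             label = gene.data$gene, stringsAsFactors = FALSE),
  data.frame(chrom = lsn.data$chrom, pos = lsn.data$loc.start, cty = 2L,
             label = lsn.data$ID, stringsAsFactors = FALSE),
  data.frame(chrom = lsn.data$chrom, pos = lsn.data$loc.end, cty = 3L,
             label = lsn.data$ID, stringsAsFactors = FALSE),
  data.frame(chrom = gene.data$chrom, pos = gene.data$loc.end, cty = 4L,
             label = gene.data$gene, stringsAsFactors = FALSE)
)

# Order by chromosome, then position, then cty
merged <- pos.data[order(pos.data$chrom, pos.data$pos, pos.data$cty), ]
rownames(merged) <- NULL

# Simple sweep over the ordered boundaries: an interval that opens while
# an interval of the other kind is still open overlaps it
open.genes <- character(0)
open.lsns <- character(0)
overlaps <- NULL
for (i in seq_len(nrow(merged))) {
  code <- merged$cty[i]
  lab <- merged$label[i]
  if (code == 1L) {
    if (length(open.lsns) > 0) {
      overlaps <- rbind(overlaps, data.frame(gene = lab, ID = open.lsns,
                                             stringsAsFactors = FALSE))
    }
    open.genes <- c(open.genes, lab)
  } else if (code == 2L) {
    if (length(open.genes) > 0) {
      overlaps <- rbind(overlaps, data.frame(gene = open.genes, ID = lab,
                                             stringsAsFactors = FALSE))
    }
    open.lsns <- c(open.lsns, lab)
  } else if (code == 3L) {
    open.lsns <- setdiff(open.lsns, lab)
  } else {
    open.genes <- setdiff(open.genes, lab)
  }
}
print(merged)
print(overlaps)
```

In this sketch, sorting ties by cty places both start codes (1, 2) before both end codes (3, 4), so intervals that merely touch at a shared position are still reported as overlapping. Whether the package treats such boundary cases the same way is not stated in this page.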

References

Pounds, S., et al. (2013). A genomic random interval model for statistical analysis of genomic lesion data.

Cao, X., Elsayed, A. H., & Pounds, S. B. (2023). Statistical Methods Inspired by Challenges in Pediatric Cancer Multi-omics.

Examples

data(lesion_data)
data(hg38_gene_annotation)

# Prepare gene and lesion data for GRIN analysis:
prep.gene.lsn <- prep.gene.lsn.data(lesion_data, hg38_gene_annotation)