set.CADDregions: Variant annotation based on 'CADD regions' and genomic categories

Description

Assigns CADD regions and genomic categories to variants based on their genomic positions.

Usage

set.CADDregions(x, verbose = T, path.data, build = c("b37", "b38"))

Value

The same bed.matrix as x, with three additional columns in x@snps:

genomic.region: The CADD region of each variant
SubRegion: The genomic category of each variant among 'Coding', 'Regulatory' or 'Intergenic'
adjCADD.Median: The median of the adjusted CADD scores of variants observed at least twice in gnomAD genomes r2.0.1
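To make the definition of adjCADD.Median concrete, the following base R sketch computes, for each region, the median adjusted CADD score of reference variants observed at least twice, and then looks that median up for each variant of a dataset. It is an illustration only, not the Ravages implementation: the function region_median_cadd() and its arguments (a region label, an adjusted CADD score and an observation count per reference variant) are hypothetical, and "observed at least twice" is assumed to mean a count of 2 or more.

```r
# Hypothetical sketch (not the Ravages implementation).
# ref_region, ref_adj_cadd, ref_count: one entry per reference variant
#   (region label, adjusted CADD score, number of observations; assumed inputs)
# variant_region: region label of each variant in the dataset to annotate
region_median_cadd <- function(ref_region, ref_adj_cadd, ref_count, variant_region) {
  # Keep only reference variants observed at least twice
  keep <- ref_count >= 2
  # Median adjusted CADD score per region, named by region
  medians <- tapply(ref_adj_cadd[keep], ref_region[keep], median)
  # Look up each variant's region; regions without a median give NA
  as.vector(medians[variant_region])
}
```

For example, region_median_cadd(c("R1", "R1", "R1", "R2", "R2"), c(1, 3, 10, 2, 4), c(2, 5, 1, 3, 2), c("R2", "R1", "R3")) returns 3, 2 and NA: the score 10 is dropped because it was observed only once, and region "R3" has no reference variants.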

Arguments

x: A bed.matrix
verbose: Whether to display information about the function actions
path.data: The directory where the RAVA-FIRST data files are stored, or where they will be downloaded from https://lysine.univ-brest.fr/RAVA-FIRST/
build: The genome build of the data, either "b37" or "b38". The CADD regions in the corresponding build will be used.

Details

To assign variants to CADD regions and genomic categories, the files "CADDRegions.2021.hg19.bed.gz" and "FunctionalAreas.hg19.bed.gz" are downloaded from https://lysine.univ-brest.fr/RAVA-FIRST/ into the directory of the Ravages package. CADD regions are non-overlapping regions defined across the whole genome to perform rare variant association tests in the RAVA.FIRST() pipeline. For large datasets, it is recommended to run this function one chromosome at a time to limit computation time and memory usage.
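Because CADD regions do not overlap, each variant falls into at most one of them, so the assignment reduces to a sorted interval lookup. The base R sketch below illustrates that lookup under stated assumptions; it is not the code used by set.CADDregions. The function assign_regions() is hypothetical, and it assumes the regions come as a data frame with hypothetical columns chr, start, end and name following BED conventions (0-based start, exclusive end), while variant positions are 1-based, as in the pos column of x@snps.

```r
# Hypothetical sketch (not the Ravages implementation): assign each variant
# to the non-overlapping region that contains it, or NA if none does.
# regions: data frame with columns chr, start (0-based), end, name (assumed)
assign_regions <- function(chr, pos, regions) {
  out <- rep(NA_character_, length(pos))
  for (ch in unique(chr)) {
    # Regions of this chromosome, sorted by start (required by findInterval)
    r <- regions[regions$chr == ch, , drop = FALSE]
    r <- r[order(r$start), , drop = FALSE]
    idx <- which(chr == ch)
    # 1-based first position of each region (BED starts are 0-based)
    first <- r$start + 1
    # k = index of the last region starting at or before each position (0 = none)
    k <- findInterval(pos[idx], first)
    hit <- k > 0
    # The variant is inside region k only if it does not go past its end
    hit[hit] <- pos[idx][hit] <= r$end[k[hit]]
    out[idx[hit]] <- as.character(r$name[k[hit]])
  }
  out
}
```

With regions A (chr 1, BED 100-150), B (chr 1, BED 200-300) and C (chr 2, BED 50-60), variants at chr 1 positions 101, 150, 151, 250 and chr 2 positions 55, 100 are assigned A, A, NA, B, C, NA. Sorting once per chromosome and using findInterval() avoids comparing every variant with every region, which is one reason processing one chromosome at a time stays manageable.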

Examples


#Load the packages (read.bed.matrix comes from gaston)
library(gaston)
library(Ravages)

#Import data in a bed.matrix (example in build 37)
x <- read.bed.matrix( system.file("extdata", "LCT.EUR.b37.bed", package="Ravages") )

#Group variants within CADD regions and genomic categories
#(the annotation files are downloaded if not already present)
x <- set.CADDregions(x, build = "b37")
table(x@snps$genomic.region) #CADD regions
table(x@snps$SubRegion) #Genomic categories