Variants are annotated directly within the function with adjusted CADD scores, using the file "AdjustedCADD_v1.4_202108.tsv.gz" downloaded from https://lysine.univ-brest.fr/RAVA-FIRST/ in the repository of the package Ravages. Alternatively, the scores of the variants can be provided to variant.scores to save computation time; this file should contain 5 columns: the chromosome ('chr'), position ('pos'), reference allele ('A1'), alternative allele ('A2') and adjusted CADD score ('adjCADD'). As CADD scores are only available for SNVs, only SNVs are kept in the analysis.
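As a rough illustration of the five-column layout described above, the following base R sketch builds a small scores table and attaches it to a toy variant table. The object names (scores, snps, annotated) and all values are invented for the example, and the SNV test on allele length is an assumption; this is not the package's internal annotation code.

```r
# Toy scores table with the five expected columns (values are made up).
scores <- data.frame(
  chr = c(1, 1, 2),
  pos = c(1000, 2000, 1500),
  A1 = c("A", "C", "G"),
  A2 = c("G", "T", "A"),
  adjCADD = c(12.3, 4.5, 20.1),
  stringsAsFactors = FALSE
)

# Toy variant table; the last row is an indel and has no CADD score.
snps <- data.frame(
  chr = c(1, 1, 2, 2),
  pos = c(1000, 2000, 1500, 3000),
  A1 = c("A", "C", "G", "AT"),
  A2 = c("G", "T", "A", "A"),
  stringsAsFactors = FALSE
)

# Keep SNVs only (assumed here: single-base reference and alternative alleles).
is.snv <- nchar(snps$A1) == 1 & nchar(snps$A2) == 1
snps.snv <- snps[is.snv, ]

# Attach the adjusted CADD score by matching on chr, pos, A1 and A2.
annotated <- merge(snps.snv, scores,
                   by = c("chr", "pos", "A1", "A2"), all.x = TRUE)
print(annotated)
```

Matching on all four key columns avoids assigning a score to a variant that shares a position with a scored variant but carries a different alternative allele.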
If a column 'adjCADD' is already present in x@snps, no annotation is performed and filtering is applied directly to this column.
To use this function, a factor 'genomic.region' corresponding to the CADD regions and a vector 'adjCADD.Median' should be present in the slot x@snps. Both can be obtained with the function set.CADDregions.
Only variants with an adjusted CADD score higher than the median value are kept in the analysis. This is the filtering strategy applied in the RAVA.FIRST() pipeline.
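The median-based filter can be pictured with a minimal base R sketch on a toy data frame that mimics the columns expected in x@snps. The values, the 'id' column and the object names are invented, and the strict inequality and the dropping of unscored variants are assumptions read from the wording above rather than the package's actual code.

```r
# Toy version of the columns expected in x@snps (values are made up).
snps <- data.frame(
  id = c("v1", "v2", "v3", "v4"),
  genomic.region = factor(c("R1", "R1", "R2", "R2")),
  adjCADD.Median = c(10, 10, 15, 15),
  adjCADD = c(12, 8, 15, 22)
)

# Keep variants whose score is strictly above the median attached to them;
# variants without a score (NA) are dropped as well (assumption).
keep <- !is.na(snps$adjCADD) & snps$adjCADD > snps$adjCADD.Median
kept <- snps[keep, ]
print(as.character(kept$id))
```

In this toy example v3 is dropped because its score equals the median rather than exceeding it.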
If filter="whole", only the variants having a MAF lower than the threshold in the entire sample are kept.
If filter="controls", only the variants having a MAF lower than the threshold in the controls group are kept.
If filter="any", only the variants having a MAF lower than the threshold in any of the groups are kept.
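The three filter modes can be compared on a toy genotype matrix with the base R sketch below. The matrix, the group labels, the threshold of 0.2 and the helper maf() are all hypothetical, and "any" is read here as "rare in at least one group"; the package's own implementation may differ.

```r
# Toy genotype matrix: rows are individuals, columns are variants,
# entries are counts of the alternative allele (0, 1 or 2).
geno <- matrix(
  c(0, 1, 1, 1,
    0, 0, 1, 0,
    1, 0, 1, 0,
    0, 0, 1, 0,
    1, 2, 0, 0,
    1, 1, 0, 0),
  ncol = 4, byrow = TRUE,
  dimnames = list(NULL, c("v1", "v2", "v3", "v4"))
)
group <- factor(c("controls", "controls", "controls", "controls",
                  "cases", "cases"))
maf.threshold <- 0.2  # hypothetical threshold

# Minor allele frequency of each variant among the given individuals.
maf <- function(g) {
  p <- colMeans(g) / 2
  pmin(p, 1 - p)
}

maf.whole <- maf(geno)
# One row per group, one column per variant.
maf.by.group <- t(sapply(levels(group), function(lv) {
  maf(geno[group == lv, , drop = FALSE])
}))

keep.whole <- maf.whole < maf.threshold                      # filter = "whole"
keep.controls <- maf.by.group["controls", ] < maf.threshold  # filter = "controls"
keep.any <- apply(maf.by.group < maf.threshold, 2, any)      # filter = "any"

print(rbind(whole = keep.whole, controls = keep.controls, any = keep.any))
```

On this toy data, "whole" is the most stringent mode and "any" the most permissive: v3 is common in controls and in the whole sample, yet absent from cases, so only "any" retains it.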
It is recommended to use this function chromosome by chromosome for large datasets.