The "locus-to-gene" (L2G) model prioritizes likely causal genes at each GWAS locus using genetic and functional genomics features. The main categories of predictive features are:
Distance: Distance from credible set variants to the gene.
Molecular QTL colocalization: Colocalization with molecular QTLs.
Chromatin interaction: Interactions, such as promoter-capture Hi-C.
Variant pathogenicity: Pathogenicity scores from VEP (Variant Effect Predictor).
studiesAndLeadVariantsForGeneByL2G(genes, l2g = NA, pvalue = NA, vtype = NULL)
Returns a data frame containing the input gene ID and its data for the L2G model. The table consists of the following columns:
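yProbaModel: Numeric. Overall L2G score.
yProbaDistance: Numeric. Score for the distance feature category.
yProbaInteraction: Numeric. Score for the chromatin interaction feature category.
yProbaMolecularQTL: Numeric. Score for the molecular QTL colocalization feature category.
yProbaPathogenicity: Numeric. Score for the variant pathogenicity feature category.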
pval: Numeric. P-value.
beta.direction: Character. Beta direction.
beta.betaCI: Numeric. Beta confidence interval.
beta.betaCILower: Numeric. Lower bound of the beta confidence interval.
beta.betaCIUpper: Numeric. Upper bound of the beta confidence interval.
odds.oddsCI: Numeric. Odds ratio confidence interval.
odds.oddsCILower: Numeric. Lower bound of the odds ratio confidence interval.
odds.oddsCIUpper: Numeric. Upper bound of the odds ratio confidence interval.
study.studyId: Character. Study ID.
study.traitReported: Character. Reported trait.
study.traitCategory: Character. Trait category.
study.pubDate: Character. Publication date.
study.pubTitle: Character. Publication title.
study.pubAuthor: Character. Publication author.
study.pubJournal: Character. Publication journal.
study.pmid: Character. PubMed ID.
study.hasSumstats: Logical. Indicates if the study has summary statistics.
study.nCases: Integer. Number of cases in the study.
study.numAssocLoci: Integer. Number of associated loci.
study.nTotal: Integer. Total number of samples in the study.
study.traitEfos: Character. Trait EFOs.
variant.id: Character. Variant ID.
variant.rsId: Character. Variant rsID.
variant.chromosome: Character. Variant chromosome.
variant.position: Integer. Variant position.
variant.refAllele: Character. Variant reference allele.
variant.altAllele: Character. Variant alternate allele.
variant.nearestCodingGeneDistance: Integer. Distance to the nearest coding gene.
variant.nearestGeneDistance: Integer. Distance to the nearest gene.
variant.mostSevereConsequence: Character. Most severe consequence.
variant.nearestGene.id: Character. Nearest gene ID.
variant.nearestCodingGene.id: Character. Nearest coding gene ID.
ensembl_id: Character. Ensembl ID.
gene_symbol: Character. Gene symbol.
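As a rough illustration of working with these columns, the sketch below ranks rows by the overall L2G score. It uses a small mock data frame rather than a live query; the study IDs and all numeric values are invented for the example, and only a handful of the columns listed above are included.

```r
# Mock rows standing in for the function's output; every value here is invented.
res <- data.frame(
  gene_symbol = c("PCSK9", "PCSK9", "PCSK9"),
  study.studyId = c("STUDY_A", "STUDY_B", "STUDY_C"),  # hypothetical study IDs
  yProbaModel = c(0.62, 0.91, 0.78),
  yProbaDistance = c(0.40, 0.85, 0.70),
  yProbaMolecularQTL = c(0.55, 0.60, 0.30),
  pval = c(3e-9, 1e-20, 5e-12),
  stringsAsFactors = FALSE
)

# Order studies from the highest to the lowest overall L2G score.
ranked <- res[order(res$yProbaModel, decreasing = TRUE), ]
top_study <- ranked$study.studyId[1]

# Show the ranking alongside the association p-value.
print(ranked[, c("study.studyId", "yProbaModel", "pval")])
```

The same pattern should apply to a real result, since the score columns are plain numeric vectors.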
genes: Character. Gene Ensembl ID (e.g. ENSG00000169174) or gene symbol (e.g. PCSK9). A vector of several genes is also accepted.
l2g: Numeric. Locus-to-gene (L2G) cutoff score. (Default: NA)
pvalue: Numeric. P-value cutoff. (Default: NA)
vtype: Character. Most severe consequence used to filter the variant types, including "intergenic_variant", "upstream_gene_variant", "intron_variant", "missense_variant", "5_prime_UTR_variant", "non_coding_transcript_exon_variant", "splice_region_variant". (Default: NULL)
The function also provides optional filtering parameters to narrow the results: an L2G score cutoff (l2g), a p-value cutoff (pvalue), and the most severe variant consequence (vtype).
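To make the effect of the three filters concrete, here is a minimal sketch that applies comparable filters to a local data frame. The helper `filter_l2g_rows` is hypothetical and not part of the package. It assumes that `l2g` acts as a minimum on yProbaModel, that `pvalue` acts as a maximum on pval, and that `vtype` is matched against variant.mostSevereConsequence; the package may implement these differently. The mock rows are invented.

```r
# Hypothetical helper, not part of the package: applies the three filters locally.
filter_l2g_rows <- function(df, l2g = NA, pvalue = NA, vtype = NULL) {
  keep <- rep(TRUE, nrow(df))
  # Assumption: l2g is a minimum overall L2G score.
  if (!is.na(l2g)) keep <- keep & df$yProbaModel >= l2g
  # Assumption: pvalue is a maximum association p-value.
  if (!is.na(pvalue)) keep <- keep & df$pval <= pvalue
  # Assumption: vtype is matched against the most severe consequence.
  if (!is.null(vtype)) keep <- keep & df$variant.mostSevereConsequence %in% vtype
  df[keep, , drop = FALSE]
}

# Invented rows using column names from the returned table.
mock <- data.frame(
  variant.id = c("v1", "v2", "v3", "v4"),
  yProbaModel = c(0.82, 0.55, 0.74, 0.91),
  pval = c(2e-10, 1e-12, 4e-6, 3e-9),
  variant.mostSevereConsequence = c("intron_variant", "intron_variant",
                                    "missense_variant", "intergenic_variant"),
  stringsAsFactors = FALSE
)

filtered <- filter_l2g_rows(mock, l2g = 0.6, pvalue = 1e-8,
                            vtype = c("intergenic_variant", "intron_variant"))
print(filtered$variant.id)
```

Leaving all three arguments at their defaults keeps every row, which mirrors the NA and NULL defaults in the function signature.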
if (FALSE) {
  # Several genes by Ensembl ID, with an L2G cutoff of 0.7
  result <- studiesAndLeadVariantsForGeneByL2G(
    genes = c("ENSG00000163946", "ENSG00000169174", "ENSG00000143001"),
    l2g = 0.7)
  # One gene with L2G, p-value, and variant-consequence filters
  result <- studiesAndLeadVariantsForGeneByL2G(genes = "ENSG00000169174",
    l2g = 0.6, pvalue = 1e-8, vtype = c("intergenic_variant", "intron_variant"))
  # Lookup by gene symbol, no filters
  result <- studiesAndLeadVariantsForGeneByL2G(genes = "TMEM61")
}