Geneset

Overview

The omics era has produced huge amounts of gene data, which raises the problem of how to uncover their potential biological effects. One effective approach is gene enrichment analysis.

Access to gene sets is the central and fundamental part of gene enrichment analysis, whether the method is traditional over-representation analysis (ORA) or the more advanced functional class scoring (FCS), e.g. Gene Set Enrichment Analysis (GSEA).
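As a rough illustration of what ORA does with a gene set table, the sketch below runs a hypergeometric test in base R on toy data. The term ids, gene ids, column names and the ora() helper are all made up for this example; it is not how geneset or any particular enrichment tool implements the test.

```r
# Toy gene set table: one row per (term id, gene id) pair. All ids are made up.
geneset <- data.frame(
  id   = c("T1", "T1", "T1", "T1", "T2", "T2", "T2"),
  gene = c("g1", "g2", "g3", "g4", "g5", "g6", "g7"),
  stringsAsFactors = FALSE
)

# Hypothetical inputs: genes of interest and the background (universe).
hits     <- c("g1", "g2", "g3", "g9")
universe <- paste0("g", 1:20)

# ora() is an illustrative helper, not a function of the geneset package.
ora <- function(geneset, hits, universe) {
  hits <- intersect(unique(hits), universe)
  sets <- split(geneset$gene, geneset$id)
  res <- lapply(names(sets), function(term) {
    members <- intersect(unique(sets[[term]]), universe)
    overlap <- length(intersect(hits, members))
    # Upper tail of the hypergeometric distribution: P(X >= overlap)
    p <- phyper(overlap - 1,
                length(members),
                length(universe) - length(members),
                length(hits),
                lower.tail = FALSE)
    data.frame(id = term, set_size = length(members),
               overlap = overlap, p_value = p,
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, res)
  # Benjamini-Hochberg correction across all tested terms
  out$p_adjust <- p.adjust(out$p_value, method = "BH")
  out[order(out$p_value), ]
}

result <- ora(geneset, hits, universe)
print(result)
```

Here term T1 shares three of its four genes with the hit list, so it ranks first, while T2 has no overlap and gets a p-value of 1. FCS methods such as GSEA use a ranked list of all genes instead of a hit list, but they rely on the same kind of term-to-gene table.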

Currently, many enrichment analysis tools provide built-in data sets for only a few model species, or ask users to download gene sets online. As a result, users working on non-model species need to download gene sets from various public databases. For example, enrichGO() and gseGO() of clusterProfiler use organism-level annotation packages that cover about 20 species. If the research target is not among these organisms, the user needs to build an annotation via AnnotationHub or download one from biomaRt or Blast2GO, which is a time-consuming and difficult task for biologists without programming skills.

Here, we develop an R package named "geneset", which aims to provide access to up-to-date gene sets in less time.

It includes GO (BP, CC and MF), KEGG (pathway, module, enzyme, network, drug and disease), WikiPathway, MsigDb, EnrichrDb, Reactome, MeSH, DisGeNET, Disease Ontology (DO), Network of Cancer Gene (NCG) (versions 6 and 7) and COVID-19. It supports both model and non-model species.

Supported organisms

For more details, please refer to this site.

  • GO supports 143 species
  • KEGG supports 8213 species
  • MeSH supports 71 species
  • MsigDb supports 20 species
  • WikiPathway supports 16 species
  • Reactome supports 11 species
  • EnrichrDB supports 5 species
  • Disease-related gene sets (DO, NCG, DisGeNET and COVID-19) support human only

About the data

All gene sets are stored on our website and can be easily accessed with simple functions.

We will update the data monthly to provide a better user experience.

Install

install.packages('geneset')

Monthly Downloads

411

Version

0.2.7

License

GPL-3

Maintainer

Yunze Liu

Last Published

November 20th, 2022

Functions in geneset (0.2.7)

getHgDisease

Get HgDisease geneset and geneset_name. Human disease gene sets from Disease Ontology (DO), DisGeNET, Network of Cancer Gene (NCG) version 6 and v7, and COVID-19-specific. Geneset is a data.frame of 2 columns with term id and gene id; Geneset_name is a data.frame of 2 columns with term id and term description.

getMesh

Get MeSH geneset and geneset_name. MeSH is the annotation used for MEDLINE/PubMed articles and is manually curated by NLM (U.S. National Library of Medicine). Geneset is a data.frame of 2 columns with term id and gene id; Geneset_name is a data.frame of 2 columns with term id and term description.

Datasets

Datasets. go_org contains GO species information.

getMsigdb

Get MsigDb geneset and geneset_name. Geneset is a data.frame of 2 columns with term id and gene id.

getGO

Get GO geneset and geneset_name. Geneset is a data.frame of 2 columns with term id and gene id; Geneset_name is a data.frame of 2 columns with term id and term description.

getReactome

Get Reactome geneset and geneset_name. Geneset is a data.frame of 2 columns with term id and gene id; Geneset_name is a data.frame of 2 columns with term id and term description.

getWiki

Get WikiPathway geneset and geneset_name. Geneset is a data.frame of 2 columns with term id and gene id; Geneset_name is a data.frame of 2 columns with term id and term description.

getKEGG

Get KEGG geneset and geneset_name. Geneset is a data.frame of 2 columns with term id and gene id; Geneset_name is a data.frame of 2 columns with term id and term description.

getEnrichrdb

Get EnrichrDB geneset and geneset_name. Geneset is a data.frame of 2 columns with term id and gene id.
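
The descriptions above all refer to the same two-table shape, so a small sketch of how such tables might be handled in base R may help. The tables below are toy stand-ins with made-up ids and assumed column names. The commented-out getGO() call only indicates where real data would come from; its argument names and the names of the returned list elements are assumptions and should be checked against ?getGO.

```r
# Toy stand-ins for the two tables described above. Column names are assumed
# for illustration; the code below works on column positions instead.
geneset <- data.frame(
  id   = c("T1", "T1", "T2", "T2", "T2"),
  gene = c("g1", "g2", "g2", "g3", "g4"),
  stringsAsFactors = FALSE
)
geneset_name <- data.frame(
  id   = c("T1", "T2"),
  name = c("toy term one", "toy term two"),
  stringsAsFactors = FALSE
)

# With the package installed and network access, the real tables might be
# obtained like this (argument and element names are assumptions):
# gs <- geneset::getGO(org = "human", ont = "mf")
# geneset <- gs$geneset
# geneset_name <- gs$geneset_name

# Named list: term id -> character vector of gene ids
gene_list <- split(geneset[[2]], geneset[[1]])

# Attach the term description to every (term id, gene id) row
annotated <- merge(geneset, geneset_name, by.x = 1, by.y = 1)

# Number of genes per term
set_sizes <- lengths(gene_list)

print(gene_list)
print(annotated)
print(set_sizes)
```

The two-column layout (term, gene) and (term, description) is also the shape that clusterProfiler::enricher() and clusterProfiler::GSEA() accept through their TERM2GENE and TERM2NAME arguments, so tables of this form can be passed to those functions for species without an organism-level annotation package.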