
import_hgnc_dataset() imports HGNC data into R. Additionally, specify a directory path if you wish to save the data to disk.
import_hgnc_dataset(file = latest_archive_url())
A tibble of the HGNC data set consisting of 55 columns:
hgnc_id
: A unique ID provided by HGNC for each gene with an approved symbol. IDs are of the format 'HGNC:n', where n is a unique number. HGNC IDs remain stable even if a name or symbol changes.
hgnc_id2
: A stripped-down version of hgnc_id where the prefix 'HGNC:' has been removed. This column is added by the package {hgnc}.
symbol
: The official gene symbol approved by the HGNC, typically a short form of the gene name. Symbols are approved in accordance with the Guidelines for Human Gene Nomenclature.
name
: The full gene name approved by the HGNC; corresponds to the approved symbol above.
locus_group
: A group name for a set of related locus types as defined by the HGNC. One of: 'protein-coding gene', 'non-coding RNA', 'pseudogene' or 'other'.
locus_type
: Specifies the genetic class of each gene entry, including various types of RNA and other gene-related categories, such as pseudogenes and virus integration sites.
status
: Status of the symbol report, which can be either 'Approved' or 'Entry Withdrawn'.
location
: Chromosomal location. Indicates the cytogenetic location of the gene or region on the chromosome, e.g., '19q13.43'. In the absence of that information, it may be listed as 'not on reference assembly', 'unplaced', or 'reserved'.
location_sortable
: A sortable version of the location column, allowing easier sorting by chromosomal location.
alias_symbol
: Alternative symbols that have been used to refer to the gene. Aliases may be from literature, other databases, or represent membership of a gene group.
alias_name
: Alternative names for the gene. Aliases may be from literature, other databases, or represent membership of a gene group.
prev_symbol
: This field displays any symbols that were previously HGNC-approved nomenclature.
prev_name
: This field displays any names that were previously HGNC-approved nomenclature.
gene_group
: A gene group. Each gene has been assigned to one or more groups, according to either sequence similarity or information from publications, specialist advisors, or other databases.
gene_group_id
: Gene group identifier, an integer number. See gene_group for the gene group name.
date_approved_reserved
: The date the entry was first approved.
date_symbol_changed
: The date the gene symbol was last changed.
date_name_changed
: The date the gene name was last changed.
date_modified
: Date the entry was last modified.
entrez_id
: Entrez gene identifier.
ensembl_gene_id
: Ensembl gene identifier.
vega_id
: VEGA gene identifier.
ucsc_id
: UCSC gene identifier.
ena
: International Nucleotide Sequence Database Collaboration (GenBank, ENA and DDBJ) accession number(s).
refseq_accession
: The Reference Sequence (RefSeq) identifier for that entry, provided by the NCBI.
ccds_id
: Consensus CDS identifier.
uniprot_ids
: UniProt protein accession.
pubmed_id
: PubMed and Europe PubMed Central PMIDs.
mgd_id
: Mouse genome informatics database identifier.
rgd_id
: Rat genome database gene identifier.
lsdb
: The name of the Locus Specific Mutation Database and URL for the gene.
cosmic
: Symbol used within the Catalogue of somatic mutations in cancer for the gene.
omim_id
: Online Mendelian Inheritance in Man (OMIM) identifier.
mirbase
: miRBase identifier.
homeodb
: Homeobox Database identifier.
snornabase
: snoRNABase identifier.
bioparadigms_slc
: Symbol used to link to the SLC tables database at bioparadigms.org for the gene.
orphanet
: Orphanet identifier.
pseudogene_org
: Pseudogene.org identifier.
horde_id
: Symbol used within HORDE for the gene.
merops
: Identifier used to link to the MEROPS peptidase database.
imgt
: Symbol used within the international ImMunoGeneTics information system.
iuphar
: The objectId used to link to the IUPHAR/BPS Guide to PHARMACOLOGY database.
kznf_gene_catalog
: Lawrence Livermore National Laboratory Human KZNF Gene Catalog (LLNL) identifier.
mamit_trnadb
: Identifier to link to the Mamit-tRNA database.
cd
: Symbol used within the Human Cell Differentiation Molecule database for the gene.
lncrnadb
: lncRNA Database identifier.
enzyme_id
: ENZYME EC accession number.
intermediate_filament_db
: Identifier used to link to the Human Intermediate Filament Database.
rna_central_ids
: Identifier in RNAcentral, the non-coding RNA sequence database.
lncipedia
: The LNCipedia identifier to which the gene belongs. This will only appear if the gene is a long non-coding RNA.
gtrnadb
: The GtRNAdb identifier to which the gene belongs. This will only appear if the gene is a tRNA.
agr
: The Alliance of Genome Resources HGNC ID for the human gene page within the resource.
mane_select
: MANE Select nucleotide accession with version (i.e., NCBI RefSeq or Ensembl transcript ID and version).
gencc
: Gene Curation Coalition (GenCC) Database identifier.
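A few of the columns described above can be illustrated with a minimal base R sketch. It shows how an hgnc_id2-style value relates to hgnc_id, and how status and locus_group might be used to subset rows. The data frame `toy` and its rows are made up for illustration and are not real HGNC records; how {hgnc} itself derives hgnc_id2 (including its column type) is not shown on this page, so the `sub()` call is an assumption.

```r
# Toy rows mimicking a few columns of the imported data set.
# These values are invented for illustration; they are NOT real HGNC records.
toy <- data.frame(
  hgnc_id     = c("HGNC:1", "HGNC:2", "HGNC:3"),
  symbol      = c("GENEA", "GENEB", "GENEC"),
  locus_group = c("protein-coding gene", "pseudogene", "protein-coding gene"),
  status      = c("Approved", "Approved", "Entry Withdrawn"),
  stringsAsFactors = FALSE
)

# Strip the 'HGNC:' prefix, matching the description of hgnc_id2.
# (Assumption: kept as character here; the package may use another type.)
toy$hgnc_id2 <- sub("^HGNC:", "", toy$hgnc_id)

# Keep only approved protein-coding genes.
approved_coding <- toy[
  toy$status == "Approved" & toy$locus_group == "protein-coding gene",
]

print(approved_coding[, c("hgnc_id", "hgnc_id2", "symbol")])
```

The same subsetting should carry over to the tibble returned by import_hgnc_dataset(), since it exposes columns with these names.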
file
: A file or URL of the complete HGNC data set (in TSV format).
Use list_archives() to list previous versions of these data. Pass one of the URLs (column url) to file to import that specific version. By default the value of file is the URL corresponding to the latest version, i.e. the returned value of latest_archive_url().
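Importing a specific archived version, as described above, might look like the following sketch. It uses only list_archives(), its url column, and the file argument, all of which are named on this page. The wrapper `import_archived_hgnc()` and its `row` parameter are hypothetical helpers introduced here for illustration. Calling the wrapper requires the {hgnc} package and network access; which row corresponds to which release depends on how list_archives() orders its output, which is not stated here.

```r
# Hypothetical helper (not part of {hgnc}): import the archive listed in a
# given row of list_archives(). Needs {hgnc} installed and network access.
import_archived_hgnc <- function(row = 1L) {
  archives <- hgnc::list_archives()

  # Guard against an out-of-range row index.
  stopifnot(row >= 1L, row <= nrow(archives))

  # Pass the chosen URL (column `url`) to the `file` argument.
  hgnc::import_hgnc_dataset(file = archives$url[[row]])
}

# Example call (not run here because it downloads data):
# hgnc_old <- import_archived_hgnc(row = 1L)
```

Inspecting the output of list_archives() first is advisable, so that the row chosen actually matches the intended release.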
if (FALSE) import_hgnc_dataset()