ukbtools v0.11.1





Manipulate and Explore UK Biobank Data

A set of tools to create a UK Biobank dataset from a UKB fileset (.tab, .r, .html), visualize primary demographic data for a sample subset, query ICD diagnoses, retrieve genetic metadata, and read and write standard file formats for genetic analyses.




After downloading and decrypting your UK Biobank (UKB) data with the supplied UKB programs, you have multiple files that need to be brought together to give you a dataset to explore. The data file has column names that are edited field-codes from the UKB data showcase. ukbtools makes it easy to collapse the multiple UKB files into a single dataset for analysis, giving the variables meaningful names in the process. The package also includes functionality to retrieve ICD diagnoses, explore a sample subset in the context of the UKB sample, and collect genetic metadata.


```{r, eval = FALSE}
# Install from CRAN
install.packages("ukbtools")

# Install latest development version
devtools::install_github("kenhanscombe/ukbtools", dependencies = TRUE)
```

## Prerequisite: Make a UKB fileset

Download<sup>§</sup> then decrypt your data and create a "UKB fileset" (.tab, .r, .html):

```{bash, eval = FALSE}
ukb_unpack ukbxxxx.enc key
ukb_conv ukbxxxx.enc_ukb r
ukb_conv ukbxxxx.enc_ukb docs
```

`ukb_unpack` decrypts your downloaded `ukbxxxx.enc` file, outputting a `ukbxxxx.enc_ukb` file. `ukb_conv` with the `r` flag converts the decrypted data to a tab-delimited file and an R script `ukbxxxx.r` that reads the tab file. The `docs` flag creates an html file containing a field-code-to-description table (among others).

§ Full details of the data download and decrypt process are given in the Using UK Biobank Data documentation.
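Before moving on, it can help to confirm that all three files of the fileset are present and sit in the same directory. The following is a minimal sketch in base R; `check_ukb_fileset()` is a hypothetical helper written for this illustration, not a ukbtools function, and it assumes the three files share the stem with the extensions `.tab`, `.r` and `.html`.

```r
# Hypothetical helper (not part of ukbtools): report which of the three
# fileset files exist for a given stem in a given directory.
check_ukb_fileset <- function(stem, path = ".") {
  files <- file.path(path, paste0(stem, c(".tab", ".r", ".html")))
  present <- file.exists(files)
  names(present) <- basename(files)
  present
}

# Demonstration in a temporary directory with empty placeholder files.
# Only the .tab and .r files are created, so the .html file is reported missing.
demo_dir <- tempfile("ukb_fileset_")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, c("ukbxxxx.tab", "ukbxxxx.r"))))

status <- check_ukb_fileset("ukbxxxx", path = demo_dir)
print(status)
```

With a real fileset, all three entries should be `TRUE` before you try to build a dataset from it.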

## Make a UKB dataset

The function `ukb_df()` takes two arguments, the stem of your fileset and the path, and returns a dataframe with usable column names. This will take a few minutes. The rate-limiting step is reading and parsing the code in the UKB-generated .r file, not `ukb_df()` itself.

```{r, eval = FALSE}
library(ukbtools)

# Fileset in the current working directory
my_ukb_data <- ukb_df("ukbxxxx")
```

You can also specify the path to your fileset if it is not in the current directory. For example, if your fileset is in a directory called data, pass the path to that directory:

```{r, eval = FALSE}
my_ukb_data <- ukb_df("ukbxxxx", path = "/full/path/to/my/data")
```

Note: You can move the three files in your fileset after creating them with `ukb_conv`, but they should be kept together. `ukb_df()` automatically updates the read call in the R source file to point to the correct directory (the current directory by default, or a directory specified by `path`).
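To make the idea of "usable column names" concrete, here is a rough sketch of how field-code column names could be turned into descriptive ones with a lookup table. It is an illustration only and not the package's implementation: the raw name format `f.<field>.<instance>.<array>`, the output naming scheme, the `field_lookup` table and the `make_readable_names()` helper are all assumptions made for this example.

```r
# Hypothetical field-code-to-description lookup, standing in for the table
# in the .html file. The entries are illustrative only.
field_lookup <- data.frame(
  field = c("31", "21001"),
  description = c("Sex", "Body mass index (BMI)"),
  stringsAsFactors = FALSE
)

# Assumed raw column-name format: f.<field>.<instance>.<array>
raw_names <- c("f.eid", "f.31.0.0", "f.21001.0.0", "f.21001.1.0")

# Hypothetical helper: build "<description>_f<field>_<instance>_<array>"
make_readable_names <- function(raw, lookup) {
  parts <- strsplit(sub("^f\\.", "", raw), ".", fixed = TRUE)
  vapply(parts, function(p) {
    # Names without instance/array parts (e.g. the id column) pass through
    if (length(p) < 3) return(p[1])
    suffix <- paste0("f", p[1], "_", p[2], "_", p[3])
    desc <- lookup$description[match(p[1], lookup$field)]
    # Fields missing from the lookup keep a code-only name
    if (is.na(desc)) return(suffix)
    # Lower-case the description and collapse non-alphanumeric runs to "_"
    slug <- gsub("[^a-z0-9]+", "_", tolower(desc))
    slug <- gsub("^_|_$", "", slug)
    paste0(slug, "_", suffix)
  }, character(1))
}

readable <- make_readable_names(raw_names, field_lookup)
print(readable)
```

Keeping the field code, instance and array index in each name means the readable name can still be traced back to the original UKB field.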

## Other tools

All tools are described on the ukbtools webpage and in the package vignette "Explore UK Biobank Data":

```{r, eval = FALSE}
vignette("explore-ukb-data", package = "ukbtools")
```

For a list of all functions:

```{r, eval = FALSE}
help(package = "ukbtools")
```

## Functions in ukbtools

| Name | Description |
|------|-------------|
| `ukb_centre` | Inserts UKB centre names into data |
| `icd9chapters` | International Classification of Diseases Revision 9 (ICD-9) chapters |
| `ukb_gen_meta` | Genetic metadata |
| `ukb_gen_pcs` | Genetic principal components |
| `ukb_gen_sqc_names` | Sample QC column names |
| `ukb_gen_write_bgenie` | Writes a BGENIE format phenotype or covariate file |
| `icd9codes` | International Classification of Diseases Revision 9 (ICD-9) codes |
| `ukb_gen_write_plink` | Writes a PLINK format phenotype or covariate file |
| `ukb_gen_write_plink_excl` | Writes a PLINK format file for combined exclusions |
| `ukb_context` | Demographics of a UKB sample subset |
| `ukb_defunct` | Defunct genetic metadata functions |
| `ukb_df` | Reads a UK Biobank phenotype fileset and returns a single dataset |
| `ukb_gen_excl_to_na` | Inserts NA into phenotype for genetic metadata exclusions |
| `ukb_df_full_join` | Recursively joins a list of UKB datasets |
| `ukb_gen_rel` | Creates a table of related individuals |
| `ukb_gen_excl` | Sample exclusions |
| `ukb_icd_code_meaning` | Retrieves the description for an ICD code |
| `ukb_icd_diagnosis` | Retrieves diagnoses for an individual |
| `ukb_gen_read_fam` | Reads a PLINK format fam file |
| `ukb_gen_read_sample` | Reads an Oxford format sample file |
| `ukb_icd_freq_by` | Frequency of an ICD diagnosis by a target variable |
| `ukb_icd_keyword` | Retrieves diagnoses containing a description |
| `ukb_gen_het` | Heterozygosity outliers |
| `ukb_gen_rel_count` | Relatedness count |
| `ukb_gen_related_with_data` | Subset of the UKB relatedness dataframe with data |
| `ukb_gen_samples_to_remove` | Related samples (with data on the variable of interest) to remove |
| `ukbtools` | ukbtools: Manipulate and Explore UK Biobank Data |
| `ukb_icd_prevalence` | Returns the prevalence for an ICD diagnosis |
| `ukbcentre` | UKB assessment centre |
| `ukb_df_duplicated_name` | Checks for duplicated names within a UKB dataset |
| `ukb_df_field` | Makes a UKB data-field to variable name table for reference or lookup |
| `icd10chapters` | International Classification of Diseases Revision 10 (ICD-10) chapters |
| `icd10codes` | International Classification of Diseases Revision 10 (ICD-10) codes |
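As an illustration of what "recursively join a list of datasets" means in the entry for `ukb_df_full_join`, the sketch below folds a full outer join over a list of data frames in base R. It does not call ukbtools and is not the package's implementation; `full_join_all()` is a hypothetical helper, the toy values are made up, and the shared participant id column is assumed to be called `eid`.

```r
# Toy datasets sharing a participant id column; values are made up.
ukb_a <- data.frame(eid = c(1, 2, 3), sex = c("F", "M", "F"))
ukb_b <- data.frame(eid = c(2, 3, 4), bmi = c(24.1, 30.5, 27.8))
ukb_c <- data.frame(eid = c(1, 4), age = c(56, 63))

# Hypothetical helper: fold a full outer join over a list of data frames.
# all = TRUE keeps participants that appear in any of the inputs.
full_join_all <- function(datasets, by = "eid") {
  Reduce(function(x, y) merge(x, y, by = by, all = TRUE), datasets)
}

joined <- full_join_all(list(ukb_a, ukb_b, ukb_c))
print(joined)
```

Participants missing from one of the inputs get `NA` in the columns that input contributes.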




License GPL-2
Encoding UTF-8
LazyData true
RoxygenNote 6.1.1
VignetteBuilder knitr
NeedsCompilation no
Packaged 2019-03-14 16:07:59 UTC; ken
Repository CRAN
Date/Publication 2019-03-14 16:30:03 UTC
