bigsnpr

{bigsnpr} is an R package for the analysis of massive SNP arrays, primarily designed for human genetics. It enhances the features of package {bigstatsr} for the purpose of analyzing genotype data.

Quick demo

LIST OF FEATURES

Note that most of the algorithms of this package don't handle missing values. You can use snp_fastImpute() (taking a few hours for a chip of 15K x 300K) and snp_fastImputeSimple() (taking a few minutes) to impute missing values of genotyped variants.
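As a minimal sketch of the faster of the two options, the snippet below imputes a small dataset with the default settings of snp_fastImputeSimple(). It assumes the example file "example-missing.bed" bundled with the package and loaded through snp_attachExtdata(); the variable names are illustrative only.

```r
library(bigsnpr)

# Attach a small example dataset that contains missing genotypes
# (assumption: "example-missing.bed" ships with the installed package)
obj <- snp_attachExtdata("example-missing.bed")
G <- obj$genotypes

# Count missing genotypes before imputation (the example is small enough to load fully)
n_missing_before <- sum(is.na(G[]))

# Impute with the default simple method; this returns a new FBM.code256
# that decodes the imputed values instead of returning NA
G2 <- snp_fastImputeSimple(G)
n_missing_after <- sum(is.na(G2[]))
```

The returned object G2 is the one to pass to downstream functions, since accessing the original G still returns missing values.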

New! Package {bigsnpr} now provides functions that directly work on bed files with a few missing values. See new paper "Efficient toolkit implementing..".

Installation

remotes::install_github("privefl/bigsnpr")

Input formats

This package reads bed/bim/fam files (PLINK's preferred format) using function snp_readBed(). Before reading data into this package's own format, quality control and conversion can be done with PLINK, which can be called directly from R using snp_plinkQC() and snp_plinkKINGQC().

This package now also reads UK Biobank BGEN files using function snp_readBGEN().
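A minimal sketch of the bed/bim/fam path, assuming the example bed file bundled with the package and a temporary backing file (both choices are for illustration; a real analysis would point to its own files):

```r
library(bigsnpr)

# Path to the example bed file bundled with the package
# (the matching bim and fam files sit next to it)
bedfile <- system.file("extdata", "example.bed", package = "bigsnpr")

# Convert once to the package's on-disk format;
# the function returns the path to the created .rds file
rdsfile <- snp_readBed(bedfile, backingfile = tempfile())

# Attach the resulting bigSNP object in this (or any later) R session
obj <- snp_attach(rdsfile)
```

The conversion only needs to run once; later sessions can call snp_attach() on the saved .rds path directly.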

This package uses a class called bigSNP for representing SNP data. A bigSNP object is just a list with some elements:

genotypes: A FBM.code256. Rows are samples and columns are SNPs. This stores genotype calls or dosages (rounded to 2 decimal places).
fam: A data.frame containing some information on the individuals.
map: A data.frame giving some information on the SNPs.
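To make the layout of a bigSNP object concrete, here is a short sketch that inspects its three elements. It assumes the default example dataset loaded by snp_attachExtdata(); the variable names are illustrative.

```r
library(bigsnpr)

# Attach the default example dataset bundled with the package
obj <- snp_attachExtdata()

# Overview of the list and its elements
str(obj, max.level = 2, strict.width = "cut")

G <- obj$genotypes
dim(G)           # samples x SNPs
G[1:5, 1:8]      # a small block of genotype calls

# fam has one row per sample, map has one row per SNP
head(obj$fam)
head(obj$map)
dims_match <- nrow(G) == nrow(obj$fam) && ncol(G) == nrow(obj$map)
```

Because the rows of fam line up with the rows of the genotype matrix and the rows of map line up with its columns, both can be subset with the same indices.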

Possible upcoming features

Multiple imputation for GWAS (https://doi.org/10.1371/journal.pgen.1006091).
More interactive (visual) QC.

You can request a feature by opening an issue.

Bug report

How to make a great R reproducible example?

Please open an issue if you find a bug.

If you want help using {bigstatsr}, please open an issue on {bigstatsr}'s repo or post on Stack Overflow with the tag bigstatsr.

I will always redirect you to GitHub issues if you email me, so that others can benefit from our discussion.

References

Privé, Florian, et al. "Efficient analysis of large-scale genome-wide data with two R packages: bigstatsr and bigsnpr." Bioinformatics 34.16 (2018): 2781-2787.
Privé, Florian, et al. "Efficient implementation of penalized regression for genetic risk prediction." Genetics 212.1 (2019): 65-74.
Privé, Florian, et al. "Making the most of Clumping and Thresholding for polygenic scores." Am J Hum Genet (2019).
Privé, Florian, et al. "Efficient toolkit implementing best practices for principal component analysis of population genetic data." BioRxiv (2019): 841452.

Functions in bigsnpr (1.2.1)