Note: there is a newer version (1.12.18) of this package; this page documents version 1.5.2.

bigsnpr

{bigsnpr} is an R package for the analysis of massive SNP arrays, primarily designed for human genetics. It enhances the features of package {bigstatsr} for the purpose of analyzing genotype data.

Note that most of the algorithms in this package do not handle missing values. To impute missing values of genotyped variants, you can use snp_fastImpute() (which takes a few hours for a chip of 15K individuals x 300K SNPs) or snp_fastImputeSimple() (which takes only a few minutes).
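A minimal sketch of the quick option, assuming the small example file example-missing.bed bundled in the package's extdata folder (any bigSNP with missing calls would do). Here method = "mode" replaces each missing call with the most frequent genotype of its SNP. snp_fastImputeSimple() returns a new FBM.code256 (with a slightly different code) that is meant to replace the original genotype matrix.

```r
library(bigsnpr)   # attaching {bigsnpr} also attaches {bigstatsr}

# Example bigSNP with some missing genotype calls (assumed bundled example file)
obj <- snp_attachExtdata("example-missing.bed")
G   <- obj$genotypes

# Missing values among the first 10 SNPs, before imputation
sum(is.na(G[, 1:10]))

# Impute each missing call with the most frequent genotype of its SNP
G2 <- snp_fastImputeSimple(G, method = "mode")

# After imputation, no missing values remain; use G2 in place of the old matrix
sum(is.na(G2[, 1:10]))
obj$genotypes <- G2
```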

New! Package {bigsnpr} now provides functions (prefixed with bed_) that work directly on bed files, even when these contain a few missing values. See the new paper "Efficient toolkit implementing..".
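As an illustration of the bed_ functions, the sketch below computes allele frequencies straight from a bed file, without first converting it into a bigSNP. It assumes the bundled example file example-missing.bed; bed() and bed_MAF() are the functions listed as "Class bed" and "Allele frequencies" in the function index of this page. The exact columns of the returned table are not relied upon here.

```r
library(bigsnpr)

# Path to a bed file with a few missing values (assumed bundled example file;
# the matching .bim and .fam files sit next to it)
bedfile <- system.file("extdata", "example-missing.bed", package = "bigsnpr")

# bed() maps the bed/bim/fam files directly, with no conversion to a "bigSNP"
obj.bed <- bed(bedfile)

# Allele frequencies computed directly from the bed file
maf <- bed_MAF(obj.bed)
head(maf)
```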

Installation

In R, run

# install.packages("remotes")
remotes::install_github("privefl/bigsnpr")

or for the CRAN version

install.packages("bigsnpr")

Input formats

This package reads bed/bim/fam files (PLINK's preferred format) using functions snp_readBed() and snp_readBed2(). Before converting the data into this package's own format, you can perform quality control and conversion with PLINK, which can be called directly from R using snp_plinkQC() and snp_plinkKINGQC().
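A minimal sketch of the read-then-attach workflow on the PLINK example shipped with the package. snp_readBed() needs to run only once: it writes the data to backing files and returns the path to an .rds file, which snp_attach() can reload cheaply in later sessions. Using tempfile() as backingfile is only for this example; in practice you would choose a permanent location.

```r
library(bigsnpr)

# Example PLINK bed file shipped with the package (bim/fam files sit next to it)
bedfile <- system.file("extdata", "example.bed", package = "bigsnpr")

# One-time conversion: returns the path to an .rds file; the genotype matrix
# itself is stored in a companion .bk backing file
rds <- snp_readBed(bedfile, backingfile = tempfile())

# In later sessions, just attach the backing files (no re-reading of the bed file)
obj.bigSNP <- snp_attach(rds)
```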

This package can also read UK Biobank BGEN files using function snp_readBGEN(). This function takes around 40 minutes to read 1M variants for 400K individuals using 15 cores.

This package uses a class called bigSNP to represent SNP data. A bigSNP object is a list with three elements:

  • genotypes: An FBM.code256 (a file-backed big matrix). Rows are samples and columns are SNPs. It stores genotype calls or dosages (rounded to 2 decimal places).
  • fam: A data.frame with information on the individuals.
  • map: A data.frame with information on the SNPs.
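To see these three elements on real data, the sketch below inspects the small example bigSNP returned by snp_attachExtdata(). The comments describe what each call is expected to show; the columns of fam and map mirror PLINK's .fam and .bim files.

```r
library(bigsnpr)

obj <- snp_attachExtdata()     # small example "bigSNP" shipped with the package
G   <- obj$genotypes

inherits(G, "FBM.code256")     # TRUE
dim(G)                         # number of samples x number of SNPs
G[1:5, 1:5]                    # genotype calls decoded as 0, 1, 2 (or NA)

head(obj$fam)   # one row per individual (family and sample IDs, sex, affection, ...)
head(obj$map)   # one row per SNP (chromosome, marker ID, positions, alleles)
```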

Polygenic scores

Polygenic scores are one of the main focuses of this package. Three main methods are currently available:

  • Penalized regressions with individual-level data (see paper and tutorial)

  • Clumping and Thresholding (C+T) and Stacked C+T (SCT) with summary statistics and individual-level data (see paper and tutorial)

  • LDpred2 with summary statistics (see preprint and tutorial)
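A minimal sketch of plain C+T on the example data, assuming the affection column of fam is coded 1 (control) / 2 (case) as in PLINK .fam files. It runs marginal logistic regressions with big_univLogReg() from {bigstatsr} on a training set, clumps with snp_clumping(), then lets snp_PRS() compute one score per -log10(p-value) threshold on a test set. The 400-sample training split and the threshold grid are arbitrary choices for illustration; SCT, penalized regressions and LDpred2 are not shown.

```r
library(bigsnpr)   # also attaches {bigstatsr} (big_univLogReg, rows_along)

obj <- snp_attachExtdata()
G   <- obj$genotypes
y01 <- obj$fam$affection - 1        # 0 = control, 1 = case (assumed coding)

# Random training / test split of the individuals
set.seed(1)
ind.train <- sort(sample(nrow(G), 400))
ind.test  <- setdiff(rows_along(G), ind.train)

# GWAS on the training set: one logistic regression per SNP
gwas <- big_univLogReg(G, y01.train = y01[ind.train], ind.train = ind.train)

# Clumping: keep SNPs not in strong LD with a more significant nearby SNP
ind.keep <- snp_clumping(G, infos.chr = obj$map$chromosome,
                         ind.row = ind.train, S = abs(gwas$score))

# Thresholding: -log10(p-values) of the kept SNPs, and a grid of thresholds
lpS.keep <- -predict(gwas)[ind.keep]
thrs <- seq(0, 4, by = 0.5)

# One polygenic score per threshold for each test individual
prs <- snp_PRS(G, betas.keep = gwas$estim[ind.keep], ind.test = ind.test,
               ind.keep = ind.keep, lpS.keep = lpS.keep, thr.list = thrs)
dim(prs)   # test individuals x thresholds
```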

Possible upcoming features

You can request a feature by opening an issue.

Bug report

Please open an issue if you find a bug, ideally with a reproducible example (see "How to make a great R reproducible example?").

If you want help using {bigstatsr} (the big_*() functions), please open an issue on {bigstatsr}'s repo or post on Stack Overflow with the tag bigstatsr.

I will always redirect you to GitHub issues if you email me, so that others can benefit from our discussion.

Version: 1.5.2
License: GPL-3
Maintainer: Florian Privé
Last Published: November 1st, 2020

Functions in bigsnpr (1.5.2)

  • LD.wiki34: Long-range LD regions
  • bed_counts: Counts
  • coef_to_liab: Liability scale
  • bed_cprodVec: Cross-product with a vector
  • download_plink: Download PLINK
  • download_beagle: Download Beagle 4.1
  • download_1000G: Download 1000G
  • SCT: Stacked C+T (SCT)
  • bed_prodVec: Product with a vector
  • bed-class: Class bed
  • snp_attach: Attach a "bigSNP" from backing files
  • snp_PRS: PRS
  • snp_MAX3: MAX3 statistic
  • snp_attachExtdata: Attach a "bigSNP" for examples and tests
  • snp_modifyBuild: Modify genome build
  • bed_clumping: LD clumping
  • snp_readBGEN: Read BGEN files into a "bigSNP"
  • snp_pcadapt: Outlier detection
  • snp_qq: Q-Q plot
  • CODE_012: Code genotype calls (3) and missing values
  • bed_projectPCA: Projecting PCA
  • bigSNP-class: Class bigSNP
  • bigsnpr-package: bigsnpr: Analysis of Massive SNP Arrays
  • bed_randomSVD: Randomized partial SVD
  • snp_MAF: MAF
  • bed_projectSelfPCA: Projecting PCA
  • seq_log: Sequence, evenly spaced on a logarithmic scale
  • snp_assocBGEN: Compute quick association statistics from BGEN files
  • snp_asGeneticPos: Interpolate to genetic positions
  • snp_fastImpute: Fast imputation
  • snp_autoSVD: Truncated SVD while limiting LD
  • snp_beagleImpute: Imputation
  • snp_scaleBinom: Binomial(n, p) scaling
  • snp_save: Save modifications
  • snp_match: Match alleles
  • snp_manhattan: Manhattan plot
  • snp_cor: Correlation matrix
  • snp_fake: Fake a "bigSNP"
  • snp_plinkQC: Quality Control
  • snp_plinkRmSamples: Remove samples
  • snp_fastImputeSimple: Fast imputation
  • snp_getSampleInfos: Get sample information
  • snp_ldsc: LD score regression
  • snp_writeBed: Write PLINK files from a "bigSNP"
  • snp_subset: Subset a bigSNP
  • snp_readBGI: Read variant info from one BGI file
  • snp_readBed: Read PLINK files into a "bigSNP"
  • bed_MAF: Allele frequencies
  • bed-methods: Methods for the bed class
  • bed_scaleBinom: Binomial(2, p) scaling
  • reexports: Objects exported from other packages
  • snp_gc: Genomic Control
  • bed_tcrossprodSelf: Tcrossprod
  • same_ref: Determine reference divergence
  • snp_fst: Fixation index (Fst)
  • snp_plinkIBDQC: Identity-by-descent
  • snp_plinkKINGQC: Relationship-based pruning
  • snp_split: Split-parApply-Combine
  • sub_bed: Replace extension '.bed'
  • snp_simuPheno: Simulate phenotypes
  • snp_ldpred2_inf: LDpred2