bigsnpr

{bigsnpr} is an R package for the analysis of massive SNP arrays, primarily designed for human genetics. It enhances the features of package {bigstatsr} for the purpose of analyzing genotype data.

Note that most of the algorithms of this package don't handle missing values. You can use snp_fastImpute() (taking a few hours for a chip of 15K x 300K) and snp_fastImputeSimple() (taking a few minutes) to impute missing values of genotyped variants.
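As a rough illustration of the quicker option, the sketch below imputes the small example dataset shipped with the package. It assumes that the bundled file example-missing.bed and the option method = "mode" are available in your installed version, and that the missing-value counts are in the 4th row of the matrix returned by bigstatsr::big_counts().

```r
library(bigsnpr)

# Example data with missing genotypes, shipped with the package
# (assumption: "example-missing.bed" is bundled in your version)
bigsnp <- snp_attachExtdata("example-missing.bed")
G <- bigsnp$genotypes

# Count missing values before imputation (4th row of the counts = NA)
n_missing_before <- sum(bigstatsr::big_counts(G)[4, ])

# Simple imputation: replace each missing value by the most frequent call
G_imputed <- snp_fastImputeSimple(G, method = "mode")

# Count missing values after imputation
n_missing_after <- sum(bigstatsr::big_counts(G_imputed)[4, ])

print(c(before = n_missing_before, after = n_missing_after))
```

The imputed values are read through the returned object (G_imputed here), so downstream functions should be given that object rather than the original G.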

New! Package {bigsnpr} now provides functions that directly work on bed files with a few missing values. See new paper "Efficient toolkit implementing..".
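A minimal sketch of working on a bed file directly, without converting it first. It uses the example file bundled with the package (assumed to be present in your version) and bed_counts(), which is expected to tolerate missing values since it reports them as one of its 4 rows.

```r
library(bigsnpr)

# Path to an example bed file shipped with the package
bedfile <- system.file("extdata", "example-missing.bed", package = "bigsnpr")

# Map the bed file; nothing is read into memory at this point
obj.bed <- bed(bedfile)
print(dim(obj.bed))  # samples x variants

# Per-variant counts of genotype calls and of missing values
counts <- bed_counts(obj.bed)
print(counts[, 1:5])
```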

Installation

remotes::install_github("privefl/bigsnpr")

Input formats

This package reads bed/bim/fam files (PLINK's preferred format) using function snp_readBed(). Before the data are read into this package's own format, quality control and conversion can be done with PLINK, which can be called directly from R using snp_plinkQC() and snp_plinkKINGQC().
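For illustration, reading the example bed file shipped with the package might look like the sketch below. The temporary backing file is only a choice made for this example; in practice you would likely keep the backing files next to your data.

```r
library(bigsnpr)

# Example bed file shipped with the package (bim/fam files sit next to it)
bedfile <- system.file("extdata", "example.bed", package = "bigsnpr")

# Convert to the package's format; returns the path to the created .rds file
rdsfile <- snp_readBed(bedfile, backingfile = tempfile())

# Attach the "bigSNP" object in the current R session
bigsnp <- snp_attach(rdsfile)
print(class(bigsnp))
```

The conversion only needs to be done once; later sessions can call snp_attach() on the same .rds file.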

This package now also reads UK Biobank BGEN files using function snp_readBGEN().

This package uses a class called bigSNP for representing SNP data. A bigSNP object is just a list with some elements:

  • genotypes: An FBM.code256. Rows are samples and columns are SNPs. This stores genotype calls or dosages (rounded to 2 decimal places).
  • fam: A data.frame giving some information on the individuals.
  • map: A data.frame giving some information on the SNPs.
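To see how the three elements fit together, the sketch below attaches the example data shipped with the package and compares dimensions. It only assumes that the default example dataset of snp_attachExtdata() is available.

```r
library(bigsnpr)

# Attach the default example dataset shipped with the package
bigsnp <- snp_attachExtdata()

# Overview of the list elements
str(bigsnp, max.level = 2, strict.width = "cut")

G <- bigsnp$genotypes
print(dim(G))  # samples x SNPs

# One row of 'fam' per sample (row of G), one row of 'map' per SNP (column of G)
print(nrow(bigsnp$fam) == nrow(G))
print(nrow(bigsnp$map) == ncol(G))
```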

Possible upcoming features

You can request a feature by opening an issue.

Bug report

Please open an issue if you find a bug, and include a reproducible example (see "How to make a great R reproducible example?").

If you want help using {bigstatsr}, please open an issue on {bigstatsr}'s repo or post on Stack Overflow with the tag bigstatsr.

I will always redirect you to GitHub issues if you email me, so that others can benefit from our discussion.

Install

install.packages('bigsnpr')

Monthly Downloads: 3,230
Version: 1.2.1
License: GPL-3
Maintainer: Florian Privé
Last Published: February 3rd, 2020

Functions in bigsnpr (1.2.1)

  • SCT: Stacked C+T (SCT)
  • LD.wiki34: Long-range LD regions
  • download_plink: Download PLINK
  • bed_counts: Counts
  • bigSNP-class: Class bigSNP
  • bed_MAF: Allele frequencies
  • bed_tcrossprodSelf: Tcrossprod
  • download_beagle: Download Beagle 4.1
  • bed_cprodVec: Cross-product with a vector
  • bed-class: Class bed
  • bed_prodVec: Product with a vector
  • snp_autoSVD: Truncated SVD while limiting LD
  • bed_projectSelfPCA: Projecting PCA
  • bed_projectPCA: Projecting PCA
  • snp_beagleImpute: Imputation
  • same_ref: Determine reference divergence
  • snp_assocBGEN: Compute quick association statistics from BGEN files
  • seq_log: Sequence, evenly spaced on a logarithmic scale
  • snp_PRS: PRS
  • snp_cor: Correlation
  • snp_attach: Attach a "bigSNP" from backing files
  • snp_attachExtdata: Attach a "bigSNP" for examples and tests
  • bed-methods: Methods for the bed class
  • snp_MAX3: MAX3 statistic
  • bigsnpr-package: bigsnpr: Analysis of Massive SNP Arrays
  • download_1000G: Download 1000G
  • snp_gc: Genomic Control
  • snp_MAF: MAF
  • snp_getSampleInfos: Get sample information
  • bed_randomSVD: Randomized partial SVD
  • snp_modifyBuild: Modify genome build
  • snp_pcadapt: Outlier detection
  • snp_plinkQC: Quality Control
  • bed_scaleBinom: Binomial(2, p) scaling
  • bed_clumping: LD clumping
  • snp_plinkRmSamples: Remove samples
  • snp_fastImpute: Fast imputation
  • snp_fastImputeSimple: Fast imputation
  • CODE_012: CODE_012: code genotype calls (3) and missing values.
  • snp_match: Match alleles
  • snp_manhattan: Manhattan plot
  • sub_bed: Replace extension '.bed'
  • snp_fake: Fake a "bigSNP"
  • subset.bigSNP: Subset
  • snp_plinkKINGQC: Relationship-based pruning
  • snp_plinkIBDQC: Identity-by-descent
  • snp_qq: Q-Q plot
  • snp_readBGEN: Read BGEN files into a "bigSNP"
  • snp_readBGI: Read variant info from one BGI file
  • snp_save: Save modifications
  • snp_scaleBinom: Binomial(n, p) scaling
  • snp_readBed: Read PLINK files into a "bigSNP"
  • snp_split: Split-parApply-Combine
  • snp_writeBed: Write PLINK files from a "bigSNP"