Note: there is a newer version (1.6.2) of this package.

bigstatsr

R package {bigstatsr} provides functions for fast statistical analysis of large-scale data encoded as matrices. The package can handle matrices that are too large to fit in memory thanks to memory-mapping to binary files on disk. This is very similar to the format big.matrix provided by R package {bigmemory}, which is no longer used by this package (see the corresponding vignette). As inputs, package {bigstatsr} uses Filebacked Big Matrices (FBM).
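As a rough illustration of what "memory-mapped to a binary file" means in practice, the sketch below creates a small FBM and accesses it like a regular matrix. It assumes the default temporary backing file; the dimensions and values are arbitrary and chosen only for the example.

```r
library(bigstatsr)

# Create a small FBM; by default the backing file is a temporary file
X <- FBM(10, 4, init = 0)

# The values are stored in a ".bk" binary file on disk
file.exists(X$backingfile)

# Matrix-like accessors read from and write to that file
X[, 1] <- 1:10
X[1:3, 1]
dim(X)

# X[] loads the whole matrix in memory; only do this for small data
class(X[])
```

For a matrix that really is larger than memory, you would access only subsets such as `X[, ind]`, never `X[]`.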

LIST OF FEATURES

Note that most of the algorithms of this package don't handle missing values.
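Since most algorithms do not handle missing values, it can be worth checking for them before running an analysis. The following is one possible check, not a function provided by the package: it counts missing values per column with big_apply, reading one block of columns at a time. The toy matrix and the positions of the missing values are made up for the example.

```r
library(bigstatsr)

X <- FBM(20, 6, init = 1)
X[3, 2] <- NA_real_   # plant two missing values for illustration
X[7, 5] <- NA_real_

# Count NAs per column, block by block, and concatenate the results
n_na <- big_apply(X, a.FUN = function(X, ind) {
  colSums(is.na(X[, ind, drop = FALSE]))
}, a.combine = 'c')

n_na
which(n_na > 0)   # indices of the columns that contain missing values
```

Columns flagged this way would need to be imputed or excluded (for example through an `ind.col` argument) before calling functions that cannot handle missing values.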

Installation

# For the current development version
devtools::install_github("privefl/bigstatsr")

Small example

library(bigstatsr)

# Create the data on disk
X <- FBM(5e3, 10e3, backingfile = "test")$save()
# If you open a new session you can do
X <- big_attach("test.rds")

# Fill it by chunks with random values
U <- matrix(0, nrow(X), 5); U[] <- rnorm(length(U))
V <- matrix(0, ncol(X), 5); V[] <- rnorm(length(V))
NCORES <- nb_cores()
# X = U V^T + E
big_apply(X, a.FUN = function(X, ind, U, V) {
  X[, ind] <- tcrossprod(U, V[ind, ]) + rnorm(nrow(X) * length(ind))
  NULL  ## you don't want to return anything here
}, a.combine = 'c', ncores = NCORES, U = U, V = V)
# Check some values
X[1:5, 1:5]

# Compute first 10 PCs
obj.svd <- big_randomSVD(X, fun.scaling = big_scale(), 
                         k = 10, ncores = NCORES)
plot(obj.svd)

# Cleanup
unlink(paste0("test", c(".bk", ".rds")))
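To see what the SVD object contains beyond the plot, here is a small self-contained sketch on a toy matrix (the sizes, the seed and k = 5 are arbitrary choices for the example). It assumes the returned object has elements `d`, `u` and `v`, and that `predict()` without new data returns the PC scores.

```r
library(bigstatsr)

set.seed(1)
X <- FBM(200, 50, init = rnorm(200 * 50))

# Partial SVD of the centered and scaled matrix
obj.svd <- big_randomSVD(X, fun.scaling = big_scale(), k = 5)

obj.svd$d        # singular values
dim(obj.svd$u)   # left singular vectors, one row per row of X
dim(obj.svd$v)   # right singular vectors (loadings), one row per column of X

# PC scores: each column of u multiplied by its singular value
scores <- predict(obj.svd)
all.equal(scores, sweep(obj.svd$u, 2, obj.svd$d, '*'))
```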

Learn more with this introduction to package {bigstatsr}.

Bug report / Help

Please open an issue if you find a bug. If you want help using {bigstatsr}, please post on Stack Overflow with the tag bigstatsr and include a reproducible example (see "How to make a great R reproducible example?").

Use cases

Parallelization

Package {bigstatsr} uses package {foreach} for its parallelization tasks. Learn more about parallelism with {foreach} in this tutorial.
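As a sketch of how this looks from the user's side, the example below splits the columns of a toy FBM across workers with big_parallelize and combines the partial results with rbind. The helper `colstats_part` is a hypothetical name introduced for the example, and capping the number of cores at 2 is an arbitrary choice.

```r
library(bigstatsr)

set.seed(1)
X <- FBM(100, 40, init = rnorm(100 * 40))

# Reference: column sums and variances computed in one call
true <- big_colstats(X)

# Worker function: receives the FBM and the column indices of its own part
colstats_part <- function(X, ind) bigstatsr::big_colstats(X, ind.col = ind)

# Do not ask for more cores than nb_cores() recommends
ncores <- min(2, nb_cores())
test <- big_parallelize(X, p.FUN = colstats_part,
                        p.combine = 'rbind', ncores = ncores)

all.equal(test$sum, true$sum)
all.equal(test$var, true$var)
```

Only the small FBM object is passed to each worker, which then re-attaches the same backing file, so the data are not copied between processes.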

Large datasets

Install

install.packages('bigstatsr')

Monthly Downloads: 5,803
Version: 0.9.1
License: GPL-3
Maintainer: Florian Privé
Last Published: March 3rd, 2019

Functions in bigstatsr (0.9.1)

Extract: Create an Implementation of [ For Custom Matrix-Like Types
asPlotlyText: Plotly text
big_SVD: Partial SVD
big_prodVec: Product with a vector
big_apply: Split-Apply-Combine
big_randomSVD: Randomized partial SVD
big_counts: Counts
big_copy: Copy a Filebacked Big Matrix
AUC: AUC
big_write: Write a file
COPY_biglasso_main: Sparse regression path
big_parallelize: Split-parApply-Combine
big_colstats: Standard univariate statistics
big_tcrossprodSelf: Tcrossprod
big_attach: Attach a Filebacked Big Matrix
big_cprodMat: Cross-product with a matrix
nb_cores: Recommended number of cores to use
bigstatsr-package: bigstatsr: Statistical Tools for Filebacked Big Matrices
big_cor: Correlation
big_transpose: Transposition
summary.big_sp_list: Summary method
pasteLoc: Get coordinates
plot.big_SVD: Plot method
plot.big_sp_list: Plot method
plot.mhtest: Plot method
plus: Add
theme_bigstatsr: Theme ggplot2
big_prodMat: Product with a matrix
big_spLinReg: Sparse linear regression
big_spLogReg: Sparse logistic regression
big_univLinReg: Column-wise linear regression
block_size: Determine a correct value for the block.size parameter
get_beta: Combine sets of coefficients
predict.big_SVD: Scores of PCA
big_univLogReg: Column-wise logistic regression
rows_along: Sequence generation
sub_bk: Replace extension 'bk'
predict.big_sp: Predict method
without_downcast_warning: Temporarily disable downcast warning
FBM.code256-class: Class FBM.code256
FBM.code256-methods: Methods for the FBM.code256 class
big_cprodVec: Cross-product with a vector
big_crossprodSelf: Crossprod
big_read: Read a file
big_scale: Some scaling functions
predict.big_sp_list: Predict method
predict.mhtest: Predict method
COPY_biglasso_part: Train one model
Replace: Create an Implementation of [<- For Custom Matrix-Like Types
FBM-class: Class FBM
FBM-methods: Methods for the FBM class
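As a quick illustration of two of the functions listed above, the sketch below compares big_prodVec and big_cprodVec with their in-memory base R equivalents on a toy matrix. The sizes and the seed are arbitrary, and the comparison through `X[]` is only sensible because the matrix is small.

```r
library(bigstatsr)

set.seed(1)
X <- FBM(30, 8, init = rnorm(30 * 8))
y.col <- rnorm(ncol(X))
y.row <- rnorm(nrow(X))

# Product with a vector: X %*% y.col
res_prod <- big_prodVec(X, y.col)

# Cross-product with a vector: t(X) %*% y.row
res_cprod <- big_cprodVec(X, y.row)

# Compare with base R on the in-memory copy
all.equal(as.numeric(res_prod), as.numeric(X[] %*% y.col))
all.equal(as.numeric(res_cprod), as.numeric(crossprod(X[], y.row)))
```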