Note: a newer version (0.1.7) of this package is available.

qs2

qs2: a framework for efficient serialization

qs2 is the successor to the qs package. The goal is to have reliable and fast performance for saving and loading objects in R.

The qs2 format directly uses R serialization (via the R_Serialize/R_Unserialize C API) while improving underlying compression and disk IO patterns. If you are familiar with the qs package, the benefits and usage are the same.

qs_save(data, "myfile.qs2")
data <- qs_read("myfile.qs2")

Use the file extension qs2 to distinguish it from the original qs package. It is not compatible with the original qs format.

Installation

install.packages("qs2")

On x64 Mac or Linux, you can enable multi-threading by compiling from source. It is enabled by default on Windows.

remotes::install_cran("qs2", type = "source", configure.args = "--with-TBB --with-simd=AVX2")

On non-x64 systems (e.g. Mac ARM) remove the AVX2 flag.

remotes::install_cran("qs2", type = "source", configure.args = "--with-TBB")

Multi-threading in qs2 uses the Intel Threading Building Blocks (TBB) framework via the RcppParallel package.
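Once a multi-threaded build is installed, the thread count is requested per call through the nthreads argument of qs_save and qs_read. The sketch below is illustrative only: it assumes that a build without TBB rejects nthreads > 1 with an error, and falls back to a single thread in that case. The object and file names are made up for the example.

```r
library(qs2)

x <- data.frame(id = 1:1e5, value = runif(1e5))
f <- tempfile(fileext = ".qs2")

# Try 4 threads; fall back to one thread if this build
# (assumed to be compiled without TBB) rejects nthreads > 1.
tryCatch(
  qs_save(x, f, nthreads = 4),
  error = function(e) qs_save(x, f, nthreads = 1)
)
y <- tryCatch(
  qs_read(f, nthreads = 4),
  error = function(e) qs_read(f, nthreads = 1)
)

stopifnot(identical(x, y))
```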

Converting qs2 to RDS

Because the qs2 format directly uses R serialization, you can convert it to RDS and vice versa.

file_qs2 <- tempfile(fileext = ".qs2")
file_rds <- tempfile(fileext = ".RDS")
x <- runif(1e6)

# save `x` with qs_save
qs_save(x, file_qs2)

# convert the file to RDS
qs_to_rds(input_file = file_qs2, output_file = file_rds)

# read `x` back in with `readRDS`
xrds <- readRDS(file_rds)
stopifnot(identical(x, xrds))
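The reverse direction can be sketched the same way with rds_to_qs, which appears in the package's function list. The variable names below are chosen for illustration.

```r
library(qs2)

file_rds <- tempfile(fileext = ".RDS")
file_qs2 <- tempfile(fileext = ".qs2")
x <- runif(1e6)

# save `x` with saveRDS
saveRDS(x, file_rds)

# convert the RDS file to the qs2 format
rds_to_qs(input_file = file_rds, output_file = file_qs2)

# read `x` back in with qs_read
xqs2 <- qs_read(file_qs2)
stopifnot(identical(x, xqs2))
```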

Validating file integrity

The qs2 format saves an internal checksum. Setting the validate_checksum parameter to TRUE tests for file corruption before deserialization, at the cost of a minor performance penalty.

qs_save(data, "myfile.qs2")
data <- qs_read("myfile.qs2", validate_checksum = TRUE)

The qdata format

The package also introduces the qdata format, which has its own serialization layout and works only with data types (vectors, lists, data frames, matrices).

It will replace internal types (functions, promises, external pointers, environments, objects) with NULL. The qdata format differs from the qs2 format in that it is NOT a general-purpose serialization format.

The eventual goal of qdata is to also have interoperability with other languages, particularly Python.

qd_save(data, "myfile.qs2")
data <- qd_read("myfile.qs2")
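The replacement of unsupported types can be seen with a small list that mixes data and a function. This is a sketch with made-up names; suppressWarnings is used in case qd_save warns about the unsupported element.

```r
library(qs2)

obj <- list(values = 1:3, label = "a", f = function(x) x + 1)
f <- tempfile(fileext = ".qs2")

# The function element is not a data type, so qdata drops it.
suppressWarnings(qd_save(obj, f))
out <- qd_read(f)

stopifnot(identical(out$values, 1:3)) # data survives the round trip
stopifnot(identical(out$label, "a"))
stopifnot(is.null(out$f))             # the function was replaced with NULL
```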

Benchmarks

A summary across 4 datasets is presented below.

Single-threaded

Algorithm         Compression   Save Time (s)   Read Time (s)
qs2               7.96          13.4            50.4
qdata             8.45          10.5            34.8
base::serialize   1.1           8.87            51.4
saveRDS           8.68          107             63.7
fst               2.59          5.09            46.3
parquet           8.29          20.3            38.4
qs (legacy)       7.97          9.13            48.1

Multi-threaded (8 threads)

Algorithm         Compression   Save Time (s)   Read Time (s)
qs2               7.96          3.79            48.1
qdata             8.45          1.98            33.1
fst               2.59          5.05            46.6
parquet           8.29          20.2            37.0
qs (legacy)       7.97          3.21            52.0
  • qs2, qdata and qs with compress_level = 3
  • parquet via the arrow package using zstd compression_level = 3
  • base::serialize with ascii = FALSE and xdr = FALSE
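For a rough measurement on your own data, the same three quantities can be collected with system.time and file.size. This is only a sketch, not the benchmark script used for the tables above: the toy data frame is made up, and computing the compression ratio as in-memory size divided by file size is an assumption about how that column is defined.

```r
library(qs2)

# Toy data; real benchmarks should use representative datasets.
x <- data.frame(id = seq_len(1e6),
                group = sample(letters, 1e6, replace = TRUE))
f <- tempfile(fileext = ".qs2")

save_time <- system.time(qs_save(x, f, compress_level = 3))[["elapsed"]]
read_time <- system.time(y <- qs_read(f))[["elapsed"]]

# Assumed definition: in-memory bytes divided by bytes on disk.
ratio <- as.numeric(object.size(x)) / file.size(f)

cat("save:", save_time, "s  read:", read_time, "s  ratio:", ratio, "\n")
```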

Datasets used

  • 1000 genomes non-coding VCF: 1000 genomes non-coding variants (2743 MB)
  • B-cell data: B-cell mouse data, Greiff 2017 (1057 MB)
  • IP location: IPV4 range data with location information (198 MB)
  • Netflix movie ratings: Netflix ML prediction dataset (571 MB)

These datasets are openly licensed and represent a combination of numeric and text data across multiple domains. See inst/analysis/datasets.R on GitHub.

Usage in C/C++

Serialization functions can be accessed in compiled code. Below is an example using Rcpp.

// [[Rcpp::depends(qs2)]]
#include <Rcpp.h>
#include "qs2_external.h"
using namespace Rcpp;

// [[Rcpp::export]]
SEXP test_qs_serialize(SEXP x) {
  size_t len = 0;
  unsigned char * buffer = c_qs_serialize(x, &len, 10, true, 4); // object, buffer length, compress_level, shuffle, nthreads
  SEXP y = c_qs_deserialize(buffer, len, false, 4);              // buffer, buffer length, validate_checksum, nthreads
  c_qs_free(buffer);                                             // must manually free buffer
  return y;
}

// [[Rcpp::export]]
SEXP test_qd_serialize(SEXP x) {
  size_t len = 0;
  unsigned char * buffer = c_qd_serialize(x, &len, 10, true, 4); // object, buffer length, compress_level, shuffle, nthreads
  SEXP y = c_qd_deserialize(buffer, len, false, false, 4);       // buffer, buffer length, use_alt_rep, validate_checksum, nthreads
  c_qd_free(buffer);                                             // must manually free buffer
  return y;
}


/*** R
x <- runif(1e7)
stopifnot(test_qs_serialize(x) == x)
stopifnot(test_qd_serialize(x) == x)
*/
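The same in-memory round trip is available from R without compiled code, using the qs_serialize / qs_deserialize and qd_serialize / qd_deserialize functions from the package's function list. A minimal sketch with an illustrative object:

```r
library(qs2)

x <- list(a = 1:10, b = letters)

# qs2 format: serialize to a raw vector and back
buf_qs <- qs_serialize(x)
y_qs <- qs_deserialize(buf_qs)

# qdata format: same pattern, data types only
buf_qd <- qd_serialize(x)
y_qd <- qd_deserialize(buf_qd)

stopifnot(is.raw(buf_qs), is.raw(buf_qd))
stopifnot(identical(x, y_qs), identical(x, y_qd))
```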

Package details

  • Version: 0.1.4
  • License: GPL-3
  • Maintainer: Travers Ching
  • Last published: December 12th, 2024
  • Monthly downloads: 16,576

Functions in qs2 (0.1.4)

  • starnames: Official list of IAU Star Names
  • xxhash_raw: XXH3_64 hash
  • qs_deserialize
  • zstd_compress_raw: Zstd compression
  • zstd_compress_bound: Zstd compress bound
  • qs_savem
  • rds_to_qs: RDS to qs2 format
  • qx_dump
  • qs_save
  • base85_encode: Z85 Encoding
  • base91_encode: basE91 Encoding
  • base85_decode: Z85 Decoding
  • encode_source: Encode and compress a file or string
  • base91_decode: basE91 Decoding
  • catquo
  • blosc_shuffle_raw: Shuffle a raw vector
  • blosc_unshuffle_raw: Un-shuffle a raw vector
  • decode_source: Decode a compressed string
  • qd_deserialize
  • qd_read
  • qs_serialize
  • qs_to_rds: qs2 to RDS format
  • qd_save
  • qs_readm
  • qs_read
  • qd_serialize
  • zstd_decompress_raw: Zstd decompression