qs v0.12

Quick Serialization of R Objects

Provides functions for quickly writing and reading any R object to and from disk. This package makes use of the 'zstd' library for compression and decompression. 'zstd' is created by Yann Collet and owned by Facebook, Inc.

Readme


author: Travers Ching

qs

Quick serialization of R objects

This package provides an interface for quickly writing (serializing) and reading (de-serializing) objects to and from disk. The goal of this package is to provide a lightning-fast and complete replacement for the saveRDS and readRDS functions in R.
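As a minimal sketch of the intended workflow, qsave and qread (both listed among the package's functions) can be used where saveRDS and readRDS would otherwise appear. The data frame and the temporary file path below are made up for illustration.

```r
library(qs)

# Example data: a small data.frame (made up for illustration)
df1 <- data.frame(
  x = rnorm(1000),
  y = sample(1000),
  z = sample(letters, 1000, replace = TRUE),
  stringsAsFactors = FALSE
)

# Write to a temporary file, then read it back
path <- tempfile(fileext = ".qs")
qsave(df1, path)
df2 <- qread(path)
unlink(path)

# Check that the round-tripped object matches the original
roundtrip_ok <- isTRUE(all.equal(df1, df2))
```

Here all.equal is used rather than a stricter comparison so that the check does not depend on how row names happen to be stored internally.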

Inspired by the fst package, qs uses a similar block-compression approach built on the zstd library, together with direct "in memory" compression, which allows for very fast serialization. It differs from fst in that it handles attributes and object references generically for common data types (numeric data, strings, lists, etc.), so any S3 object built on those types (e.g., tibbles, time-stamps, bit64) can be serialized. For less common data types (formulas, environments, functions, etc.), qs relies on R's built-in serialization functions via the RApiSerialize package, followed by block compression.
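To illustrate the point about attributes, the sketch below round-trips two S3 objects built on common data types: a time-stamp vector and a factor. The "note" attribute and the object names are hypothetical, chosen only for this example.

```r
library(qs)

# A time-stamp vector: numeric data plus class and tzone attributes
stamps <- as.POSIXct(c("2019-02-01 12:00:00", "2019-02-02 08:30:00"), tz = "UTC")

# A factor carrying an extra, user-defined attribute (hypothetical name)
grp <- factor(c("a", "b", "a"))
attr(grp, "note") <- "illustrative attribute"

# Save both objects in one list and read them back
path <- tempfile(fileext = ".qs")
qsave(list(stamps = stamps, grp = grp), path)
restored <- qread(path)
unlink(path)

class(restored$stamps)       # class attribute of the time-stamps
attr(restored$grp, "note")   # user-defined attribute on the factor
```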

For character vectors, qs also uses the alt-rep system to quickly read in string data.

Installation

devtools::install_github("traversc/qs")

(Requires R version 3.5 or higher)

Features

The table below compares the features of different serialization approaches in R.

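| Feature | qs | fst | saveRDS |
| --- | --- | --- | --- |
| Not Slow | ✔ | ✔ | X |
| Numeric Vectors | ✔ | ✔ | ✔ |
| Integer Vectors | ✔ | ✔ | ✔ |
| Logical Vectors | ✔ | ✔ | ✔ |
| Character Vectors | ✔ | ✔ | ✔ |
| Character Encoding | ✔ | (vector-wide only) | ✔ |
| Complex Vectors | ✔ | X | ✔ |
| Data.Frames | ✔ | ✔ | ✔ |
| On disk row access | X | ✔ | X |
| Attributes | ✔ | Some | ✔ |
| Lists / Nested Lists | ✔ | X | ✔ |
| Multi-threaded | X (Not Yet) | ✔ | X |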

Summary Benchmarks

The table below lists serialization speed for several different data types.

All values are in MB/s.

| Data type | Example | qs write | qs read | saveRDS write | saveRDS read | fst (1 thread) write | fst (1 thread) read | fst (4 threads) write | fst (4 threads) read |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Integer Vector | sample(1e8) | 1015.2 | 889.8 | 27.1 | 135.5 | 686.6 | 442.4 | 699.1 | 567.9 |
| Numeric Vector | runif(1e8) | 861.2 | 954.0 | 24.3 | 131.9 | 744.0 | 638.7 | 754.4 | 848.0 |
| Character Vector | qs::randomStrings(1e7) | 1312.9 | 715.8* | 49.1 | 43.9 | 1440.9 | 59.5 | 1536.3 | 59.3 |
| List | map(1:1e5,sample(100)) | 197.2 | 311.5 | 7.7 | 123.5 | N/A | N/A | N/A | N/A |
| Environment | map(1:1e5,sample(100)); names(x)<-1:1e5; as.environment(x) | 56.0 | 117.5 | 7.7 | 89.6 | N/A | N/A | N/A | N/A |
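The source does not say how these figures were measured. The following is only a rough sketch of one way to estimate throughput in MB/s, assuming it is defined as in-memory object size divided by elapsed wall-clock time. The helper names mb_per_sec and elapsed are hypothetical, and the vector is much smaller than the ones in the table so that the sketch runs quickly.

```r
library(qs)

# Throughput in MB/s: in-memory object size divided by elapsed seconds
# (assumed definition; not taken from the package)
mb_per_sec <- function(x, seconds) as.numeric(object.size(x)) / 1e6 / seconds

# Elapsed wall-clock seconds for evaluating an expression
elapsed <- function(expr) system.time(expr)[["elapsed"]]

x <- runif(5e6)  # smaller than the runif(1e8) used in the table
qs_file  <- tempfile(fileext = ".qs")
rds_file <- tempfile(fileext = ".rds")

# Writes are timed first so the files exist when the reads are timed
results <- data.frame(
  method = c("qs", "saveRDS"),
  write_MBps = c(
    mb_per_sec(x, elapsed(qsave(x, qs_file))),
    mb_per_sec(x, elapsed(saveRDS(x, rds_file)))
  ),
  read_MBps = c(
    mb_per_sec(x, elapsed(qread(qs_file))),
    mb_per_sec(x, elapsed(readRDS(rds_file)))
  )
)
unlink(c(qs_file, rds_file))
print(results)
```

Single timings like these are noisy. Repeating each measurement and using a larger object would give steadier numbers.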

Additional Benchmarks

Data.Frame benchmark

Benchmarks for serializing and de-serializing large data.frames (5 million rows) composed of a numeric column (rnorm), an integer column (sample(5e6)), and a character vector column (random alphanumeric strings of length 50). See dataframe_bench.png for a comparison using different compression parameters.

This benchmark also includes materialization of alt-rep data, for an apples-to-apples comparison.

Serialization speed with default parameters:

| Method | Write time (s) | Read time (s) |
| --- | --- | --- |
| qs | 0.49391294 | 8.8818166 |
| fst (1 thread) | 0.37411811 | 8.9309314 |
| fst (4 threads) | 0.3676273 | 8.8565951 |
| saveRDS | 14.377122 | 12.467517 |

Serialization speed with different parameters

The numbers in the figure reflect the compression parameter used. qs uses the zstd compression library, whose compression parameters range from -50 to 22 (qs uses a default value of -1). fst defines its own compression range through a combination of the zstd and lz4 algorithms, ranging from 0 to 100 (default: 0).
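The effect of the zstd compression parameter can be explored directly with the package's raw-vector helpers, zstd_compress_raw and zstd_decompress_raw. The sketch below assumes that the second argument of zstd_compress_raw is the compression level, as in later releases of qs. Check the function's help page for the installed version before relying on it.

```r
library(qs)

# Some compressible raw data: a serialized, repetitive character vector
raw_in <- serialize(rep(c("alpha", "beta", "gamma"), 1e4), connection = NULL)

# Compressed size in bytes at a few levels across the zstd range
# (assumption: the second argument is the compression level)
levels <- c(-50, -1, 22)
sizes <- vapply(
  levels,
  function(lv) length(zstd_compress_raw(raw_in, lv)),
  integer(1)
)
names(sizes) <- levels
print(sizes)

# Decompression recovers the original bytes
restored <- zstd_decompress_raw(zstd_compress_raw(raw_in, -1))
identical(restored, raw_in)
```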

Nested List benchmark

Benchmarks for serialization of random nested lists with random attributes (approximately 50 MB). See the nested list example in the tests folder.

Serialization speed with default parameters:

| Method | Write time (s) | Read time (s) |
| --- | --- | --- |
| qs | 0.17840716 | 0.19489372 |
| saveRDS | 3.484225 | 0.58762548 |

Functions in qs

Name Description
qread qread
qdump qdump
convertToAlt Convert character vector to alt-rep
zstd_compress_raw Zstd compression
zstd_decompress_raw Zstd decompression
qsave qsave
qs_use_alt_rep Use alt-rep
is_big_endian System Endianness
randomStrings Generate random strings
zstd_compressBound Zstd CompressBound

Vignettes of qs

Name
dataframe_bench.png
nested_list_bench.png
vignette.html
vignette.rmd

Details

Type Package
Date 2019-2-1
License AGPL-3 | file LICENSE
LinkingTo Rcpp, RApiSerialize
RoxygenNote 6.0.1
VignetteBuilder knitr
Copyright This package includes code from the 'zstd' library owned by Facebook, Inc. and created by Yann Collet.
URL https://github.com/traversc/qs
BugReports https://github.com/traversc/qs/issues
NeedsCompilation yes
Packaged 2019-02-02 23:32:06 UTC; tching
Repository CRAN
Date/Publication 2019-02-08 16:10:02 UTC
