title: "README" author: Travers Ching
qs
Quick serialization of R objects
This package provides an interface for quickly writing (serializing) and reading (de-serializing) objects to and from disk. The goal of this package is to provide a lightning-fast and complete replacement for the `saveRDS` and `readRDS` functions in R.
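As a minimal sketch of what "replacement for `saveRDS` and `readRDS`" means in practice, the round trip below writes an object to a temporary file and reads it back. The `qsave()` and `qread()` names do not appear in the text above; they are assumed here to be the package's exported interface. If `qs` is not installed, the sketch falls back to the base R functions so that it still runs.

```r
# Round-trip an object through a file and check that it comes back unchanged.
x <- list(id = 1:5, label = c("a", "b", "c", "d", "e"), when = Sys.Date())

path <- tempfile()

if (requireNamespace("qs", quietly = TRUE)) {
  # Assumed qs interface: qsave(object, file) and qread(file)
  qs::qsave(x, path)
  y <- qs::qread(path)
} else {
  # Base R baseline that qs aims to replace
  saveRDS(x, path)
  y <- readRDS(path)
}

stopifnot(isTRUE(all.equal(x, y)))
unlink(path)
```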
Inspired by the `fst` package, `qs` takes a similar block-compression approach, using the `zstd` library and direct "in memory" compression, which allows for very fast serialization. It differs in that it uses a more general approach for attributes and object references for common data types (numeric data, strings, lists, etc.), meaning any S3 object built on common data types (e.g., `tibble`s, time-stamps, `bit64`) can be serialized. For less common data types (formulas, environments, functions, etc.), `qs` relies on built-in R serialization functions via the `RApiSerialize` package, followed by block-compression.
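The claim that S3 objects built on common data types can be handled generically rests on how R stores them: a base vector or list plus a set of attributes. The base R sketch below illustrates that idea only; it does not show how `qs` is implemented, and `rebuild()` is a hypothetical helper introduced for this example.

```r
# A time-stamp is a plain double vector plus attributes.
ts <- as.POSIXct("2019-01-01 12:00:00", tz = "UTC")
typeof(ts)       # underlying storage type
attributes(ts)   # class and tzone are stored as attributes

# A data.frame is a list of columns plus names, class and row.names.
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)
typeof(df)
names(attributes(df))

# Hypothetical helper: strip the attributes, then re-attach them.
# Recovering the original object shows that "base data + attributes"
# is enough to describe these S3 classes.
rebuild <- function(obj) {
  attrs <- attributes(obj)
  bare <- obj
  attributes(bare) <- NULL
  attributes(bare) <- attrs
  bare
}

stopifnot(identical(rebuild(ts), ts))
stopifnot(isTRUE(all.equal(rebuild(df), df)))
```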
For character vectors, `qs` also uses the alt-rep system to quickly read in string data.
Installation
devtools::install_github("traversc/qs")
(Requires R version 3.5 or higher)
Features
The table below compares the features of different serialization approaches in R.
Feature | qs | fst | saveRDS |
---|---|---|---|
Not Slow | ✔ | ✔ | X |
Numeric Vectors | ✔ | ✔ | ✔ |
Integer Vectors | ✔ | ✔ | ✔ |
Logical Vectors | ✔ | ✔ | ✔ |
Character Vectors | ✔ | ✔ | ✔ |
Character Encoding | ✔ | (vector-wide only) | ✔ |
Complex Vectors | ✔ | X | ✔ |
Data.Frames | ✔ | ✔ | ✔ |
On disk row access | X | ✔ | X |
Attributes | ✔ | Some | ✔ |
Lists / Nested Lists | ✔ | X | ✔ |
Multi-threaded | X (Not Yet) | ✔ | X |
Summary Benchmarks
The table below lists serialization speed for several different data types.
Additional Benchmarks
Data.Frame benchmark
Benchmarks for serializing and de-serializing large data.frames (5 million rows) composed of a numeric column (`rnorm`), an integer column (`sample(5e6)`), and a character vector column (random alphanumeric strings of length 50). See `dataframe_bench.png` for a comparison using different compression parameters.
This benchmark also includes materialization of alt-rep data, for an apples-to-apples comparison.
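The data layout described above can be sketched in base R as follows. This is not the author's benchmark script: `random_strings()` is a hypothetical helper, the row count is reduced from 5 million so the example runs quickly, and only the `saveRDS` / `readRDS` baseline is timed.

```r
set.seed(1)
n <- 1e4  # the benchmark above uses 5e6 rows; reduced here for speed

# Hypothetical helper: n random alphanumeric strings of a fixed length
random_strings <- function(n, len = 50) {
  chars <- c(letters, LETTERS, 0:9)
  vapply(seq_len(n), function(i) {
    paste(sample(chars, len, replace = TRUE), collapse = "")
  }, character(1))
}

df <- data.frame(
  num = rnorm(n),          # numeric column
  int = sample(n),         # integer column (a random permutation of 1..n)
  chr = random_strings(n), # character column
  stringsAsFactors = FALSE
)

# Time the base R baseline for writing and reading
path <- tempfile(fileext = ".rds")
write_time <- system.time(saveRDS(df, path))[["elapsed"]]
read_time  <- system.time(df2 <- readRDS(path))[["elapsed"]]

stopifnot(identical(df, df2))
unlink(path)
c(write = write_time, read = read_time)
```

Timings from a sketch like this depend heavily on hardware and row count, so they are not comparable to the figures in the table that follows.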
Serialization speed with default parameters:
Method | write time (s) | read time (s) |
---|---|---|
qs | 0.49391294 | 8.8818166 |
fst (1 thread) | 0.37411811 | 8.9309314 |
fst (4 threads) | 0.3676273 | 8.8565951 |
saveRDS | 14.377122 | 12.467517 |
Serialization speed with different parameters
The numbers in the figure reflect the compression parameter used. `qs` uses the `zstd` compression library, and compression parameters range from -50 to 22 (`qs` uses a default value of -1). `fst` defines its own compression range through a combination of `zstd` and `lz4` algorithms, ranging from 0 to 100 (default: 0).
Nested List benchmark
Benchmarks for serialization of random nested lists with random attributes (approximately 50 MB). See the nested list example in the tests folder.
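For readers without the tests folder at hand, the sketch below shows one way such a structure could be generated. It is an illustration and not the example from the tests folder: `random_leaf()` and `random_nested_list()` are hypothetical helpers, the `"tag"` attribute name is arbitrary, and the resulting object is far smaller than 50 MB.

```r
set.seed(1)

# Hypothetical generator: a random leaf vector carrying a random attribute
random_leaf <- function() {
  leaf <- switch(sample(3, 1),
    rnorm(sample(10, 1)),           # numeric leaf
    sample(100L, sample(10, 1)),    # integer leaf
    sample(letters, sample(10, 1))  # character leaf
  )
  attr(leaf, "tag") <- paste(sample(letters, 5), collapse = "")
  leaf
}

# Recursively build a nested list; each element is a leaf or a sub-list
random_nested_list <- function(depth, width = 3) {
  if (depth == 0) return(random_leaf())
  out <- lapply(seq_len(width), function(i) {
    if (runif(1) < 0.5) random_leaf() else random_nested_list(depth - 1, width)
  })
  attr(out, "tag") <- paste(sample(letters, 5), collapse = "")
  out
}

x <- random_nested_list(depth = 4)

# Round trip through the base R baseline to confirm attributes survive
path <- tempfile(fileext = ".rds")
saveRDS(x, path)
stopifnot(identical(x, readRDS(path)))
unlink(path)
```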
Serialization speed with default parameters:
Method | write time (s) | read time (s) |
---|---|---|
qs | 0.17840716 | 0.19489372 |
saveRDS | 3.484225 | 0.58762548 |