---
title: "README"
author: Travers Ching
---
# qs

Quick serialization of R objects
This package provides an interface for quickly writing (serializing) and reading (de-serializing) objects to and from disk. The goal of this package is to provide a lightning-fast and complete replacement for the saveRDS and readRDS functions in R.
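The sketch below illustrates the drop-in idea. It round-trips one object through base R's `saveRDS()`/`readRDS()` and, if the package is installed, through qs. It assumes qs exports `qsave(x, file)` and `qread(file)` in these two-argument forms. The object `x` and the temporary file names are placeholders chosen for illustration.

```r
# An arbitrary test object: a vector, a character vector and a small data.frame.
x <- list(a = 1:10, b = letters, c = data.frame(id = 1:3, val = c(0.5, 1.5, 2.5)))

# Base R: saveRDS() writes, readRDS() reads back.
rds_file <- tempfile(fileext = ".rds")
saveRDS(x, rds_file)
x_rds <- readRDS(rds_file)
stopifnot(identical(x, x_rds))

# qs: qsave() / qread() play the same roles (only run if qs is installed).
if (requireNamespace("qs", quietly = TRUE)) {
  qs_file <- tempfile(fileext = ".qs")
  qs::qsave(x, qs_file)
  x_qs <- qs::qread(qs_file)
  stopifnot(identical(x, x_qs))
}
```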
Inspired by the fst package, qs uses a similar block-compression approach built on the zstd library, with direct "in memory" compression, which allows for lightning-quick serialization. It differs from fst in that it uses a more general approach to attributes and object references for common data types (numeric data, strings, lists, etc.). As a result, any S3 object built on common data types (e.g., tibbles, timestamps, bit64) can be serialized. For less common data types (formulas, environments, functions, etc.), qs relies on R's built-in serialization functions via the RApiSerialize package, followed by block compression.
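A general attribute-based approach works because many S3 classes are just ordinary vectors with attributes attached. In the sketch below, two hypothetical helpers, `split_attrs()` and `join_attrs()`, separate a timestamp and a factor into a bare vector plus an attribute list and then rebuild them. The sketch then round-trips a formula with base R's `serialize()`/`unserialize()`. These functions stand in for the C-level RApiSerialize interface that qs uses for such types. This is an illustration of the idea, not qs's actual code.

```r
# Hypothetical helper: split an object into its bare data vector and its attributes.
split_attrs <- function(x) {
  data <- x
  attributes(data) <- NULL
  list(data = data, attrs = attributes(x))
}

# Hypothetical helper: re-attach the stored attributes to the bare data.
join_attrs <- function(parts) {
  out <- parts$data
  attributes(out) <- parts$attrs
  out
}

stamp <- as.POSIXct("2020-01-01 12:00:00", tz = "UTC")  # double + class/tzone attributes
f     <- factor(c("b", "a", "b"))                        # integer + levels/class attributes

ts_parts <- split_attrs(stamp)
f_parts  <- split_attrs(f)
typeof(ts_parts$data)  # "double"
typeof(f_parts$data)   # "integer"

# Rebuilding from the parts gives back the original S3 objects.
stopifnot(identical(join_attrs(ts_parts), stamp),
          identical(join_attrs(f_parts), f))

# Less common types (here a formula) go through R's own serializer instead.
fml   <- y ~ x + z
bytes <- serialize(fml, connection = NULL)  # a raw vector
fml2  <- unserialize(bytes)
stopifnot(inherits(fml2, "formula"), identical(deparse(fml2), deparse(fml)))
```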
For character vectors, qs also uses R's ALTREP system to read in string data quickly.
## Installation

```r
devtools::install_github("traversc/qs")
```

(Requires R version 3.5 or higher)
## Features
The table below compares the features of different serialization approaches in R.
| | qs | fst | saveRDS |
|---|---|---|---|
| Not Slow | ✔ | ✔ | X |
| Numeric Vectors | ✔ | ✔ | ✔ |
| Integer Vectors | ✔ | ✔ | ✔ |
| Logical Vectors | ✔ | ✔ | ✔ |
| Character Vectors | ✔ | ✔ | ✔ |
| Character Encoding | ✔ | (vector-wide only) | ✔ |
| Complex Vectors | ✔ | X | ✔ |
| Data.Frames | ✔ | ✔ | ✔ |
| On-disk row access | X | ✔ | X |
| Attributes | ✔ | Some | ✔ |
| Lists / Nested Lists | ✔ | X | ✔ |
| Multi-threaded | X (Not Yet) | ✔ | X |
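The "Character Encoding" row refers to the fact that R declares an encoding for each individual string, so a single character vector can mix encodings. The sketch below builds such a vector and checks that a base R `saveRDS()`/`readRDS()` round trip keeps each element's declared encoding. The strings themselves are arbitrary examples.

```r
# Three strings: pure ASCII, UTF-8, and the same accented text converted to latin1.
x <- c("plain", "caf\u00e9", iconv("caf\u00e9", from = "UTF-8", to = "latin1"))
Encoding(x)  # the declared encoding of each element

path <- tempfile(fileext = ".rds")
saveRDS(x, path)
y <- readRDS(path)

# Each element keeps its own declared encoding after the round trip.
stopifnot(identical(Encoding(y), Encoding(x)))
```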
## Summary Benchmarks
The table below lists serialization speed for several different data types.
## Additional Benchmarks

### Data.Frame benchmark
Benchmarks for serializing and de-serializing large data.frames (5 million rows) composed of a numeric column (rnorm), an integer column (sample(5e6)), and a character vector column (random alphanumeric strings of length 50). See dataframe_bench.png for a comparison using different compression parameters.
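The sketch below shows how a scaled-down benchmark data frame of this shape could be built and timed. The row count is reduced from 5 million to 100,000 so it runs quickly. `random_strings()` is a hypothetical helper. The qs timings run only if the package is installed, and assume its `qsave()`/`qread()` functions. Timings vary by machine and are not the figures reported in this README.

```r
set.seed(1)
n <- 1e5  # the README benchmark uses 5e6 rows

# Hypothetical helper: n random alphanumeric strings of `len` characters each.
random_strings <- function(n, len = 50) {
  alphabet <- c(letters, LETTERS, 0:9)  # 62 characters (digits coerced to strings)
  chars <- matrix(sample(alphabet, n * len, replace = TRUE), nrow = n)
  apply(chars, 1, paste, collapse = "")  # collapse each row into one string
}

df <- data.frame(
  num = rnorm(n),          # numeric column
  int = sample(n),         # integer column (a permutation of 1..n)
  str = random_strings(n), # character column
  stringsAsFactors = FALSE
)

# Wall-clock write and read times for base R.
rds_file  <- tempfile(fileext = ".rds")
rds_write <- system.time(saveRDS(df, rds_file))[["elapsed"]]
rds_read  <- system.time(df_rds <- readRDS(rds_file))[["elapsed"]]

# The same measurement for qs, if it is available.
if (requireNamespace("qs", quietly = TRUE)) {
  qs_file  <- tempfile(fileext = ".qs")
  qs_write <- system.time(qs::qsave(df, qs_file))[["elapsed"]]
  qs_read  <- system.time(df_qs <- qs::qread(qs_file))[["elapsed"]]
}
```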
The read times include materialization of ALTREP data, so that the comparison is apples-to-apples.
Serialization speed with default parameters:
| Method | write time (s) | read time (s) |
|---|---|---|
| qs | 0.49391294 | 8.8818166 |
| fst (1 thread) | 0.37411811 | 8.9309314 |
| fst (4 threads) | 0.3676273 | 8.8565951 |
| saveRDS | 14.377122 | 12.467517 |
#### Serialization speed with different parameters
The numbers in the figure reflect the compression parameter used. qs uses the zstd compression library, with compression parameters ranging from -50 to 22 (qs uses a default value of -1). fst defines its own compression range through a combination of the zstd and lz4 algorithms, ranging from 0 to 100 (default: 0).
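The trade-off that a compression parameter controls can be sketched with base R alone. Base R has no zstd writer, so the example below substitutes gzip levels 0 to 9 (through `gzfile()`) for zstd levels and records the resulting file sizes. The optional fst loop runs only if fst is installed and uses the `compress` argument (0 to 100) of its `write_fst()` function.

```r
set.seed(1)
df <- data.frame(num = rnorm(1e5), int = sample(1e5))

# Write the same object at several gzip levels and record each file's size.
gz_levels <- c(0, 1, 6, 9)
sizes <- sapply(gz_levels, function(level) {
  path <- tempfile(fileext = ".rds")
  con <- gzfile(path, open = "wb", compression = level)
  saveRDS(df, con)
  close(con)
  stopifnot(identical(readRDS(path), df))  # every level must round-trip exactly
  file.size(path)
})
names(sizes) <- paste0("gzip_", gz_levels)
print(sizes)

# fst's own 0-100 scale, if fst is installed.
if (requireNamespace("fst", quietly = TRUE)) {
  for (level in c(0, 50, 100)) {
    path <- tempfile(fileext = ".fst")
    fst::write_fst(df, path, compress = level)
    cat("fst compress =", level, "->", file.size(path), "bytes\n")
  }
}
```

Level 0 stores the data essentially uncompressed, so higher levels should only ever shrink the file; the cost is extra write time, which is the same tension the figure shows for qs and fst.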
### Nested List benchmark
Benchmarks for serializing random nested lists with random attributes (approximately 50 MB). See the nested list example in the tests folder.
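The tests-folder example is not reproduced here. The following is an independent, much smaller sketch of the same idea. `random_object()` is a hypothetical generator that builds a random nested list whose nodes carry random attributes. The result is round-tripped through `saveRDS()`/`readRDS()` and, if installed, through qs (assuming its `qsave()`/`qread()` functions). Larger `max_depth` values produce larger objects.

```r
set.seed(42)

# Hypothetical generator: a random nested list whose nodes carry random attributes.
random_object <- function(depth = 0, max_depth = 4) {
  if (depth >= max_depth || runif(1) < 0.3) {
    # Leaf: an atomic vector of random type and length (0 to 20).
    len <- sample(0:20, 1)
    node <- switch(sample(4, 1),
      rnorm(len),
      sample.int(100L, len, replace = TRUE),
      sample(c(TRUE, FALSE, NA), len, replace = TRUE),
      sample(letters, len, replace = TRUE))
  } else {
    # Branch: a list of 1 to 5 children, each generated recursively.
    node <- lapply(seq_len(sample(1:5, 1)),
                   function(i) random_object(depth + 1, max_depth))
  }
  # Attach a randomly named attribute to roughly half of the nodes.
  if (runif(1) < 0.5) {
    attr(node, paste0("attr_", sample(1000, 1))) <- sample(letters, 3)
  }
  node
}

x <- random_object()

rds_file <- tempfile(fileext = ".rds")
saveRDS(x, rds_file)
stopifnot(identical(readRDS(rds_file), x))

if (requireNamespace("qs", quietly = TRUE)) {
  qs_file <- tempfile(fileext = ".qs")
  qs::qsave(x, qs_file)
  stopifnot(identical(qs::qread(qs_file), x))
}
```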
Serialization speed with default parameters:
| Method | write time (s) | read time (s) |
|---|---|---|
| qs | 0.17840716 | 0.19489372 |
| saveRDS | 3.484225 | 0.58762548 |