'Rcpp' Bindings for the 'simdjson' Header-Only Library for 'JSON' Parsing
The 'JSON' format is ubiquitous for data interchange, and the
'simdjson' library written by Daniel Lemire (and many contributors) provides
a high-performance parser for these files which, by relying on parallel 'SIMD'
instructions, manages to parse them faster than disk speed. See the
<arXiv:1902.08318> paper for more details about 'simdjson'. This package is
at present only a very thin and incomplete wrapper and does not aim to replace
the existing and excellent 'JSON' packages for R. But it does already validate
'JSON' files roughly 8 to 280 times faster (median timings) than the other
parsers in the benchmark below.
RcppSimdJson: Rcpp Bindings for the simdjson Header-Only Library
simdjson by Daniel Lemire (with contributions by Geoff Langdale, John Keiser and many others) is an engineering marvel. Through very clever use of SIMD instructions, it manages to parse JSON files faster than disk access. Wut? Yes, you read that right: parallel processing with so little overhead that the net throughput is limited only by disk speed.
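As described in the paper cited above, simdjson's first stage scans the input in 64-byte blocks and uses SIMD comparisons to build bitmasks marking quotes and structural characters ({ } [ ] : ,). A prefix XOR over the quote mask then tells which bytes lie inside strings, so structural characters within strings can be masked out without branching. The R sketch below mimics that data-parallel idea with vectorized logical operations instead of SIMD registers. It is an illustration only, not simdjson's code: the function name find_structurals is hypothetical, and, as a simplifying assumption, escaped quotes (\") are not handled.

```r
# Hypothetical helper: a vectorized (not SIMD) analogue of simdjson's stage 1.
# Simplifying assumption: there are no escaped quotes inside strings.
find_structurals <- function(json) {
  chars <- strsplit(json, "")[[1]]

  # "Bitmask" of quote characters: one logical value per character.
  is_quote <- chars == "\""

  # Prefix XOR of the quote mask: an odd running count of quotes means the
  # position is an opening quote or lies inside a string.
  in_string <- cumsum(is_quote) %% 2 == 1

  # Mask of candidate structural characters.
  is_op <- chars %in% c("{", "}", "[", "]", ":", ",")

  # Keep only the structural characters that sit outside strings.
  which(is_op & !in_string)
}

json <- '{"a":[1,2],"b:c":3}'
pos <- find_structurals(json)
substring(json, pos, pos)
```

The colon inside the key "b:c" is skipped because it follows an odd number of quotes. simdjson computes the same kind of masks for 64 bytes at a time with a handful of instructions, which is where much of its speed comes from.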
Moreover, it is implemented in neat modern C++ and can be accessed as a header-only library. (Well, one library in two files, really.) This makes R packaging easy, convenient, and compelling. So here we are.
library(RcppSimdJson)
# Validate the twitter.json example file that ships with the package
jsonfile <- system.file("jsonexamples", "twitter.json", package="RcppSimdJson")
validateJSON(jsonfile)
A simple benchmark against four other R-accessible JSON parsers:
R> print(res, order="median")
Unit: microseconds
     expr       min         lq       mean    median         uq        max neval cld
 simdjson   279.246    332.577    390.815    362.11    427.638    648.652   100   a
  jsonify  2820.079   2930.945   3064.773   3027.28   3153.427   3986.948   100   b
 jsonlite  8899.379   9085.685   9273.974   9226.56   9349.513  10820.562   100   c
  RJSONIO  9685.246   9899.634  10185.272  10105.96  10296.579  11766.177   100   d
   ndjson 99460.979 100381.388 101758.682 100971.75 102613.041 111553.986   100   e
R> print(res, order="median", unit="relative")
Unit: relative
     expr      min        lq      mean    median        uq       max neval cld
 simdjson   1.0000   1.00000   1.00000   1.00000   1.00000   1.00000   100   a
  jsonify  10.0989   8.81284   7.84201   8.36011   7.37406   6.14651   100   b
 jsonlite  31.8693  27.31908  23.72986  25.48003  21.86315  16.68161   100   c
  RJSONIO  34.6836  29.76649  26.06165  27.90857  24.07779  18.13943   100   d
   ndjson 356.1769 301.82947 260.37585 278.84314 239.95305 171.97817   100   e
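The relative table follows from the first one: each column is divided by its smallest entry, which here is always the simdjson row. A minimal base-R check on the median column, using the printed (already rounded) microsecond values copied from the first table:

```r
# Median timings in microseconds, copied from the first table.
medians <- c(simdjson = 362.11, jsonify = 3027.28, jsonlite = 9226.56,
             RJSONIO = 10105.96, ndjson = 100971.75)

# Relative timing: divide every median by the fastest (smallest) one.
relative <- medians / min(medians)
print(round(relative, 2))
```

This reproduces the median column of the relative table up to the rounding of the printed inputs, and shows that simdjson's median validation time is roughly 8 to 280 times lower than that of the other four parsers.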
Minimally viable. Right now it builds, wraps the validation test, and checks cleanly as an R package, but it is still highly incomplete. Requires a C++17 compiler. Expect changes. But please feel free to contribute.
Any problems, bug reports, or feature requests for the package can be submitted and handled most conveniently as GitHub issues in the repository.
For standard JSON work in R, as well as for other nicely done C++ libraries, consider these:
- jsonlite by Jeroen Ooms is excellent, very versatile, and probably the most widely used;
- rapidjsonr and jsonify by David Cooley bring RapidJSON to R;
- ndjson by Bob Rudis builds on the JSON for Modern C++ library by Niels Lohmann;
- RJSONIO by Duncan Temple Lang started all this but could use a little love.
Author of the R package wrapper: Dirk Eddelbuettel.
Functions in RcppSimdJson
validateJSON: Validate a JSON file, fast
License: GPL (>= 2)
Packaged: 2020-01-25 23:02:02.1361 UTC; edd
Date/Publication: 2020-02-12 10:10:02 UTC