wk
The goal of wk is to provide lightweight R and C++ infrastructure for packages to use well-known formats (well-known binary and well-known text) as input and/or output without requiring external software. Well-known binary is very fast to read and write, whereas well-known text is human-readable and human-writable. Together, these formats allow for efficient interchange between software packages (WKB), and highly readable tests and examples (WKT).
Installation
You can install the development version from GitHub with:
# install.packages("remotes")
remotes::install_github("paleolimbot/wk")
If you can load the package, you’re good to go!
library(wk)
Basic vector classes for WKT and WKB
Use wkt()
to mark a character vector as containing well-known text, or
wkb()
to mark a vector as well-known binary. These have some basic
vector features built in, which means you can subset, repeat,
concatenate, and put these objects in a data frame or tibble. These come
with built-in format()
and print()
methods.
wkt("POINT (30 10)")
#> <wk_wkt[1]>
#> [1] POINT (30 10)
as_wkb(wkt("POINT (30 10)"))
#> <wk_wkb[1]>
#> [1] <POINT (30 10)>
Extract coordinates and meta information
One of the main drawbacks to passing around geomtries in WKB is that the
format is opaque to R users, who need coordinates as R object rather
than binary vectors. In addition to print()
and plot()
methods for
wkb()
vectors, the wk*_meta()
and wk*_coords()
functions provide
usable coordinates and feature meta.
wkt_coords("POINT ZM (1 2 3 4)")
#> feature_id part_id ring_id x y z m
#> 1 1 1 0 1 2 3 4
wkt_meta("POINT ZM (1 2 3 4)")
#> feature_id part_id type_id size srid has_z has_m n_coords
#> 1 1 1 1 1 NA TRUE TRUE 1
Well-known R objects?
The wk package experimentally generates (and parses) well-known “s” expressions (the C name for R objects). This is similar to the format that sf uses.
wkt_translate_wksxp("POINT (30 10)")
#> [[1]]
#> [,1] [,2]
#> [1,] 30 10
#> attr(,"class")
#> [1] "wk_point"
Dependencies
The wk package imports Rcpp.
Using the C++ headers
The wk package takes an event-based approach to parsing inspired by the
event-based SAX XML parser. This makes the readers and writers highly
re-usable! This system is class-based, so you will have to make your
own subclass of WKGeometryHandler
and wire it up to a WKReader
to do
anything useful.
// If you're writing code in a package, you'll also
// have to put 'wk' in your `LinkingTo:` description field
// [[Rcpp::depends(wk)]]
#include <Rcpp.h>
#include "wk/rcpp-io.h"
#include "wk/wkt-reader.h"
using namespace Rcpp;
class CustomHandler: public WKGeometryHandler {
public:
void nextFeatureStart(size_t featureId) {
Rcout << "Do something before feature " << featureId << "\n";
}
void nextFeatureEnd(size_t featureId) {
Rcout << "Do something after feature " << featureId << "\n";
}
};
// [[Rcpp::export]]
void wkt_read_custom(CharacterVector wkt) {
WKCharacterVectorProvider provider(wkt);
WKTReader reader(provider);
CustomHandler handler;
reader.setHandler(&handler);
while (reader.hasNextFeature()) {
reader.iterateFeature();
}
}
On our example point, this prints the following:
wkt_read_custom("POINT (30 10)")
#> Do something before feature 0
#> Do something after feature 0
The full handler interface includes methods for the start and end of
features, geometries (which may be nested), linear rings, coordinates,
and parse errors. You can preview what will get called for a given
geometry using wkb|wkt_debug()
functions.
wkt_debug("POINT (30 10)")
#> nextFeatureStart(0)
#> nextGeometryStart(POINT [1], WKReader::PART_ID_NONE)
#> nextCoordinate(POINT [1], WKCoord(x = 30, y = 10), 0)
#> nextGeometryEnd(POINT [1], WKReader::PART_ID_NONE)
#> nextFeatureEnd(0)
Performance
This package was designed to stand alone and be flexible, but also happens to be really fast for some common operations.
Read WKB + Write WKB:
bench::mark(
wk = wk:::wksxp_translate_wkb(wk:::wkb_translate_wksxp(nc_wkb)),
geos_c = geovctrs:::geovctrs_cpp_convert(nc_wkb, wkb_ptype),
sf = sf:::CPL_read_wkb(sf:::CPL_write_wkb(nc_sfc, EWKB = TRUE), EWKB = TRUE),
check = FALSE
)
#> # A tibble: 3 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 wk 273µs 333µs 2966. 114.2KB 18.1
#> 2 geos_c 498µs 551µs 1780. 53.5KB 2.03
#> 3 sf 370µs 413µs 2338. 99.8KB 16.6
Read WKB + Write WKT:
bench::mark(
wk = wk:::wkb_translate_wkt(nc_wkb),
geos_c = geovctrs:::geovctrs_cpp_convert(nc_wkb, wkt_ptype),
sf = sf:::st_as_text.sfc(sf:::st_as_sfc.WKB(nc_WKB, EWKB = TRUE)),
wellknown = lapply(nc_wkb, wellknown::wkb_wkt),
check = FALSE
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 4 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 wk 3.03ms 3.22ms 308. 3.32KB 0
#> 2 geos_c 4ms 4.36ms 228. 3.32KB 0
#> 3 sf 180.12ms 184.4ms 5.28 569.81KB 21.1
#> 4 wellknown 25.42ms 28.76ms 33.4 3.41MB 5.89
Read WKT + Write WKB:
bench::mark(
wk = wk:::wkt_translate_wkb(nc_wkt),
geos_c = geovctrs:::geovctrs_cpp_convert(nc_wkt, wkb_ptype),
sf = sf:::CPL_write_wkb(sf:::st_as_sfc.character(nc_wkt), EWKB = TRUE),
wellknown = lapply(nc_wkt, wellknown::wkt_wkb),
check = FALSE
)
#> # A tibble: 4 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 wk 1.89ms 2.02ms 492. 53.58KB 0
#> 2 geos_c 2.5ms 2.73ms 362. 49.48KB 0
#> 3 sf 3.32ms 3.58ms 274. 186.48KB 6.41
#> 4 wellknown 42.73ms 45.88ms 21.9 1.31MB 12.5
Read WKT + Write WKT:
bench::mark(
wk = wk::wksxp_translate_wkt(wk::wkt_translate_wksxp(nc_wkt)),
geos_c = geovctrs:::geovctrs_cpp_convert(nc_wkt, wkt_ptype),
sf = sf:::st_as_text.sfc(sf:::st_as_sfc.character(nc_wkt)),
check = FALSE
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 3 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 wk 5.03ms 5.49ms 178. 63.77KB 0
#> 2 geos_c 5.99ms 6.34ms 156. 3.32KB 0
#> 3 sf 188.41ms 190.96ms 5.23 230.73KB 20.9
Generate coordinates:
bench::mark(
wk_wkb = wk::wksxp_coords(nc_sxp),
sfheaders = sfheaders::sfc_to_df(nc_sfc),
sf = sf::st_coordinates(nc_sfc),
check = FALSE
)
#> # A tibble: 3 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 wk_wkb 160.84µs 185.1µs 5010. 131KB 29.9
#> 2 sfheaders 502.16µs 544.86µs 1772. 612KB 56.1
#> 3 sf 2.13ms 2.32ms 426. 606KB 33.7
Send polygons to a graphics device (note that the graphics device is the main holdup in real life):
devoid::void_dev()
wksxp_plot_new(nc_sxp)
bench::mark(
wk_wkb = wk::wksxp_draw_polypath(nc_sxp),
sf = sf:::plot.sfc_MULTIPOLYGON(nc_sfc, add = TRUE),
check = FALSE
)
#> # A tibble: 2 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 wk_wkb 312.7µs 336.4µs 2769. 358KB 18.2
#> 2 sf 3.25ms 3.44ms 283. 241KB 20.7
dev.off()
#> quartz_off_screen
#> 2