tidyverse: Tidyverse methods for sf objects

Description

Tidyverse methods for sf objects. Geometries are sticky: they are kept even when not explicitly selected. To drop them, convert the object with as.data.frame first, so that dplyr's own methods are used.

Usage

filter.sf(.data, ..., .dots)
arrange.sf(.data, ..., .dots)
group_by.sf(.data, ..., add = FALSE)
ungroup.sf(x, ...)
mutate.sf(.data, ..., .dots)
transmute.sf(.data, ..., .dots)
select.sf(.data, ...)
rename.sf(.data, ...)
slice.sf(.data, ..., .dots)
summarise.sf(.data, ..., .dots, do_union = TRUE)
distinct.sf(.data, ..., .keep_all = FALSE)
gather.sf(data, key, value, ..., na.rm = FALSE, convert = FALSE,
  factor_key = FALSE)
spread.sf(data, key, value, fill = NA, convert = FALSE, drop = TRUE,
  sep = NULL)
sample_n.sf(tbl, size, replace = FALSE, weight = NULL,
  .env = parent.frame())
sample_frac.sf(tbl, size = 1, replace = FALSE, weight = NULL,
  .env = parent.frame())
nest.sf(data, ..., .key = "data")
separate.sf(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE,
  convert = FALSE, extra = "warn", fill = "warn", ...)
unite.sf(data, col, ..., sep = "_", remove = TRUE)
unnest.sf(data, ..., .preserve = NULL)
inner_join.sf(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
left_join.sf(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
right_join.sf(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
full_join.sf(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
semi_join.sf(x, y, by = NULL, copy = FALSE, ...)
anti_join.sf(x, y, by = NULL, copy = FALSE, ...)

Arguments

.data

data object of class sf

...

other arguments

.dots

see corresponding function in package dplyr

add

see corresponding function in dplyr

x

tbls to join

do_union

logical; should geometries be unioned using st_union, or simply combined using st_combine? Using st_union resolves internal boundaries, but when unioning points it may also change the order of the points; see Details.

.keep_all

see corresponding function in dplyr

data

see original function docs

key

see original function docs

value

see original function docs

na.rm

see original function docs

convert

see original function docs

factor_key

see original function docs

fill

see original function docs

drop

see original function docs

sep

see original function docs

tbl

see original function docs

size

see original function docs

replace

see original function docs

weight

see original function docs

.env

see original function docs

.key

see nest

col

see separate

into

see separate

remove

see separate

extra

see separate

.preserve

see unnest

y

tbls to join

by

a character vector of variables to join by. If NULL, the default, *_join() will do a natural join, using all variables with common names across the two tables. A message lists the variables so that you can check they're right (to suppress the message, list the join variables explicitly).

To join by different variables on x and y use a named vector. For example, by = c("a" = "b") will match x.a to y.b.

copy

If x and y are not from the same data source, and copy is TRUE, then y will be copied into the same src as x. This allows you to join tables across srcs, but it is a potentially expensive operation so you must opt into it.

suffix

If there are non-joined duplicate variables in x and y, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2.
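As a minimal sketch of the by argument described above, the following joins a small sf object to a plain data.frame using a named vector. The objects pts and attrs and their column names are invented for illustration; the right-hand table is deliberately not an sf object.

```r
library(sf)
library(dplyr)

# Hypothetical sf object with a key column "a"
pts <- st_sf(
  a = c("p", "q"),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)))
)

# Hypothetical attribute table whose key column is named "b"
attrs <- data.frame(b = c("p", "q"), value = c(10, 20))

# Match pts$a to attrs$b; the geometry column of pts is carried along
joined <- left_join(pts, attrs, by = c("a" = "b"))
class(joined)
joined$value
```

Listing the key columns explicitly, as here, also avoids the message that a natural join would print.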

Details

select keeps the geometry regardless of whether it is selected or not; to deselect it, first pipe through as.data.frame to let dplyr's own select drop it.
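The sticky behaviour of select can be sketched on a small hand-built object. The object pts and its columns are made up for this illustration.

```r
library(sf)
library(dplyr)

# Hypothetical sf object with two attribute columns
pts <- st_sf(
  id = 1:2,
  score = c(0.5, 0.9),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)))
)

# select.sf: geometry is kept although only "score" was requested
kept <- pts %>% select(score)
names(kept)

# Converting to a plain data.frame first lets dplyr's select drop geometry
dropped <- pts %>% as.data.frame() %>% select(score)
names(dropped)
```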

If do_union is FALSE, summarise simply combines geometries using c.sfg. When polygons sharing a boundary are combined, this leads to geometries that are invalid; see https://github.com/r-spatial/sf/issues/681.
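A rough sketch of the difference, assuming two unit squares that share an edge and have no coordinate reference system (so planar geometry operations apply). The helper unit_square and the column names are hypothetical.

```r
library(sf)
library(dplyr)

# Hypothetical helper: a unit square whose lower-left corner is (x0, 0)
unit_square <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}

# Two squares sharing the edge x = 1, both in the same group
squares <- st_sf(
  grp = c("a", "a"),
  geometry = st_sfc(unit_square(0), unit_square(1))
)

# Default: geometries are unioned, dissolving the shared boundary
unioned <- squares %>% group_by(grp) %>% summarise(n = n())

# do_union = FALSE: geometries are only combined into one multi-geometry
combined <- squares %>% group_by(grp) %>% summarise(n = n(), do_union = FALSE)

st_geometry_type(unioned)
st_geometry_type(combined)

# Per the paragraph above, the combined geometry is expected to be invalid
st_is_valid(combined)
```

If validity matters downstream, keeping the default do_union = TRUE is the safer choice for polygons that touch.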

distinct.sf gives distinct records for which all attributes and geometries are distinct; st_equals is used to find out which geometries are distinct.
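To illustrate, the sketch below builds a small object in which some rows repeat both attribute and geometry, while others differ in only one of the two. All names are invented for the example.

```r
library(sf)
library(dplyr)

p0 <- st_point(c(0, 0))
p1 <- st_point(c(1, 1))

# Rows 1 and 2 are identical; row 3 differs in geometry, row 4 in attribute
dup <- st_sf(
  a = c(1, 1, 1, 2),
  geometry = st_sfc(p0, p0, p1, p0)
)

# Only the exact repeat (same attribute and equal geometry) is removed
uniq <- dup %>% distinct()
nrow(uniq)
```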

nest.sf assumes that a simple feature geometry list-column was among the columns that were nested.

Examples

# NOT RUN {
library(sf)
library(dplyr)
nc = st_read(system.file("shape/nc.shp", package="sf"))
nc %>% filter(AREA > .1) %>% plot()
# plot 10 smallest counties in grey:
st_geometry(nc) %>% plot()
nc %>% select(AREA) %>% arrange(AREA) %>% slice(1:10) %>% plot(add = TRUE, col = 'grey')
title("the ten counties with smallest area")
nc$area_cl = cut(nc$AREA, c(0, .1, .12, .15, .25))
nc %>% group_by(area_cl) %>% class()
nc2 <- nc %>% mutate(area10 = AREA/10)
nc %>% transmute(AREA = AREA/10, geometry = geometry) %>% class()
nc %>% transmute(AREA = AREA/10) %>% class()
nc %>% select(SID74, SID79) %>% names()
nc %>% select(SID74, SID79, geometry) %>% names()
nc %>% select(SID74, SID79) %>% class()
nc %>% select(SID74, SID79, geometry) %>% class()
nc2 <- nc %>% rename(area = AREA)
nc %>% slice(1:2)
nc.g <- nc %>% group_by(area_cl)
nc.g %>% summarise(mean(AREA))
nc.g %>% summarise(mean(AREA)) %>% plot(col = grey(3:6 / 7))
nc %>% as.data.frame %>% summarise(mean(AREA))
nc[c(1:100, 1:10), ] %>% distinct() %>% nrow()
library(tidyr)
nc %>% select(SID74, SID79) %>% gather(VAR, SID, -geometry) %>% summary()
nc$row = 1:100 # needed for spread to work
nc %>% select(SID74, SID79, geometry, row) %>%
	gather(VAR, SID, -geometry, -row) %>%
	spread(VAR, SID) %>% head()
storms.sf = st_as_sf(storms, coords = c("long", "lat"), crs = 4326)
x <- storms.sf %>% group_by(name, year) %>% nest
trs = lapply(x$data, function(tr) st_cast(st_combine(tr), "LINESTRING")[[1]]) %>% st_sfc(crs = 4326)
trs.sf = st_sf(x[,1:2], trs)
plot(trs.sf["year"], axes = TRUE)
# }
