ggOceanMaps (version 1.1)

dist2land: Calculate distance to the closest land for coordinates in a data frame

Description

Calculates the closest distance to land for coordinates in a data frame

Usage

dist2land(
  data,
  lon = NULL,
  lat = NULL,
  shapefile = NULL,
  proj.in = "+init=epsg:4326",
  bind = TRUE,
  dist.col = "ldist",
  binary = FALSE,
  cores = getCores(),
  verbose = TRUE
)

Arguments

data

Data.frame containing geographic coordinates

lon, lat

Either the names of the longitude and latitude columns in data or NULL to guess the longitude and/or latitude columns in data.

shapefile

Land shape to which distances should be calculated. Either a character string naming one of the pre-made shapefiles in shapefile_list, a single SpatialPolygons object, or NULL to enable automatic definition of the land shapes based on data.

proj.in

proj4string projection argument for the coordinates in data.

bind

Logical indicating whether data should be returned with the distances added as a column (TRUE, default) or whether the distances should be returned as a vector (FALSE).

dist.col

The name of the distance column, if bind = TRUE. Defaults to "ldist".

binary

Logical indicating whether binary (TRUE = the position is in the ocean, FALSE = the position is on land) should be returned instead of distances. Speeds up the function considerably.

cores

Integer value defining how many cores should be used in the distance calculations. Parallelization speeds up the function (see parallel::mclapply), but uses more computer resources during the calculation. Set to 1 to disable parallelization.

verbose

Logical indicating whether information about the process should be returned as messages. Set to FALSE to make the function silent.

Value

Returns a vector if bind = FALSE, otherwise a data frame. In the data frame, the distances are given in a new column named by the dist.col argument. The distances are in kilometers if binary = FALSE; otherwise the values are logical (TRUE = the position is in the ocean, FALSE = the position is on land).
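As a minimal base R sketch of how the logical vector described above could be used to drop observations on land: the data frame obs and the values of in_ocean are invented placeholders for illustration, not actual output of dist2land.

```r
# Observations with coordinates (made-up values for illustration)
obs <- data.frame(id = 1:4, lon = c(5, 10, 15, 20), lat = c(60, 62, 64, 66))

# Placeholder for the logical vector that
# dist2land(obs, binary = TRUE, bind = FALSE) would return;
# these values are invented, not actual function output
in_ocean <- c(TRUE, FALSE, TRUE, TRUE)

# Keep only the observations flagged as being in the ocean
obs_ocean <- obs[in_ocean, ]
```

With bind = TRUE the same logical values would instead sit in the column named by dist.col, so the filter would index that column.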

Details

The function calculates distances using projected coordinates and the rgeos::gDistance function. These distances do not consider the curvature of the Earth unless the projection of the used land shape does so (check out geosphere::dist2Line and this SO solution if you want exact distances). The function is fairly slow for large datasets. If you only want to use the function to remove (wrong) observations reported on land, set the binary argument to TRUE. This speeds up the calculations considerably.
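To illustrate what accounting for the curvature of the Earth means, here is a minimal base R sketch of the haversine great-circle distance. It is not part of ggOceanMaps: the helper name haversine_km and the spherical Earth radius of 6371 km are assumptions made for this example.

```r
# Great-circle (haversine) distance in kilometers between lon/lat points.
# Assumes a spherical Earth with a mean radius of 6371 km.
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin guards against rounding errors pushing sqrt(a) above 1
  2 * radius * asin(pmin(1, sqrt(a)))
}

# One degree of longitude spans far less distance near the poles
equator <- haversine_km(0, 0, 1, 0)
arctic <- haversine_km(0, 80, 1, 80)
```

Comparing such values against dist2land output for a few points is one way to judge how much a given projection distorts distances in the study area.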

The dist2land function offers parallel processing, which speeds up the calculations for large datasets. Parallel processing has not been tested under Windows yet and may not work.
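The general pattern of splitting rows into chunks and processing them with parallel::mclapply can be sketched as follows. This is not the package's actual implementation: slow_distance and chunked_distance are hypothetical helpers, and the planar distance to a fixed reference point merely stands in for the real distance calculation.

```r
library(parallel)

# Hypothetical stand-in for an expensive per-row calculation:
# planar distance (in degrees) from each point to a reference point
slow_distance <- function(chunk, ref_lon = 0, ref_lat = 0) {
  sqrt((chunk$lon - ref_lon)^2 + (chunk$lat - ref_lat)^2)
}

chunked_distance <- function(data, cores = 1) {
  # mclapply relies on forking, which is not available on Windows
  if (.Platform$OS.type == "windows") cores <- 1
  n <- nrow(data)
  # Assign each row to one of `cores` contiguous chunks
  groups <- ceiling(seq_len(n) * cores / n)
  chunks <- split(data, groups)
  # Process the chunks in parallel and restore the original row order
  unlist(mclapply(chunks, slow_distance, mc.cores = cores), use.names = FALSE)
}

dt <- data.frame(lon = seq(-179, 179, length.out = 1000), lat = rep(60, 1000))
serial <- chunked_distance(dt, cores = 1)
parallel2 <- chunked_distance(dt, cores = 2)
```

Because the chunks are contiguous and returned in order, the parallel result matches the serial one row for row.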

Examples

# NOT RUN {
# Simple example:
dt <- data.frame(lon = seq(-20, 80, length.out = 41), lat = 50:90)
dt <- dist2land(dt, cores = 1)
qmap(dt, color = ldist) + scale_color_viridis_c()

# There are no premade shapefiles for datasets covering the entire globe,
# so the distances returned for such data are wrong
dt <- data.frame(lon = -20:20, lat = seq(-90, 90, length.out = 41))
dist2land(dt, cores = 1) # wrong!
# }
# NOT RUN {
dt <- data.frame(lon = seq(-179, 179, length.out = 1000), lat = rep(60, 1000))
# The distance calculation is slow for large datasets when run on a single core
system.time(dist2land(dt, cores = 1))
#> user  system elapsed 
#> 19.719   1.237  20.894 

# Parallel processing (the default, cores = getCores()) speeds it up
system.time(dist2land(dt))
#> user  system elapsed 
#> 0.073   0.041   5.627

# binary = TRUE further speeds the function up
system.time(dist2land(dt, binary = TRUE))
#> user  system elapsed 
#> 1.624   0.041   1.680
# }
