apache.sedona

Apache Sedona is a cluster computing system for processing large-scale spatial data. Sedona extends existing cluster computing systems, such as Apache Spark and Apache Flink, with a set of out-of-the-box distributed Spatial Datasets and Spatial SQL that efficiently load, process, and analyze large-scale spatial data across machines.

The apache.sedona R package exposes an interface to Apache Sedona through {sparklyr}, enabling higher-level access through a {dplyr} backend and familiar R functions.

Installation

To use Apache Sedona from R, you just need to install the apache.sedona package; Spark dependencies are managed directly by the package.

# Install released version from CRAN
install.packages("apache.sedona")

Development version

To use the development version, you will need both the latest version of the package and of the Apache Sedona jars.

To get the latest R package from GitHub:

# Install development version from GitHub
devtools::install_github("apache/sedona/R")

Once you have obtained the latest Sedona jars, put the path to the sedona-spark-shaded jar in the SEDONA_JAR_FILES environment variable (see below).

Usage

The spark_read_* functions read geospatial data into Spark DataFrames. The resulting Spark DataFrame object can then be modified using the dplyr verbs familiar to many R users. In addition, spatial UDFs supported by Sedona inter-operate seamlessly with the other functions supported in sparklyr's dbplyr SQL translation environment. For instance, the example code in this section finds the average area of all polygons in polygon_sdf.

The first time you load Sedona, Spark downloads all the dependent jars, which can take a few minutes and cause the connection to time out. You can either retry (some jars will already be downloaded and cached) or increase the "sparklyr.connect.timeout" parameter in the sparklyr config.
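As a minimal sketch of the second option, the timeout can be raised through a sparklyr configuration object before connecting. The value of 300 seconds is an arbitrary assumption chosen for illustration, not a recommendation from the package. The connection call is left commented out because it requires a working local Spark installation.

```r
library(sparklyr)

# Start from sparklyr's default configuration.
config <- spark_config()

# Raise the connection timeout (in seconds); 300 is an illustrative value.
config[["sparklyr.connect.timeout"]] <- 300

# Pass the configuration when connecting (needs a local Spark installation):
# sc <- spark_connect(master = "local", config = config)
```

If the connection still times out, retrying is usually enough, since the jars downloaded so far are cached.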

library(sparklyr)
library(apache.sedona)

## Only if using development version:
Sys.setenv("SEDONA_JAR_FILES" = "<path to sedona-spark-shaded jar>")

sc <- spark_connect(master = "local")
polygon_sdf <- spark_read_geojson(sc, location = "/tmp/polygon.json")
mean_area_sdf <- polygon_sdf %>%
  dplyr::summarize(mean_area = mean(ST_Area(geometry)))
print(mean_area_sdf)

Together, these capabilities open up many possibilities. For example, one can extract ML features from geospatial data in Spark DataFrames, build an ML pipeline using the ml_* family of functions in {sparklyr} to work with those features, and, if the output of an ML model is itself a geospatial object, apply the visualization routines in {apache.sedona} to visualize the difference between a predicted geometry and the corresponding ground truth.

Version: 1.7.2

License: Apache License 2.0

Maintainer: Apache Sedona

Last Published: June 8th, 2025

Monthly Downloads: 930

Functions in apache.sedona (1.7.2)

sedona_render_scatter_plot

Visualize a Sedona spatial RDD using a scatter plot.

sedona_spatial_join_count_by_key

Perform a spatial count-by-key operation based on two Sedona spatial RDDs.

sedona_render_choropleth_map

Visualize a Sedona spatial RDD using a choropleth map.

sedona_read_geojson

Read geospatial data into a Spatial RDD.

sedona_read_shapefile_to_typed_rdd

(Deprecated) Create a typed SpatialRDD from a shapefile or geojson data source.

sedona_spatial_rdd_aggregation_routine

Spatial RDD aggregation routine

sedona_spatial_rdd_data_source

Create a SpatialRDD from an external data source.

sedona_visualization_routines

Visualization routine for Sedona spatial RDD.

spark_read_shapefile

Read geospatial data into a Spark DataFrame.

sedona_write_wkb

Write SpatialRDD into a file.

spatial_query

Execute a spatial query.

to_spatial_rdd

Export a Spark SQL query with a spatial column into a Sedona spatial RDD.

spark_write_geojson

Write geospatial data from a Spark DataFrame.

spatial_join_op

Spatial join operator

sedona_apply_spatial_partitioner

Apply a spatial partitioner to a Sedona spatial RDD.

sdf_register.spatial_rdd

Import data from a spatial RDD into a Spark DataFrame.

new_bounding_box

Construct a bounding box object.

sedona_knn_query

Query the k nearest spatial objects.

minimum_bounding_box

Find the minimal bounding box of a geometry.

crs_transform

Perform a CRS transformation.

apache.sedona-package

apache.sedona: R Interface for Apache Sedona

sedona_range_query

Execute a range query.

sedona_build_index

Build an index on a Sedona spatial RDD.

approx_count

Find the approximate total number of records within a Spatial RDD.

sedona_read_dsv_to_typed_rdd

Create a typed SpatialRDD from a delimiter-separated values data source.

sedona_save_spatial_rdd

Save a Spark dataframe containing exactly 1 spatial column into a file.

sedona_render_heatmap

Visualize a Sedona spatial RDD using a heatmap.

sedona_spatial_join

Perform a spatial join operation on two Sedona spatial RDDs.