stxplore

The goal of stxplore is to explore spatio-temporal data. It can take in either dataframes or stars objects. It is a tool for exploratory data analysis.

Installation

You can install the development version of stxplore from GitHub with:

# install.packages("devtools")
devtools::install_github("sevvandi/stxplore")

A quick introduction

Let’s first explore some spatio-temporal data using dataframes. The dataset NOAA_df_1990 has temperature and precipitation from 1990 to 1993 for selected locations.

library(stxplore)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

data("NOAA_df_1990")

precip <- filter(NOAA_df_1990,
  proc == "Precip" &
  year == 1993)
precip$t <- precip$julian - min(precip$julian) + 1
head(precip)
#>   julian year month day   id    z   proc   lat       lon       date t
#> 1 727930 1993     1   1 3804 0.00 Precip 39.35 -81.43333 1993-01-01 1
#> 2 727931 1993     1   2 3804 0.00 Precip 39.35 -81.43333 1993-01-02 2
#> 3 727932 1993     1   3 3804 0.03 Precip 39.35 -81.43333 1993-01-03 3
#> 4 727933 1993     1   4 3804 0.37 Precip 39.35 -81.43333 1993-01-04 4
#> 5 727934 1993     1   5 3804 0.09 Precip 39.35 -81.43333 1993-01-05 5
#> 6 727935 1993     1   6 3804 0.00 Precip 39.35 -81.43333 1993-01-06 6

The z variable holds precipitation in this dataset. Let’s visualize the mean precipitation over time.


tem <- temporal_means(precip,
                      t_col = 'date',
                      z_col = 'z',
                      id_col = 'id')

autoplot(tem,
         ylab = "Mean Precipitation")
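
To make the idea concrete, here is a minimal sketch of what a temporal mean looks like when computed by hand with dplyr. It assumes that the temporal mean is the average of z across all locations at each time point, which is the usual definition of an empirical temporal mean; the internals of temporal_means may differ. The toy dataframe and the names toy and toy_means are hypothetical.

```r
library(dplyr)

# Hypothetical toy data: 3 stations observed on 4 consecutive days
toy <- data.frame(
  id   = rep(c("a", "b", "c"), each = 4),
  date = rep(as.Date("1993-01-01") + 0:3, times = 3),
  z    = c(0, 1, 2, 3,  2, 3, 4, 5,  4, 5, 6, 7)
)

# Average z over all stations for each date
# (assumed analogue of temporal_means, not the package's own code)
toy_means <- toy %>%
  group_by(date) %>%
  summarise(mean_z = mean(z, na.rm = TRUE), .groups = "drop")

toy_means
```

The result has one row per date, which is the series that a mean-over-time plot would draw.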

Let’s look at minimum temperature. Let’s first subset the dataset.

tmin <- filter(NOAA_df_1990,
  proc == "Tmin" &
  year == 1993)

Ridgeline plots break up the quantity of interest into several groups and show the distribution of each group. We use the R package ggridges (Wilke 2021) underneath. In this ridgeline plot we see that the minimum temperature is lower at higher latitudes.

ridgeline(tmin, group_col = 'lat', z_col = 'z')
#> Warning: `stat(x)` was deprecated in ggplot2 3.4.0.
#> ℹ Please use `after_stat(x)` instead.
#> ℹ The deprecated feature was likely used in the stxplore package.
#>   Please report the issue to the authors.
#> Picking joint bandwidth of 3.27
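
For readers who want to see the underlying idea, the following is a rough sketch of a ridgeline plot built directly with ggplot2 and ggridges. It is not the code that ridgeline runs; the toy data, the choice of four latitude groups, and the axis labels are assumptions made for illustration.

```r
library(ggplot2)
library(ggridges)

set.seed(1)

# Hypothetical toy data: values get lower as latitude increases
toy <- data.frame(lat = rep(c(30, 35, 40, 45), each = 200))
toy$z <- rnorm(nrow(toy), mean = 80 - toy$lat, sd = 5)

# One density ridge per latitude group
p <- ggplot(toy, aes(x = z, y = factor(lat))) +
  geom_density_ridges() +
  labs(x = "z", y = "Latitude group")

p
```

Each ridge is the density of z within one group, so a shift of the ridges to the left as latitude increases reflects lower values at higher latitudes.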

Of course, if we group by longitude, there wouldn’t be much difference between the groups. What about the difference over time? Can we see that using ridgeline plots?

ridgeline(tmin, group_col = 'date', z_col = 'z', num_grps = 12)
#> Picking joint bandwidth of 1.99
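
One way to think about num_grps with a date column is as a binning step: the date range is split into that many intervals and each interval becomes one ridge. The sketch below shows such a binning in base R. It assumes equal-width bins, which may not match how ridgeline forms its groups; dates and grp are hypothetical names.

```r
# Hypothetical daily dates covering 1993
dates <- seq(as.Date("1993-01-01"), as.Date("1993-12-31"), by = "day")

# Split the date range into 12 equal-width bins
# (an assumed analogue of num_grps = 12)
grp <- cut(as.numeric(dates), breaks = 12, labels = FALSE)

# Number of days that fall in each bin
table(grp)
```

With 12 bins over one year, each group covers roughly one month.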

Moving on to stars objects

The stars dataset aerosol_australia has aerosol optical thickness over Australia and its surrounds for 13 months, from December 2019 to December 2020. There were devastating bushfires in Australia during this time. The data was taken from the NASA Earth Observations (NEO) website https://neo.gsfc.nasa.gov (“NASA Earth Observations (NEO),” n.d.).

data("aerosol_australia")

aerosol_australia
#> stars object with 3 dimensions and 1 attribute
#> attribute(s):
#>                    Min. 1st Qu. Median     Mean 3rd Qu. Max.  NA's
#> aerosol_thickness     1      19     26 27.06014      33  254 13585
#> dimension(s):
#>      from to offset delta refsys point                    values x/y
#> x       1 70    110     1 WGS 84 FALSE                      NULL [x]
#> y       1 70      0    -1 WGS 84 FALSE                      NULL [y]
#> date    1 13     NA    NA   Date    NA 2019-12-01,...,2020-12-01

temp_means <- temporal_means(aerosol_australia)
autoplot(temp_means)
#> Warning: Removed 6513 rows containing missing values (`geom_line()`).

ridgeline(aerosol_australia, group_dim = 3, num_grps = 13)
#> Picking joint bandwidth of 1.72

We can see that aerosol levels were much higher during December and January than in other months.
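
A stars object with x, y and date dimensions can be pictured as a three-dimensional array. Under that view, a per-month summary is an average over the spatial grid for each time slice. The base R sketch below illustrates this with a small hypothetical array; it does not use the stars package, and it is an assumed analogue of the computation rather than what temporal_means does internally.

```r
set.seed(42)

# Hypothetical 4 x 4 grid observed over 13 months
arr <- array(rpois(4 * 4 * 13, lambda = 25), dim = c(4, 4, 13))

# Introduce a few missing cells, as in real satellite data
arr[sample(length(arr), 10)] <- NA

# Mean over the spatial grid for each time slice (third dimension),
# ignoring missing cells
monthly_mean <- apply(arr, 3, mean, na.rm = TRUE)

monthly_mean
```

The na.rm = TRUE argument matters here because the aerosol data contains many missing cells, as the NA's column in the summary above shows.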

The making of stxplore

In 2017, Chris Wikle and Petra Kuhnert taught a short course titled An Introduction to Statistics for Spatio-Temporal Data. The course was based on two books:

  1. Statistics for Spatio-Temporal Data by Noel Cressie and Christopher K. Wikle (Cressie and Wikle 2015)
  2. Spatio-Temporal Statistics with R by Christopher K. Wikle, Andrew Zammit-Mangion and Noel Cressie (Wikle, Zammit-Mangion, and Cressie 2019)

There had been some discussions about making a separate package for spatio-temporal exploration. Even though there are many packages for spatio-temporal modelling, exploration is rarely given the spotlight. Many years later, Petra mentioned this to me, and I’ve been involved in making it a reality.

References

Cressie, Noel, and Christopher K Wikle. 2015. Statistics for Spatio-Temporal Data. John Wiley & Sons.

“NASA Earth Observations (NEO).” n.d. NASA. https://neo.gsfc.nasa.gov/.

Wikle, Christopher K, Andrew Zammit-Mangion, and Noel Cressie. 2019. Spatio-Temporal Statistics with R. Chapman and Hall/CRC.

Wilke, Claus O. 2021. ggridges: Ridgeline Plots in ’ggplot2’. https://CRAN.R-project.org/package=ggridges.

Install

install.packages('stxplore')

Monthly Downloads

144

Version

0.1.0

License

GPL (>= 3)

Maintainer

Sevvandi Kandanaarachchi

Last Published

February 3rd, 2023

Functions in stxplore (0.1.0)

locs

The locations used in the NOAA dataset.

reexports

Objects exported from other packages.

emp_spatial_cov

Computes empirical spatial covariance using a dataframe or a stars object.

ridgeline

Ridgeline plots grouped by an attribute using a dataframe as an input.

emp_orth_fun

Computes empirical orthogonal functions using a dataframe or a stars object.

semivariogram

Computes the semi-variogram using a dataframe or a stars object.

spatial_means

Computes spatial empirical means using a dataframe or a stars object.

spatial_snapshots

Plots spatial snapshots of data through time using a dataframe or a stars object.

temporal_snapshots

Plots temporal snapshots of data for specific spatial locations using a dataframe or a stars object.

stxplore-package

stxplore: Exploration of Spatio-Temporal Data

%>%

Pipe operator

temporal_means

Computes temporal empirical means using a dataframe or a stars object.

cancor_eof

Performs CCA using Empirical Orthogonal Functions (EOFs) from a lagged dataset.

NOAA_df_1990

National Oceanic and Atmospheric Administration (NOAA) data from 1990 to 1993.

canonical_correlation

Computes transformed variables from Canonical Correlation Analysis using a dataframe or a stars object.

hovmoller

Computes the data structure for the Hovmoller plots.

SSTdatashort

The data from the Sea Surface Temperature (SST) dataset. A subset of the original dataset is used.

SSTlonlatshort

The locations of the Sea Surface Temperature (SST) dataset. A subset of the original dataset is used.

SSTlandmaskshort

The land mask for the Sea Surface Temperature (SST) dataset. A subset of the original dataset is used.

Times

The time period in which the NOAA dataset was recorded. This spans from January 1990 to December 1993.

Tmax

The maximum temperature values used in the NOAA dataset in a wide dataframe format.

aerosol_world

Data from NASA Earth Observations at https://neo.gsfc.nasa.gov

aerosol_australia

Data from NASA Earth Observations at https://neo.gsfc.nasa.gov