Evaluate metrics at multiple scales of aggregation
ww_multi_scale(
  data = NULL,
  truth,
  estimate,
  metrics = list(yardstick::rmse, yardstick::mae),
  grids = NULL,
  ...,
  na_rm = TRUE,
  aggregation_function = "mean",
  autoexpand_grid = TRUE,
  progress = TRUE
)
A tibble with six columns: .metric, with the name of the metric that the row describes; .estimator, with the name of the estimator used; .estimate, with the output of the metric function; .grid_args, with the arguments passed to sf::st_make_grid() via ... (if any); .grid, containing the grids used to aggregate predictions, along with the aggregated values of truth and estimate and the count of non-NA values for each; and .notes, which (if data is an sf object) indicates any observations that were not used in a given assessment.
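For example, a minimal sketch of inspecting the return value, assuming ames_sf and its predictions column have been built as in the examples at the end of this page (the 2x2 grid is purely illustrative):

results <- ww_multi_scale(ames_sf, Sale_Price, predictions, n = list(c(2, 2)))
# One row per metric per grid:
results[, c(".metric", ".estimator", ".estimate")]
# The aggregated polygons backing the first row live in the .grid list-column:
results$.grid[[1]]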
data: Either a point geometry sf object containing the columns specified by the truth and estimate arguments; a SpatRaster from the terra package containing layers specified by the truth and estimate arguments; or NULL if truth and estimate are SpatRaster objects.
truth, estimate: If data is an sf object, the names (optionally unquoted) of the columns in data containing the true and predicted values, respectively. If data is a SpatRaster object, either (quoted) layer names or indices which will select the true and predicted layers, respectively, via terra::subset(). If data is NULL, SpatRaster objects with a single layer containing the true and predicted values, respectively.
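As a hedged sketch of the SpatRaster form of data, where the file path and the layer names "observed" and "predicted" below are hypothetical:

library(terra)
model_rast <- terra::rast("model_outputs.tif")  # hypothetical multi-layer raster
ww_multi_scale(model_rast, "observed", "predicted", n = list(c(2, 2)))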
metrics: Either a yardstick::metric_set() object, or a list of functions which will be used to construct a yardstick::metric_set() object specifying the performance metrics to evaluate at each scale.
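For instance, the default metrics could equivalently be supplied as a yardstick::metric_set() (the grid size here is illustrative):

my_metrics <- yardstick::metric_set(yardstick::rmse, yardstick::mae)
ww_multi_scale(ames_sf, Sale_Price, predictions, metrics = my_metrics, n = list(c(5, 5)))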
grids: Optionally, a list of pre-computed sf or sfc objects specifying polygon boundaries to use for assessments.
...: Arguments passed to sf::st_make_grid(). You almost certainly should provide these arguments as lists. For instance, passing n = list(c(1, 2)) will create a single 1x2 grid, while passing n = c(1, 2) will create a 1x1 grid and a 2x2 grid.
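A short sketch of that difference, reusing ames_sf and predictions from the examples at the end of this page:

# A single 1x2 grid:
ww_multi_scale(ames_sf, Sale_Price, predictions, n = list(c(1, 2)))
# Two grids: a 1x1 grid and a 2x2 grid:
ww_multi_scale(ames_sf, Sale_Price, predictions, n = c(1, 2))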
na_rm: Boolean: should polygons with NA values be removed before calculating metrics? Note that this does not impact how values are aggregated to polygons: if you want to remove NA values before aggregating, provide a function to aggregation_function which will remove NA values.
aggregation_function: The function to use to aggregate predictions and true values at various scales, by default mean(). For the sf method, you can pass any function which takes a single vector and returns a scalar. For raster methods, any function accepted by exactextractr::exact_extract() can be used (note that built-in function names must be quoted). Note that this function does not pay attention to the value of na_rm; any NA handling you want to do during aggregation should be handled by this function directly.
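For example, a sketch of handling NA values inside the aggregation step for the sf method (na_rm does not affect aggregation, so the function itself drops NAs; the grid size is illustrative):

ww_multi_scale(
  ames_sf,
  Sale_Price,
  predictions,
  n = list(c(5, 5)),
  aggregation_function = function(x) mean(x, na.rm = TRUE)
)
# For the raster methods, built-in exactextractr summaries are quoted,
# for instance aggregation_function = "median".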
autoexpand_grid: Boolean: if data is in geographic coordinates and grids aren't provided, the grids generated by sf::st_make_grid() may not contain all observations. If TRUE, this function will automatically expand generated grids by a tiny factor to attempt to capture all observations.
progress: Boolean: if data is NULL, should aggregation via exactextractr::exact_extract() show a progress bar? Separate progress bars will be shown each time truth and estimate are aggregated.
If data is NULL, then truth and estimate should both be SpatRaster objects, as created via terra::rast(). These rasters will then be aggregated to each grid using exactextractr::exact_extract(). If data is a SpatRaster object, then truth and estimate should be indices to select the appropriate layers of the raster via terra::subset().
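A minimal sketch of this raster interface, assuming true_rast and pred_rast are single-layer SpatRaster objects covering the same area (the file paths are hypothetical):

library(terra)
true_rast <- terra::rast("true_values.tif")
pred_rast <- terra::rast("predicted_values.tif")
ww_multi_scale(
  truth = true_rast,
  estimate = pred_rast,
  n = list(c(2, 2), c(4, 4))
)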
Grids are calculated using the bounding box of truth, under the assumption that you may have extrapolated into regions which do not have matching "true" values. This function does not check that truth and estimate overlap at all, or that they are at all contained within the grid.
The grid blocks can be controlled by passing arguments to sf::st_make_grid() via ...; some particularly useful arguments include:
cellsize: Target cellsize, expressed as the "diameter" (shortest straight-line distance between opposing sides; two times the apothem) of each block, in map units.
n: The number of grid blocks in the x and y direction (columns, rows).
square: A logical value indicating whether to create square (TRUE) or hexagonal (FALSE) cells.
If both cellsize and n are provided, then the number of blocks requested by n, of the sizes specified by cellsize, will be returned, likely not lining up with the bounding box of data. If only cellsize is provided, this function will return as many blocks of size cellsize as fit inside the bounding box of data. If only n is provided, then cellsize will be automatically adjusted to create the requested number of cells.
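As a sketch, cellsize can be used instead of n; ames_sf from the examples below is in EPSG:4326, so map units are degrees, and the two target cell sizes here are illustrative only:

ww_multi_scale(
  ames_sf,
  Sale_Price,
  predictions,
  cellsize = list(0.01, 0.05),
  square = FALSE
)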
Grids are created by mapping over each argument passed via ... simultaneously, in a similar manner to mapply() or purrr::pmap(). This means that, for example, passing n = list(c(1, 2)) will create a single 1x2 grid, while passing n = c(1, 2) will create a 1x1 grid and a 2x2 grid. It also means that arguments will be recycled using R's standard vector recycling rules, so that passing n = c(1, 2) and square = FALSE will create two separate grids of hexagons.
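A sketch of that recycling behavior: square = FALSE is reused for both values of n, producing a 1x1 hexagonal grid and a 2x2 hexagonal grid:

ww_multi_scale(
  ames_sf,
  Sale_Price,
  predictions,
  n = c(1, 2),
  square = FALSE
)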
This function can be used for geographic or projected coordinate reference systems and expects 2D data.
Riemann, R., Wilson, B. T., Lister, A., and Parks, S. (2010). "An effective assessment protocol for continuous geospatial datasets of forest characteristics using USFS Forest Inventory and Analysis (FIA) data." Remote Sensing of Environment 114(10), pp. 2337-2352. doi: 10.1016/j.rse.2010.05.010
data(ames, package = "modeldata")
ames_sf <- sf::st_as_sf(ames, coords = c("Longitude", "Latitude"), crs = 4326)
ames_model <- lm(Sale_Price ~ Lot_Area, data = ames_sf)
ames_sf$predictions <- predict(ames_model, ames_sf)
ww_multi_scale(
  ames_sf,
  Sale_Price,
  predictions,
  n = list(
    c(10, 10),
    c(1, 1)
  ),
  square = FALSE
)
# or, mostly equivalently
# (there will be a slight difference due to `autoexpand_grid = TRUE`)
grids <- list(
  sf::st_make_grid(ames_sf, n = c(10, 10), square = FALSE),
  sf::st_make_grid(ames_sf, n = c(1, 1), square = FALSE)
)
ww_multi_scale(ames_sf, Sale_Price, predictions, grids = grids)