aoa: Area of Applicability

Description

This function estimates the Dissimilarity Index (DI) and the derived Area of Applicability (AOA) of spatial prediction models. It does so by considering the distance, in the predictor variable space, between new data (i.e. a RasterStack of the spatial predictors used in the model) and the data used for model training. Ideally, predictors are weighted by the internal variable importance of the machine learning algorithm used for model training.

Usage

aoa(train, predictors, weight = NA, model = NA, variables = "all",
  thres = 0.95, folds = NULL)

Arguments

train

A data.frame containing the data used for model training.

predictors

A RasterStack, RasterBrick or data.frame containing the data the model was meant to make predictions for.

weight

A data.frame containing weights for each variable. Only required if no model is given.

model

A train object created with caret. It is used to extract the weights (based on variable importance) as well as the cross-validation folds.

variables

Character vector of predictor variables. If "all", all variables of the train dataset are used. Check varImp(model).

thres

Numeric vector of probabilities, with values in [0,1], applied to the DI of the training data to derive the threshold.

folds

Numeric or character. Folds for cross-validation, e.g. the spatial cluster affiliation of each data point. Should be used if replicates are present. Only required if no model is given.

Value

A RasterStack or data.frame with the DI and AOA. AOA has values 0 (outside AOA) and 1 (inside AOA).

Details

The function calculates the Dissimilarity Index (DI) and the corresponding Area of Applicability (AOA). Interpretation of results: a location whose properties are very similar to the training data has a small distance in the predictor variable space (DI close to 0), while a location with very different properties has a high DI. To derive the AOA, a threshold is applied to the DI; this threshold is based on the DI of the training data. The DI of a training point is calculated from its minimum distance to another training point (if applicable, one that is not located in the same CV fold). See Meyer and Pebesma (2020) for the full documentation of the methodology.
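
The steps described above can be sketched in base R. This is a simplified illustration and not the code used inside aoa(). The toy data, the object names (train_x, new_x, w), the standardization by training mean and standard deviation, and the division by the mean pairwise training distance are all assumptions made for this sketch. CV folds are ignored, and the threshold is taken as the 0.95 quantile of the training DI, mirroring the default of thres.

```r
# Sketch only: a simplified DI/AOA calculation, not the CAST implementation.
set.seed(1)
train_x <- data.frame(a = rnorm(20), b = rnorm(20))   # hypothetical training predictors
# Two new locations: a copy of a training point and a far-away point.
new_x <- rbind(train_x[1, ], data.frame(a = 100, b = 100))
w <- c(a = 1, b = 1)                                  # assumed equal weights

# Standardize with the training mean and sd, then apply the weights.
mu <- colMeans(train_x)
s  <- apply(train_x, 2, sd)
prep <- function(x) sweep(scale(as.matrix(x), center = mu, scale = s), 2, w, "*")
tr <- prep(train_x)
nw <- prep(new_x)

# Pairwise Euclidean distances among training points;
# their mean is used here to normalize the DI (assumption).
d_train <- as.matrix(dist(tr))
d_mean  <- mean(d_train[upper.tri(d_train)])

# DI of the training data: distance to the nearest other training point.
diag(d_train) <- NA
di_train <- apply(d_train, 1, min, na.rm = TRUE) / d_mean

# DI of the new data: distance to the nearest training point.
min_dist <- function(p) min(sqrt(colSums((t(tr) - p)^2)))
di_new <- apply(nw, 1, min_dist) / d_mean

# AOA: 1 where the DI does not exceed the threshold, 0 otherwise.
threshold <- quantile(di_train, probs = 0.95)
aoa_new <- as.integer(di_new <= threshold)
```

In this sketch the first new location coincides with a training point, so its DI is 0 and it falls inside the AOA; the second location lies far outside the range of the training data and falls outside it.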

References

Meyer, H., Pebesma, E. (2020): Predicting into unknown space? Estimating the area of applicability of spatial prediction models. https://arxiv.org/abs/2005.07939

Examples


# NOT RUN {
library(sf)
library(raster)
library(caret)
library(viridis)
library(latticeExtra)

# prepare sample data:
dat <- get(load(system.file("extdata","Cookfarm.RData",package="CAST")))
dat <- aggregate(dat[,c("VW","Easting","Northing")],by=list(as.character(dat$SOURCEID)),mean)
pts <- st_as_sf(dat,coords=c("Easting","Northing"))
pts$ID <- 1:nrow(pts)
set.seed(100)
pts <- pts[1:30,]
studyArea <- stack(system.file("extdata","predictors_2012-03-25.grd",package="CAST"))[[1:8]]
trainDat <- extract(studyArea,pts,df=TRUE)
trainDat <- merge(trainDat,pts,by.x="ID",by.y="ID")

# visualize data spatially:
spplot(scale(studyArea))
plot(studyArea$DEM)
plot(pts[,1],add=TRUE,col="black")

# first calculate the DI based on a set of variables with equal weights:
variables <- c("DEM","NDRE.Sd","TWI")
AOA <- aoa(trainDat,studyArea,variables=variables)
spplot(AOA$DI, col.regions=viridis(100),main="Applicability Index")
spplot(AOA$AOA,main="Area of Applicability")

# or weight variables based on variable importance from a trained model:
set.seed(100)
model <- train(trainDat[,which(names(trainDat)%in%variables)],
  trainDat$VW,method="rf",importance=TRUE,tuneLength=1,
  trControl=trainControl(method="cv",number=5))
print(model) #note that this is a quite poor prediction model
prediction <- predict(studyArea,model)
plot(varImp(model,scale=FALSE))
#
AOA <- aoa(trainDat,studyArea,model=model,variables=variables)
spplot(AOA$DI, col.regions=viridis(100),main="Applicability Index")
#plot predictions for the AOA only:
spplot(prediction, col.regions=viridis(100),main="prediction for the AOA")+
spplot(AOA$AOA,col.regions=c("grey","transparent"))
# }
