This function generates pseudo-absences from an input data.frame of latitude and longitude coordinates using environmental data, then uses both presences and pseudo-absences to train an SVM model that flags possible outliers for a given species.
outliers.detect(
longlat,
training = NULL,
hi_res = TRUE,
crop = FALSE,
threshold = 0.05,
method = "all"
)
list if method = "all", indicating whether or not a given point
was classified as an outlier (TRUE or FALSE), along with the confusion matrix
for the training data. If method = "geo" or
method = "env", a data.frame is returned.
longlat
data.frame. Two columns containing latitude and longitude, describing the locations of a species, which may contain outliers.
training
data.frame. Same formatting as longlat, containing only known
locations where the target species occurs. Used exclusively as training data for
method "svm".
hi_res
logical. Specifies whether 1 km resolution environmental data should be used.
If FALSE, 10 km resolution data is used instead.
crop
logical. Indicates whether environmental data should be cropped to
an extent similar to that covered by longlat and training. Useful to avoid
long processing times at higher resolutions.
threshold
numeric. Threshold for classifying
outliers in methods "geo" and "env". E.g., under the default
of 0.05, points at an average distance greater than the 95% quantile
of the average distances of all points will be classified as outliers.
method
A string specifying the outlier detection method. "geo"
calculates the Euclidean distance between point coordinates and classifies as
outliers those points beyond the cutoff set by threshold.
"env"
performs the same calculation but instead uses the environmental data extracted
for those points. "svm" uses the dataset given to longlat and its corresponding
extracted environmental data to train a support vector machine model that then
predicts outliers.
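The threshold rule described for "geo" can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's own implementation: flag_distance_outliers is a hypothetical helper, and averaging each point's distance over all other points is an assumption about how the "average distance" is computed.

```r
# Sketch of a distance-quantile outlier rule (hypothetical helper, not gecko code).
flag_distance_outliers <- function(coords, threshold = 0.05) {
  # Pairwise Euclidean distances between all points
  d <- as.matrix(dist(coords))
  n <- nrow(d)
  # Mean distance from each point to every other point (the diagonal is 0)
  avg_dist <- rowSums(d) / (n - 1)
  # Cutoff at the (1 - threshold) quantile of those averages
  cutoff <- quantile(avg_dist, probs = 1 - threshold)
  # TRUE marks points whose average distance exceeds the cutoff
  avg_dist > cutoff
}

# Simulated data: 20 clustered points plus one distant point in the last row
set.seed(1)
pts <- data.frame(
  X = c(runif(20, -17.10, -17.05), -16.00),
  Y = c(runif(20, 32.73, 32.76), 33.50)
)
is_outlier <- flag_distance_outliers(pts)
which(is_outlier)
```

The same helper could be applied to a table of environmental variables instead of coordinates, which is the idea behind "env"; whether the variables should be rescaled first is left open here.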
The environmental data used is WorldClim, which requires a long download; see
gecko::gecko.setDir().
This function is heavily based on the methods described in Liu et al. (2017).
There the authors describe SVM_pdSDM, a pseudo-SDM method similar to a
two-class presence-only SVM that is capable of using pseudo-absence points,
implemented with the ksvm function in the R package kernlab.
It is suggested that, for each set of "n" occurrence
records, "2 * n" pseudo-absence points be generated.
When using it, keep in mind works highlighting its limitations, such as
Meynard et al. (2019). See the References section.
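The "2 * n" ratio can be illustrated with a short base R sketch. Note the swapped-in technique: the function described here derives pseudo-absences from environmental data, whereas this example simply draws uniform random points from a buffered bounding box around the presences. sample_pseudo_absences and its ratio and buffer parameters are hypothetical names introduced only for illustration.

```r
# Hypothetical helper: draw ratio * n random points around the presences.
# Uniform spatial sampling is a simplification, not the method used by gecko.
sample_pseudo_absences <- function(presences, ratio = 2, buffer = 0.1) {
  n_abs <- ratio * nrow(presences)
  # Bounding box of the presences, widened by `buffer` degrees on each side
  x_range <- range(presences[[1]]) + c(-buffer, buffer)
  y_range <- range(presences[[2]]) + c(-buffer, buffer)
  data.frame(
    X = runif(n_abs, x_range[1], x_range[2]),
    Y = runif(n_abs, y_range[1], y_range[2])
  )
}

set.seed(42)
presences <- data.frame(X = runif(10, -17.10, -17.05), Y = runif(10, 32.73, 32.76))
absences <- sample_pseudo_absences(presences)

# Labelled two-class table: 1 = presence, 0 = pseudo-absence
training <- rbind(
  cbind(presences, presence = 1),
  cbind(absences, presence = 0)
)
table(training$presence)
```

A labelled table of this kind, once joined with environmental predictors, is the sort of input a two-class classifier such as kernlab's ksvm would be trained on.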
Liu, C., White, M. and Newell, G. (2017) ‘Detecting outliers in species distribution data’, Journal of Biogeography, 45(1), pp. 164–176. doi:10.1111/jbi.13122.
Meynard, C.N., Kaplan, D.M. and Leroy, B. (2019) ‘Detecting outliers in species distribution data: Some caveats and clarifications on a virtual species study’, Journal of Biogeography, 46(9), pp. 2141–2144. doi:10.1111/jbi.13626.
if (FALSE) {
  # Records to screen for outliers: columns 2:3 of the Hogna maderiana rows
  new_occurences = gecko.data("records")
  new_occurences = new_occurences[new_occurences$species == "Hogna maderiana", 2:3]
  # Simulated known occurrences, passed as training data
  old_occurences = data.frame(X = runif(10, -17.1, -17.05), Y = runif(10, 32.73, 32.76))
  outliers.detect(new_occurences, old_occurences)
}