Removes or flags records that are temporal outliers based on interquantile ranges.
cf_age(
x,
lon = "decimallongitude",
lat = "decimallatitude",
min_age = "min_ma",
max_age = "max_ma",
taxon = "accepted_name",
method = "quantile",
size_thresh = 7,
mltpl = 5,
replicates = 5,
flag_thresh = 0.5,
uniq_loc = FALSE,
value = "clean",
verbose = TRUE
)
data.frame. Containing fossil records with taxon names, ages, and geographic coordinates.
character string. The column with the longitude coordinates. Used to identify unique records if uniq_loc = TRUE. Default = “decimallongitude”.
character string. The column with the latitude coordinates. Used to identify unique records if uniq_loc = TRUE. Default = “decimallatitude”.
character string. The column with the minimum age. Default = “min_ma”.
character string. The column with the maximum age. Default = “max_ma”.
character string. The column with the taxon name. If “”, searches for outliers over the entire dataset, otherwise per specified taxon. Default = “accepted_name”.
character string. Defining the method for outlier selection. See details. Either “quantile” or “mad”. Default = “quantile”.
numeric. The minimum number of records needed for a dataset to be tested. Default = 7.
numeric. The multiplier of the interquartile range (method == 'quantile') or median absolute deviation (method == 'mad') to identify outliers. See details. Default = 5.
numeric. The number of replications for the distance matrix calculation. See details. Default = 5.
numeric. The fraction of passed replicates necessary to pass the test. See details. Default = 0.5.
logical. If TRUE, only a single record per location and time point (and per taxon if taxon != "") is used for the outlier testing. Default = FALSE.
character string. Defining the output value. See value.
logical. If TRUE reports the name of the test and the number of records flagged.
Depending on the ‘value’ argument, either a data.frame
containing the records considered correct by the test (“clean”) or a
logical vector (“flagged”), with TRUE = test passed and FALSE = test failed/potentially
problematic. Default = “clean”.
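The relationship between the two output modes can be sketched in base R. The logical vector `flags` below is a hand-written stand-in for what `value = "flagged"` would return (TRUE = passed); its values are made up for illustration and are not actual output of `cf_age`. Subsetting the input with such a vector should correspond to what `value = "clean"` returns.

```r
# Toy input; column names follow the defaults used on this page
x <- data.frame(species = c("a", "b", "z"),
                min_ma  = c(12, 15, 62.5),
                max_ma  = c(14, 18, 65))

# Stand-in for cf_age(x, value = "flagged", taxon = "");
# these values are invented for illustration, not computed by the package
flags <- c(TRUE, TRUE, FALSE)

x_clean   <- x[flags, ]   # records that passed the test
x_flagged <- x[!flags, ]  # records flagged as potentially problematic
```

Keeping the logical vector rather than the cleaned data.frame makes it possible to inspect the flagged records before discarding them.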
The outlier detection is based on an interquantile range test. A temporal
distance matrix among all records is calculated based on a single point selected at random
between the minimum and maximum age for each record. The mean distance for
each point to all neighbours is calculated and the sum of these distances
is then tested against the interquantile range and flagged as an outlier if
# NOT RUN {
minages <- c(runif(n = 11, min = 10, max = 25), 62.5)
x <- data.frame(species = c(letters[1:10], rep("z", 2)),
min_ma = minages,
max_ma = c(minages[1:11] + runif(n = 11, min = 0, max = 5), 65))
cf_age(x, value = "flagged", taxon = "")
# unique locations only
x <- data.frame(species = c(letters[1:10], rep("z", 2)),
decimallongitude = c(runif(n = 10, min = 4, max = 16), 75, 7),
decimallatitude = c(runif(n = 12, min = -5, max = 5)),
min_ma = minages,
max_ma = c(minages[1:11] + runif(n = 11, min = 0, max = 5), 65))
cf_age(x, value = "flagged", taxon = "", uniq_loc = TRUE)
# }
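The procedure described in the details can be sketched in base R. This is a minimal illustration, not the package's implementation: `sketch_age_outliers` is a hypothetical helper, and because the exact outlier rule is cut off in the text above, the sketch assumes a record is an outlier in one replicate when its mean temporal distance exceeds the third quartile plus `mltpl` times the interquartile range. It also assumes a record passes overall when the fraction of replicates it passed is at least `flag_thresh`.

```r
# Hypothetical helper, NOT the CoordinateCleaner implementation
sketch_age_outliers <- function(min_age, max_age, mltpl = 5,
                                replicates = 5, flag_thresh = 0.5) {
  n <- length(min_age)
  flagged <- matrix(FALSE, nrow = n, ncol = replicates)
  for (r in seq_len(replicates)) {
    # One age per record, drawn uniformly between its minimum and maximum age
    age <- runif(n, min = min_age, max = max_age)
    # Pairwise temporal distances among all records
    d <- as.matrix(dist(age))
    # Mean distance of each record to all other records
    mean_d <- rowSums(d) / (n - 1)
    # Assumed rule: outlier if beyond Q3 + mltpl * IQR of the mean distances
    limit <- quantile(mean_d, 0.75) + mltpl * IQR(mean_d)
    flagged[, r] <- mean_d > limit
  }
  # TRUE = passed: fraction of passed replicates reaches flag_thresh (assumed)
  rowMeans(!flagged) >= flag_thresh
}

set.seed(1)
min_age <- c(seq(10, 20, length.out = 11), 62.5)
max_age <- min_age + 2
res <- sketch_age_outliers(min_age, max_age)
which(!res)  # indices of records that failed the sketched test
```

Repeating the random draw over several replicates is what lets the test account for dating uncertainty: a record with a wide age range is only flagged if it looks like an outlier in enough of the draws.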