The full_clean() function performs automated cleaning steps, with options for removing
duplicate data points, checking locality precision, removing points with skewed coordinates,
removing plain zero records, filtering records by basis of record, and spatially thinning collection points.
This function also provides the option to interactively inspect and remove types of basis of record.
full_clean(
df,
synonyms.list,
event.date = "eventDate",
year = "year",
month = "month",
day = "day",
occ.id = "occurrenceID",
remove.NA.occ.id = FALSE,
remove.NA.date = FALSE,
aggregator = "aggregator",
id = "ID",
taxa.filter = "fuzzy",
scientific.name = "scientificName",
accepted.name = NA,
remove.zero = TRUE,
precision = TRUE,
digits = 2,
remove.skewed = TRUE,
basis.list = NA,
basis.of.record = "basisOfRecord",
latitude = "latitude",
longitude = "longitude",
remove.flagged = TRUE,
thin.points = TRUE,
distance = 5,
reps = 100,
one.point.per.pixel = TRUE,
raster = NA,
resolution = 0.5
)
df is a data frame with the cleaned data.
df: Data frame of occurrence records.
synonyms.list: A list of synonyms for a species.
event.date: Default = "eventDate". The name of the event date column in the data frame.
year: Default = "year". The name of the event date year column in the data frame.
month: Default = "month". The name of the event date month column in the data frame.
day: Default = "day". The name of the event date day column in the data frame.
occ.id: Default = "occurrenceID". The name of the occurrence ID column in the data frame.
remove.NA.occ.id: Default = FALSE. When set to TRUE, records with missing occurrence IDs are removed.
remove.NA.date: Default = FALSE. When set to TRUE, records with missing event dates are removed.
aggregator: Default = "aggregator". The name of the column in the data frame that identifies the aggregator that provided the record.
id: Default = "ID". The name of the ID column in the data frame, which contains unique IDs assigned by GBIF or iDigBio.
taxa.filter: Default = "fuzzy". The type of filter to be used: "exact", "fuzzy", or "interactive".
scientific.name: Default = "scientificName". The name of the scientific name column in the data frame.
accepted.name: Default = NA. The accepted scientific name for the species. If provided, an additional column with the accepted name is added to the data frame for further manual comparison.
remove.zero: Default = TRUE. Indicates that points at (0.00, 0.00) should be removed.
precision: Default = TRUE. Indicates that coordinates should be rounded to match the coordinate uncertainty.
digits: Default = 2. The number of digits to round coordinates to when precision = TRUE.
remove.skewed: Default = TRUE. Uses the remove_skewed() function to remove skewed coordinate values.
basis.list: Default = NA. A list of basis of record types to keep. If a list is not supplied, this filter is not applied.
basis.of.record: Default = "basisOfRecord". The name of the basis of record column in the data frame.
latitude: Default = "latitude". The name of the latitude column in the data frame.
longitude: Default = "longitude". The name of the longitude column in the data frame.
remove.flagged: Default = TRUE. An option to remove points with problematic locality information.
thin.points: Default = TRUE. An option to spatially thin occurrence records.
distance: Default = 5. Minimum distance in km separating retained records.
reps: Default = 100. Number of times to run the thinning algorithm.
one.point.per.pixel: Default = TRUE. An option to retain only one point per pixel.
raster: Default = NA. Raster object which will be used for ecological niche comparisons.
resolution: Default = 0.5. Options: 0.5, 2.5, 5, and 10 (in minutes of a degree). 0.5 minutes of a degree is equal to 30 arc seconds.
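The resolution units and the one-point-per-pixel option can be illustrated with a little arithmetic. The following is a minimal base R sketch, not the package's implementation: the toy coordinates, the grid origin at (0, 0), and the helper columns cell_x and cell_y are all assumptions made for illustration.

```r
# Hypothetical toy coordinates; not output from full_clean()
pts <- data.frame(
  longitude = c(-81.5010, -81.5020, -81.4030),
  latitude  = c(35.2010, 35.2020, 35.2030)
)

res_arcmin <- 0.5               # same units as the resolution argument
res_arcsec <- res_arcmin * 60   # 0.5 arc minutes = 30 arc seconds
res_deg    <- res_arcmin / 60   # cell width in decimal degrees

# Assign each point to a grid cell of width res_deg
# (assumes a grid anchored at longitude 0, latitude 0)
pts$cell_x <- floor(pts$longitude / res_deg)
pts$cell_y <- floor(pts$latitude / res_deg)

# Keep only the first point encountered in each cell
one_per_pixel <- pts[!duplicated(pts[, c("cell_x", "cell_y")]), ]
```

In this toy data the first two points fall in the same cell, so only one of them is retained. When a raster object is supplied, its own grid presumably defines the pixels instead of the assumed origin used here.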
This function requires packages dplyr, magrittr, and raster.
# data is a data frame of occurrence records
cleaned_data <- full_clean(data, synonyms.list = c("Galax urceolata", "Galax aphylla"),
                           digits = 3, basis.list = c("Preserved Specimen","Physical specimen"),
                           accepted.name = "Galax urceolata", remove.flagged = FALSE)
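To make a few of the individual steps concrete, here is a rough base R sketch of what zero-coordinate removal, coordinate rounding, basis-of-record filtering, and duplicate removal might look like on a toy data frame. It is only an illustration under stated assumptions and does not reproduce the internals of full_clean(): the records are invented, and duplicates are defined here simply as identical rounded coordinates.

```r
# Hypothetical toy records; column names follow the function's defaults
occ <- data.frame(
  scientificName = c("Galax urceolata", "Galax urceolata", "Galax urceolata",
                     "Galax urceolata", "Galax aphylla"),
  basisOfRecord  = c("Preserved Specimen", "Preserved Specimen", "Human Observation",
                     "Preserved Specimen", "Preserved Specimen"),
  latitude       = c(35.20149, 0, 35.30000, 35.20151, 36.10000),
  longitude      = c(-81.50149, 0, -81.60000, -81.50151, -82.30000),
  stringsAsFactors = FALSE
)

# Analogue of remove.zero = TRUE: drop points at (0, 0)
cleaned <- occ[!(occ$latitude == 0 & occ$longitude == 0), ]

# Analogue of precision = TRUE, digits = 2: round coordinates
cleaned$latitude  <- round(cleaned$latitude, 2)
cleaned$longitude <- round(cleaned$longitude, 2)

# Analogue of basis.list: keep only the listed basis of record types
keep_basis <- c("Preserved Specimen", "Physical specimen")
cleaned <- cleaned[cleaned$basisOfRecord %in% keep_basis, ]

# Assumed duplicate rule for this sketch: identical rounded coordinates
cleaned <- cleaned[!duplicated(cleaned[, c("latitude", "longitude")]), ]
```

Of the five toy records, two remain: the (0, 0) point, the record with an unlisted basis of record, and one of the two points that round to the same coordinates are dropped.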