remove_duplicates: Remove Duplicates - Remove records with identical event dates and coordinates

Description

The remove_duplicates() function removes records with identical event dates and occurrence IDs. Before using this function, round the longitude and latitude columns to match the coordinate uncertainty with the basic_locality_clean() function, so that records from the same locality compare as equal.
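Why rounding matters can be seen in a small base R sketch. It is not the package's implementation: the toy records, the two-decimal precision, and the use of duplicated() are assumptions chosen only for illustration.

```r
# Hypothetical toy records; column names follow the function's defaults.
records <- data.frame(
  eventDate = c("2001-05-12", "2001-05-12", "2003-07-01"),
  latitude  = c(29.65163, 29.65201, 29.65163),
  longitude = c(-82.32483, -82.32451, -82.32483),
  stringsAsFactors = FALSE
)

# Assumption: two decimal places stands in for the coordinate uncertainty.
records$latitude  <- round(records$latitude, 2)
records$longitude <- round(records$longitude, 2)

# Keep the first record for each date/coordinate combination.
key <- records[, c("eventDate", "latitude", "longitude")]
deduplicated <- records[!duplicated(key), ]
```

Without the rounding step, the first two records differ in their trailing decimals and would both be kept.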

Usage

remove_duplicates(
  df,
  event.date = "eventDate",
  aggregator = "aggregator",
  id = "ID",
  occ.id = "occurrenceID",
  year = "year",
  month = "month",
  day = "day",
  latitude = "latitude",
  longitude = "longitude",
  remove.NA.occ.id = FALSE,
  remove.NA.date = FALSE,
  remove.unparseable = FALSE
)

Value

Returns a data frame with duplicates removed.

Arguments

df: Data frame of occurrence records returned from gators_download().
event.date: Default = "eventDate". The name of the event date column in the data frame.
aggregator: Default = "aggregator". The name of the column in the data frame that identifies the aggregator that provided the record.
id: Default = "ID". The name of the id column in the data frame, which contains unique IDs defined by GBIF or iDigBio.
occ.id: Default = "occurrenceID". The name of the occurrence ID column in the data frame.
year: Default = "year". The name of the event date year column in the data frame.
month: Default = "month". The name of the event date month column in the data frame.
day: Default = "day". The name of the event date day column in the data frame.
latitude: Default = "latitude". The name of the latitude column in the data frame.
longitude: Default = "longitude". The name of the longitude column in the data frame.
remove.NA.occ.id: Default = FALSE. This will remove records with missing occurrence IDs when set to TRUE.
remove.NA.date: Default = FALSE. This will remove records with missing event dates when set to TRUE.
remove.unparseable: Default = FALSE. When an event date cannot be parsed into individual year, month, and day values, the user can specify them manually. If set to TRUE, these rows are removed instead.
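The effect of the two missing-value flags can be approximated in base R. This is a sketch under assumptions: the toy records are hypothetical, and the function's own filtering logic may differ.

```r
# Hypothetical toy records with some missing values.
records <- data.frame(
  occurrenceID = c("occ-1", NA, "occ-3", "occ-4"),
  eventDate    = c("2001-05-12", "2001-05-12", NA, "2003-07-01"),
  stringsAsFactors = FALSE
)

# Rough analogue of remove.NA.occ.id = TRUE: drop rows lacking an occurrence ID.
no_missing_id <- records[!is.na(records$occurrenceID), ]

# Rough analogue of setting both flags to TRUE: also drop rows lacking an event date.
complete <- records[!is.na(records$occurrenceID) & !is.na(records$eventDate), ]
```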

Details

This function requires the parsedate and dplyr packages. If the occurrence ID column or the year, month, and day columns are not present in the data set, the function ignores them.
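Splitting an event date into year, month, and day values, and flagging dates that cannot be parsed, might look like the sketch below. It swaps in base R's as.Date() with a strict ISO format in place of the parsedate package, so it accepts fewer date formats than the function itself likely does. The sample dates are hypothetical.

```r
# Hypothetical event dates: two ISO dates, one free-text date, one missing.
event_dates <- c("2001-05-12", "1998-11-03", "spring 1987", NA)

# Strict ISO parsing: anything that does not match the format becomes NA.
parsed <- as.Date(event_dates, format = "%Y-%m-%d")

parts <- data.frame(
  eventDate = event_dates,
  year  = as.integer(format(parsed, "%Y")),
  month = as.integer(format(parsed, "%m")),
  day   = as.integer(format(parsed, "%d")),
  stringsAsFactors = FALSE
)

# A date is unparseable when it is present but could not be converted.
unparseable <- !is.na(parts$eventDate) & is.na(parsed)

# Rough analogue of remove.unparseable = TRUE: drop those rows.
kept <- parts[!unparseable, ]
```

Rows with a missing event date are kept here, because dropping them corresponds to the separate remove.NA.date flag.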

Examples

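# Remove duplicates using the default column names
cleaned_data <- remove_duplicates(data)
# Also remove records with missing occurrence IDs or event dates
cleaned_data <- remove_duplicates(data, remove.NA.occ.id = TRUE, remove.NA.date = TRUE)
# Remove records whose event date cannot be parsed
cleaned_data <- remove_duplicates(data, remove.unparseable = TRUE)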
