Learn R Programming

EcoCleanR (version 1.0.3)

ec_rm_duplicate_occurid: Remove Duplicate Records from the Merged Data based on occurrenceID

Description

Remove Duplicate Records from the Merged Data based on occurrenceID

Usage

ec_rm_duplicate_occurid(
  data,
  occurrenceID = "occurrenceID",
  abundance = "abundance"
)

Value

A data frame which has unique occurrenceID. the output file will have cleaned_occurrenceID field instead of occurrenceID. Also the unique record will be chosen with the abundance value if there is any.

Arguments

data

this is merge data frame which is a output file after running ec_db_merge

occurrenceID

this is a mandatory field which consider unique for each occurrence record.

abundance

this is a mandatory field which has created while data extraction by combining individual count and quantity fields (may vary from one source to another, we aim to standardize those as "abundance").

Details

This function will provide a cleaned_occurrenceID column as output, which has occurrenceID standardize and removed duplicates based on generated cleaned_occurrenceID and abundance columns of data. mandatory fields are occurrenceID, source and abundance

Examples

Run this code

db1 <- data.frame(
  species = "A",
  decimalLongitude = c(-120.2, -117.1, NA, NA),
  decimalLatitude = c(20.2, 34.1, NA, NA),
  occurrenceID = c("12345", "898828", "LACM8289", "SDNHM6276"),
  occurrenceStatus = c("present", "", "ABSENT", "Present"),
  basisOfRecord = c("preserved_specimen", "", "fossilspecimen", "material_sample"),
  source = "db1",
  abundance = c(1, NA, 8, 23)
)

db2 <- data.frame(
  species = "A",
  decimalLongitude = c(-120.2, -117.1, NA, NA),
  decimalLatitude = c(20.2, 34.1, NA, NA),
  occurrenceID = c("12345", "898828", "LACM82898", "SDNHM62767"),
  occurrenceStatus = c("present", "", "ABSENT", "Present"),
  basisOfRecord = c("preserved_specimen", "", "fossilspecimen", "material_sample"),
  source = "db2",
  abundance = c(1, 2, 3, 19)
)
db_list <- list(db1, db2)
merge_modern_data <- ec_db_merge(
  db_list = db_list, "modern"
)
ecodata <- ec_rm_duplicate_occurid(
  merge_modern_data,
  occurrenceID = "occurrenceID",
  abundance = "abundance"
)

Run the code above in your browser using DataLab