Learn R Programming

gatoRs (version 1.0.0)

full_clean: Full Cleaning - Wrapper function to speed clean

Description

The full_clean() function performs automated cleaning steps, including options for: removing duplicate data points, checking locality precision, removing points with skewed coordinates, removing plain zero records, removing records based on basis of record, and spatially thinning collection points. This function also provides the option to interactively inspect and remove types of basis of record.

Usage

full_clean(
  df,
  synonyms.list,
  event.date = "eventDate",
  year = "year",
  month = "month",
  day = "day",
  occ.id = "occurrenceID",
  remove.NA.occ.id = FALSE,
  remove.NA.date = FALSE,
  aggregator = "aggregator",
  id = "ID",
  taxa.filter = "fuzzy",
  scientific.name = "scientificName",
  accepted.name = NA,
  remove.zero = TRUE,
  precision = TRUE,
  digits = 2,
  remove.skewed = TRUE,
  basis.list = NA,
  basis.of.record = "basisOfRecord",
  latitude = "latitude",
  longitude = "longitude",
  remove.flagged = TRUE,
  thin.points = TRUE,
  distance = 5,
  reps = 100,
  one.point.per.pixel = TRUE,
  raster = NA,
  resolution = 0.5
)

Value

df is a data frame with the cleaned data.

Arguments

df

Data frame of occurrence records.

synonyms.list

A list of synonyms for a species.

event.date

Default = "eventDate". The name of the event date column in the data frame.

year

Default = "year". The name of the event date year column in the data frame.

month

Default = "month". The name of the event date month column in the data frame.

day

Default = "day". The name of the event date day column in the data frame.

occ.id

Default = "occurrenceId". The name of the occurrence ID column in the data frame.

remove.NA.occ.id

Default = FALSE. This will remove records with missing occurrence IDs when set to TRUE.

remove.NA.date

Default = FALSE. This will remove records with missing event dates when set to TRUE.

aggregator

Default = "aggregator". The name of the column in the data frame that identifies the aggregator that provided the record.

id

Default = "ID". The name of the id column in the data frame, which contains unique IDs defined from GBIF or iDigBio.

taxa.filter

The type of filter to be used--either "exact", "fuzzy", or "interactive".

scientific.name

Default = "scientificName". The name of the scientific name column in the data frame.

accepted.name

The accepted scientific name for the species. If provided, an additional column will be added to the data frame with the accepted name for further manual comparison.

remove.zero

Default = TRUE. Indicates that points at (0.00, 0.00) should be removed.

precision

Default = TRUE. Indicates that coordinates should be rounded to match the coordinate uncertainty.

digits

Default = 2. Indicates digits to round coordinates to when precision = TRUE.

remove.skewed

Default = TRUE. Utilizes the remove_skewed() function to remove skewed coordinate values.

basis.list

A list of basis to keep. If a list is not supplied, this filter will not occur.

basis.of.record

Default = "basisOfRecord". The name of the basis of record column in the data frame.

latitude

Default = "latitude". The name of the latitude column in the data frame.

longitude

Default = "longitude". The name of the longitude column in the data frame.

remove.flagged

Default = TRUE. An option to remove points with problematic locality information.

thin.points

Default = TRUE. An option to spatially thin occurrence records.

distance

Default = 5. Distance in km to separate records.

reps

Default = 100. Number of times to perform thinning algorithm.

one.point.per.pixel

Default = TRUE. An option to only retain one point per pixel.

raster

Raster object which will be used for ecological niche comparisons.

resolution

Default = 0.5. Options - 0.5, 2.5, 5, and 10 (in min of a degree). 0.5 min of a degree is equal to 30 arc sec.

Details

This function requires packages dplyr, magrittr, and raster.

Examples

Run this code
cleaned_data <- full_clean(data, synonyms.list = c("Galax urceolata", "Galax aphylla"),
digits = 3, basis.list = c("Preserved Specimen","Physical specimen"),
accepted.name = "Galax urceolata", remove.flagged = FALSE)

Run the code above in your browser using DataLab