caretSDM (version 1.1.0.1)

data_clean: Presence data cleaning routine

Description

Data cleaning wrapper around the CoordinateCleaner package.

Usage

data_clean(occ, pred = NULL,
           species = NA, lon = NA, lat = NA,
           capitals = TRUE,
           centroids = TRUE,
           duplicated = TRUE,
           identical = TRUE,
           institutions = TRUE,
           invalid = TRUE,
           terrestrial = TRUE,
           independent_test = TRUE)

Value

An occurrences_sdm or input_sdm object with cleaned presence data.

Arguments

occ

An occurrences_sdm or input_sdm object.

pred

An sdm_area object. If occ is an input_sdm object with predictor data, then pred is obtained from it.

species

A character string giving the name of the column with species names in occ (see Details).

lon

A character string giving the name of the column with longitude in occ (see Details).

lat

A character string giving the name of the column with latitude in occ (see Details).

capitals

Boolean to turn on/off the exclusion of records located at country capital coordinates (see ?cc_cap).

centroids

Boolean to turn on/off the exclusion of records located at country centroid coordinates (see ?cc_cen).

duplicated

Boolean to turn on/off the exclusion of duplicated records (see ?cc_dupl).

identical

Boolean to turn on/off the exclusion of records with identical lat/long values (see ?cc_equ).

institutions

Boolean to turn on/off the exclusion of records located at biodiversity institution coordinates (see ?cc_inst).

invalid

Boolean to turn on/off the exclusion of records with invalid coordinates (see ?cc_val).

terrestrial

Boolean to turn on/off the exclusion of records with coordinates falling in the sea (see ?cc_sea).

independent_test

Boolean. If occ has independent test data, the data cleaning routine is also applied to it.
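
To make the effect of some of these switches concrete, the following is a minimal base-R sketch of the kind of filtering that the invalid, identical and duplicated options refer to. It is an illustration only: it does not call CoordinateCleaner or caretSDM, it is not their implementation, and the toy table and its column names are assumptions made for this example.

```r
# Toy occurrence table; the data and column names are assumptions for illustration.
occ_df <- data.frame(
  species          = c("sp1", "sp1", "sp1", "sp1", "sp2"),
  decimalLongitude = c(-51.2, -51.2, 200.0, -25.0, -49.3),
  decimalLatitude  = c(-25.4, -25.4, -25.0, -25.0, -24.1)
)

# "invalid": keep only records whose coordinates are present and within range
valid <- !is.na(occ_df$decimalLongitude) & !is.na(occ_df$decimalLatitude) &
  abs(occ_df$decimalLongitude) <= 180 & abs(occ_df$decimalLatitude) <= 90

# "identical": flag records whose longitude equals their latitude
not_equal <- occ_df$decimalLongitude != occ_df$decimalLatitude

cleaned <- occ_df[valid & not_equal, ]

# "duplicated": drop repeated species/longitude/latitude combinations
key_cols <- c("species", "decimalLongitude", "decimalLatitude")
cleaned <- cleaned[!duplicated(cleaned[, key_cols]), ]

print(cleaned)
```

In this toy table, the second row is a duplicate of the first, the third has an out-of-range longitude, and the fourth has equal longitude and latitude, so only the first and last rows remain.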

Author

Luíz Fernando Esser (luizesser@gmail.com) https://luizfesser.wordpress.com

Details

If the user did not use the GBIF_data function to obtain species records, the function may have trouble finding which columns of the presences table hold the species, longitude and latitude information. For this reason, the parameters species, lon and lat allow the user to state explicitly which columns should be used. If they remain NA (the default), the function will try to guess which columns are the correct ones.
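
The trade-off between naming a column explicitly and letting it be guessed can be sketched in base R as follows. The helper resolve_col, the toy table and the matching patterns are all hypothetical and are not taken from caretSDM; the package's actual guessing rules may differ.

```r
# Hypothetical helper, not part of caretSDM: resolve a column name,
# preferring an explicit value and falling back to a pattern-based guess.
resolve_col <- function(df, explicit = NA, pattern) {
  if (!is.na(explicit)) {
    # An explicit name must exist in the table
    stopifnot(explicit %in% names(df))
    return(explicit)
  }
  hits <- grep(pattern, names(df), ignore.case = TRUE, value = TRUE)
  if (length(hits) != 1) {
    stop("Could not guess a unique column; please name it explicitly.")
  }
  hits
}

# Toy table with non-standard column names (assumed for illustration)
records <- data.frame(taxon = "sp1", x_lon = -51.2, y_lat = -25.4)

lon_col <- resolve_col(records, pattern = "lon")   # guessed from the name
lat_col <- resolve_col(records, pattern = "lat")   # guessed from the name
# No column name matches "species", so it has to be given explicitly
sp_col  <- resolve_col(records, explicit = "taxon", pattern = "species")
```

Here the longitude and latitude columns can be guessed from their names, while the species column has to be named because nothing in the table matches the pattern.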

See Also

GBIF_data, occurrences_sdm, sdm_area, input_sdm, predictors

Examples

# Create sdm_area object:
sa <- sdm_area(parana, cell_size = 50000, crs = 6933)

# Include predictors:
sa <- add_predictors(sa, bioc) |> select_predictors(c("bio1", "bio12"))

# Create occurrences:
oc <- occurrences_sdm(occ, crs = 6933) |> join_area(sa)

# Create input_sdm:
i <- input_sdm(oc, sa)

# Clean coordinates (terrestrial is set to FALSE to make the run quicker):
i <- data_clean(i, terrestrial = FALSE)