
scrubr (version 0.1.1)

coords: Coordinate based cleaning

Description

Coordinate based cleaning

Usage

coord_incomplete(x, lat = NULL, lon = NULL, drop = TRUE)

coord_impossible(x, lat = NULL, lon = NULL, drop = TRUE)

coord_unlikely(x, lat = NULL, lon = NULL, drop = TRUE)

coord_within(x, field = NULL, country = NULL, lat = NULL, lon = NULL, drop = TRUE)

coord_pol_centroids(x, lat = NULL, lon = NULL, drop = TRUE)

Arguments

x

(data.frame) A data.frame

lat, lon

(character) Names of the latitude and longitude columns to use. See Details.

drop

(logical) Whether to drop bad data points. Either way, the bad data points are parsed out and attached as an attribute you can access. Default: TRUE

field

(character) Name of the field in the input data.frame x that holds country names

country

(character) A single country name

Value

Returns a data.frame, with the bad data points attached as attributes
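The return shape described above (a data.frame plus an attribute holding the flagged rows) can be sketched in base R. This is only an illustration of the idea, not scrubr's implementation; flag_rows is a hypothetical helper and the toy data are made up.

```r
# Hypothetical helper, NOT part of scrubr: keep or drop flagged rows,
# and attach the flagged rows as a named attribute either way.
flag_rows <- function(x, bad, name, drop = TRUE) {
  stopifnot(is.data.frame(x), is.logical(bad), length(bad) == nrow(x))
  bad[is.na(bad)] <- FALSE                      # treat unknown as "not flagged"
  removed <- x[bad, , drop = FALSE]             # rows considered bad
  out <- if (drop) x[!bad, , drop = FALSE] else x
  attr(out, name) <- removed                    # accessible via attr(out, name)
  out
}

# Toy data (made up for illustration)
pts <- data.frame(latitude = c(10, 170, -45), longitude = c(20, 30, 50))
bad <- abs(pts$latitude) > 90

cleaned <- flag_rows(pts, bad, "coord_impossible")
nrow(cleaned)                       # rows kept
attr(cleaned, "coord_impossible")   # rows flagged as bad
```

With drop = FALSE the same call would return all rows while still exposing the flagged ones through the attribute.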

coord_pol_centroids

Right now, this function only deals with city centroids, using the world.cities dataset of more than 40,000 cities. We'll work on adding country centroids, and perhaps others (e.g., counties, states, provinces, parks, etc.).

Details

Explanation of the functions:

  • coord_impossible - Impossible coordinates

  • coord_incomplete - Incomplete coordinates

  • coord_pol_centroids - Points at political centroids

  • coord_unlikely - Unlikely coordinates

  • coord_within - Check if points are within user input political boundaries
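To make the first three categories concrete, here is a rough base R sketch of the kinds of checks they suggest. The exact rules scrubr applies are not stated on this page, so the definitions below are assumptions: "incomplete" as a missing latitude or longitude, "impossible" as values outside the valid ranges of -90 to 90 and -180 to 180, and "unlikely" as the point (0, 0).

```r
# Toy data (made up for illustration)
pts <- data.frame(
  latitude  = c(10, 170, NA, 0, -45),
  longitude = c(20, 30, 15, 0, 200)
)

# Assumed definitions, not taken from scrubr's source:
# incomplete: either coordinate is missing
incomplete <- is.na(pts$latitude) | is.na(pts$longitude)

# impossible: complete, but outside the valid coordinate ranges
impossible <- !incomplete &
  (abs(pts$latitude) > 90 | abs(pts$longitude) > 180)

# unlikely: complete, and exactly at latitude 0, longitude 0
unlikely <- !incomplete & pts$latitude == 0 & pts$longitude == 0

data.frame(pts, incomplete, impossible, unlikely)
```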

If either lat or lon (or both) is given, we rename the given columns to the standardized names "latitude" and "longitude". If they are not given, we attempt to guess which columns hold latitude and longitude and assign the same standardized names. Using standardized names makes downstream processing easier because every function deals with consistent column names. When returning the data, we restore the original column names.
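The rename, process, restore pattern described above can be sketched as follows. This is a minimal illustration in base R, not scrubr's code; guess_col and with_standard_names are hypothetical names, and the guessing rule (a simple pattern match on column names) is an assumption.

```r
# Hypothetical helper: guess a column name by pattern; fail unless unique.
guess_col <- function(nms, pattern) {
  hit <- grep(pattern, nms, ignore.case = TRUE, value = TRUE)
  if (length(hit) != 1) stop("could not guess a unique column for: ", pattern)
  hit
}

# Hypothetical wrapper: standardize names, run `fun`, restore original names.
with_standard_names <- function(x, fun, lat = NULL, lon = NULL) {
  if (is.null(lat)) lat <- guess_col(names(x), "lat")
  if (is.null(lon)) lon <- guess_col(names(x), "lon|lng")
  original <- names(x)
  names(x)[original == lat] <- "latitude"
  names(x)[original == lon] <- "longitude"
  out <- fun(x)                                  # fun can rely on standard names
  names(out)[names(out) == "latitude"] <- lat    # restore the caller's names
  names(out)[names(out) == "longitude"] <- lon
  out
}

# Toy data with non-standard column names (made up for illustration)
pts <- data.frame(name = c("a", "b"), mylon = c(20, 30), mylat = c(10, 170))
kept <- with_standard_names(
  pts,
  function(d) d[abs(d$latitude) <= 90, , drop = FALSE]
)
names(kept)
```

The processing function only ever sees "latitude" and "longitude", while the caller gets back a data.frame with the column names it passed in.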

For coord_within, we use the countriesLow dataset from the rworldmap package to get country borders.

Examples

# NOT RUN {
# Load scrubr for dframe(), the coord_* functions, and the sample data
library("scrubr")
df <- sample_data_1

# Remove impossible coordinates
NROW(df)
df[1, "latitude"] <- 170
df <- dframe(df) %>% coord_impossible()
NROW(df)
attr(df, "coord_impossible")

# Remove incomplete cases
NROW(df)
df_inc <- dframe(df) %>% coord_incomplete()
NROW(df_inc)
attr(df_inc, "coord_incomplete")

# Remove unlikely points
NROW(df)
df_unlikely <- dframe(df) %>% coord_unlikely()
NROW(df_unlikely)
attr(df_unlikely, "coord_unlikely")

# Remove points not within correct political borders
if (requireNamespace("rgbif", quietly = TRUE)) {
   library("rgbif")
   wkt <- 'POLYGON((30.1 10.1, 10 20, 20 40, 40 40, 30.1 10.1))'
   res <- rgbif::occ_data(geometry = wkt, limit = 100)$data
} else {
   res <- sample_data_4
}

## By specific country name
NROW(res)
df_within <- dframe(res) %>% coord_within(country = "Egypt")
NROW(df_within)
attr(df_within, "coord_within")

## By a field in your data - makes sure your points occur in one of those countries
NROW(res)
df_within <- dframe(res) %>% coord_within(field = "country")
NROW(df_within)
attr(df_within, "coord_within")

# Remove those very near political centroids
## not ready yet
# NROW(df)
# df_polcent <- dframe(df) %>% coord_pol_centroids()
# NROW(df_polcent)
# attr(df_polcent, "coord_polcent")

## lat/long column names can vary
df <- sample_data_1
head(df)
names(df)[2:3] <- c('mylon', 'mylat')
head(df)
df[1, "mylat"] <- 170
dframe(df) %>% coord_impossible(lat = "mylat", lon = "mylon")

# }
