country_dictionary: Country dictionary for standardizing country names and codes

Description

country_dictionary provides a set of lookup tables used to standardize country names and country codes in occurrence datasets.

The dictionary is built from rnaturalearthdata::map_units110 and consolidates a wide variety of country name variants (in several languages and formats), as well as multiple coding systems, into a single suggested standardized name.

This object is used internally by functions that clean or harmonize country fields. It ensures that the variants found in occurrence datasets map consistently to a single standardized form: for example, "Brasil", "brasil", "BR", and "BRA" all map to "brazil", and "République Française" maps to "france".

Usage

country_dictionary

Format

A named list of two data frames:

country_name

A data frame with two columns:

country_name: Character. Lowercased and accent-stripped country name variants (from multiple rnaturalearthdata fields such as name, name_long, abbrev, formal_en, and alternative names in several languages).

country_suggested: Character. The standardized country name, derived from the name column of map_units110, also lowercased and accent-stripped.

country_code

A data frame with two columns:

country_code: Character. Country codes from several systems, including ISO-2, ISO-3, FIPS, postal codes, and others, after filtering invalid or ambiguous codes.

country_suggested: Character. The standardized country name corresponding to each code.
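
The two-table layout described above lends itself to a simple lookup with match(). The sketch below is illustrative only: toy_dictionary is a hand-made stand-in with a few hypothetical rows (not the shipped data), and suggest_country() is a hypothetical helper, not a function documented for this package. It assumes the input is already free of accents.

```r
# Toy stand-in mirroring the documented structure (hypothetical rows,
# not the contents of the real country_dictionary)
toy_dictionary <- list(
  country_name = data.frame(
    country_name      = c("brasil", "brazil", "france", "republique francaise"),
    country_suggested = c("brazil", "brazil", "france", "france"),
    stringsAsFactors  = FALSE
  ),
  country_code = data.frame(
    country_code      = c("BR", "BRA", "FR", "FRA"),
    country_suggested = c("brazil", "brazil", "france", "france"),
    stringsAsFactors  = FALSE
  )
)

# Hypothetical helper: try the code table first (codes are stored in
# uppercase), then fall back to the name table (names are stored in lowercase)
suggest_country <- function(x, dict) {
  x <- trimws(x)
  by_code <- dict$country_code$country_suggested[
    match(toupper(x), dict$country_code$country_code)
  ]
  by_name <- dict$country_name$country_suggested[
    match(tolower(x), dict$country_name$country_name)
  ]
  ifelse(is.na(by_code), by_name, by_code)
}

raw <- c("Brasil", "brasil", "BR", "BRA", "FRA", "Atlantis")
suggest_country(raw, toy_dictionary)
```

Values that appear in neither table (here "Atlantis") come back as NA, which makes unmatched records easy to flag for manual review.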

Details

The dictionary is generated by:

extracting multiple name and code fields from rnaturalearthdata::map_units110,
converting names to lowercase and removing accents,
converting codes to uppercase,
removing invalid or ambiguous codes (e.g., -99, "J", various country mismatches),
and ensuring uniqueness across all entries.
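
The steps above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's actual build script: raw_units is a hypothetical three-row table standing in for a few map_units110 columns, the column names and values are invented for the example, clean_name() is a hypothetical helper, and the accent map covers only a small set of letters.

```r
# Hypothetical raw attribute table (stand-in for a few map_units110 columns)
raw_units <- data.frame(
  name      = c("Brazil", "France", "Japan"),
  formal_en = c("Federative Republic of Brazil", "French Republic", "Japan"),
  name_fr   = c("Br\u00e9sil", "France", "Japon"),
  iso_a2    = c("BR", "-99", "JP"),
  iso_a3    = c("bra", "FRA", "JPN"),
  postal    = c("BR", "F", "J"),
  stringsAsFactors = FALSE
)

# Lowercase, then replace a small, non-exhaustive set of accented letters
clean_name <- function(x) {
  chartr(
    "\u00e1\u00e0\u00e2\u00e3\u00e4\u00e9\u00e8\u00ea\u00eb\u00ed\u00ec\u00ee\u00ef\u00f3\u00f2\u00f4\u00f5\u00f6\u00fa\u00f9\u00fb\u00fc\u00e7\u00f1",
    "aaaaaeeeeiiiiooooouuuucn",
    tolower(x)
  )
}

name_fields <- c("name", "formal_en", "name_fr")
code_fields <- c("iso_a2", "iso_a3", "postal")
suggested   <- clean_name(raw_units$name)

# Stack every name field against the standardized name, then drop duplicates
country_name <- unique(data.frame(
  country_name      = clean_name(unlist(raw_units[name_fields], use.names = FALSE)),
  country_suggested = rep(suggested, times = length(name_fields)),
  stringsAsFactors  = FALSE
))

# Stack every code field, convert to uppercase, drop invalid or ambiguous codes
codes <- data.frame(
  country_code      = toupper(unlist(raw_units[code_fields], use.names = FALSE)),
  country_suggested = rep(suggested, times = length(code_fields)),
  stringsAsFactors  = FALSE
)
invalid_codes <- c("-99", "J")  # the two examples named above
country_code  <- unique(codes[!codes$country_code %in% invalid_codes, ])

# Sanity check: each remaining code maps to exactly one suggested name
stopifnot(!anyDuplicated(country_code$country_code))

toy_dictionary <- list(country_name = country_name, country_code = country_code)
```

Filtering codes before deduplicating keeps placeholder values such as -99 from ever being associated with a country.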

Examples

data(country_dictionary)

head(country_dictionary$country_name)
head(country_dictionary$country_code)