Provides lookup tables used to standardize subnational administrative units (states and provinces) in occurrence datasets.
Generated from rnaturalearth::ne_states(), it includes a wide range of
name variants (in multiple languages, transliterations, and common
abbreviations), as well as postal codes for each unit.
This dictionary allows consistent mapping of user-provided names such as
"são paulo", "sao paulo", "SP", "illinois", "ill.", "bayern",
"bavaria" to a single standardized state or province name.
states_dictionary
A named list with two data frames:
states_name: A data frame with columns:
Character. Name variants of states or provinces
from ne_states(), lowercased and accent-stripped.
Character. Standardized state/province name, also lowercased and accent-stripped.
Character. Country associated with the state/province, lowercased and accent-stripped.
states_code: A data frame with columns:
Character. Postal codes from ne_states(), cleaned
and converted to uppercase.
Character. Standardized state/province name corresponding to the code.
Character. Country associated with the code.
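How the two tables might be used for lookup can be sketched as follows. This is a minimal illustration on small hand-made tables, not the packaged data: the column names (`variant`, `code`, `standard`, `country`), the helpers `normalize_name()` and `standardize_state()`, and the toy rows (including the choice of "bayern" as the standardized form) are all assumptions, and the accent map covers only a handful of characters.

```r
# Toy stand-ins for the two lookup tables (hypothetical rows and column names).
toy_names <- data.frame(
  variant  = c("sao paulo", "illinois", "ill.", "bayern", "bavaria"),
  standard = c("sao paulo", "illinois", "illinois", "bayern", "bayern"),
  country  = c("brazil", "united states of america",
               "united states of america", "germany", "germany"),
  stringsAsFactors = FALSE
)
toy_codes <- data.frame(
  code     = c("SP", "IL"),
  standard = c("sao paulo", "illinois"),
  country  = c("brazil", "united states of america"),
  stringsAsFactors = FALSE
)

# Lowercase and strip a few common accents. The accented characters are
# written as \u escapes so the script stays ASCII-only.
normalize_name <- function(x) {
  accents <- paste0(
    "\u00e1\u00e0\u00e2\u00e3\u00e4", "\u00e9\u00e8\u00ea", "\u00ed",
    "\u00f3\u00f4\u00f5\u00f6", "\u00fa\u00fc", "\u00e7", "\u00f1"
  )
  plain <- "aaaaaeeeioooouucn"
  chartr(accents, plain, tolower(trimws(x)))
}

# Try the name table first, then fall back to the postal-code table.
standardize_state <- function(x, names_tbl, codes_tbl) {
  by_name <- names_tbl$standard[match(normalize_name(x), names_tbl$variant)]
  by_code <- codes_tbl$standard[match(toupper(trimws(x)), codes_tbl$code)]
  ifelse(is.na(by_name), by_code, by_name)
}

# "S\u00e3o Paulo" is "Sao Paulo" with a tilde on the first "a".
standardize_state(
  c("S\u00e3o Paulo", "Ill.", "Bavaria", "sp", "Atlantis"),
  toy_names, toy_codes
)
# "sao paulo" "illinois" "bayern" "sao paulo" NA
```

Inputs that match neither table come back as `NA`. The sketch ignores the country column; since the same name or code can occur in more than one country, a real lookup would likely restrict the match by country first.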
The dictionary is constructed by:
selecting administrative units of type "State" or "Province";
extracting multiple name fields, including alternative names and multilingual fields;
normalizing names to lowercase and removing accents;
normalizing codes to uppercase;
removing duplicates and ambiguous entries;
removing rows with missing names or codes.
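The steps above can be sketched in base R on a toy table. This is an illustration under stated assumptions, not the package's actual build script: the field names (`type`, `name`, `name_alt`, `postal`, `admin`), the made-up "Exampleland" rows, the output column names, and the definition of "ambiguous" (one variant pointing to more than one standardized name within a country) are all hypothetical.

```r
# Toy stand-in for the attribute table of ne_states() (hypothetical rows).
raw <- data.frame(
  type     = c("State", "State", "Province", "County", "State", "State"),
  name     = c("S\u00e3o Paulo", "Illinois", "Ontario", "Kent",
               "Northland", "Southland"),
  name_alt = c("Sao Paulo", "Ill.", NA, NA, "Land", "Land"),
  postal   = c("sp", "il", "on", NA, "nl", "sl"),
  admin    = c("Brazil", "United States of America", "Canada",
               "United Kingdom", "Exampleland", "Exampleland"),
  stringsAsFactors = FALSE
)

# Lowercase and strip a few common accents (partial map, ASCII-only source).
normalize_name <- function(x) {
  accents <- paste0(
    "\u00e1\u00e0\u00e2\u00e3\u00e4", "\u00e9\u00e8\u00ea", "\u00ed",
    "\u00f3\u00f4\u00f5\u00f6", "\u00fa\u00fc", "\u00e7", "\u00f1"
  )
  chartr(accents, "aaaaaeeeioooouucn", tolower(trimws(x)))
}

# 1. Keep only states and provinces.
units <- raw[raw$type %in% c("State", "Province"), ]

# 2-3. Stack every name field into one long table of normalized variants.
name_fields <- c("name", "name_alt")
variants <- do.call(rbind, lapply(name_fields, function(f) {
  data.frame(
    variant  = normalize_name(units[[f]]),
    standard = normalize_name(units$name),
    country  = normalize_name(units$admin),
    stringsAsFactors = FALSE
  )
}))

# 4. Normalize postal codes to uppercase.
codes <- data.frame(
  code     = toupper(trimws(units$postal)),
  standard = normalize_name(units$name),
  country  = normalize_name(units$admin),
  stringsAsFactors = FALSE
)

# 5. Drop rows with missing names or codes, then exact duplicates.
variants <- unique(variants[!is.na(variants$variant) & variants$variant != "", ])
codes    <- unique(codes[!is.na(codes$code) & codes$code != "", ])

# 6. Drop ambiguous variants: same variant and country, different standard names.
key   <- paste(variants$variant, variants$country, sep = "|")
n_std <- tapply(variants$standard, key, function(s) length(unique(s)))
variants <- variants[as.vector(n_std[key]) == 1, ]

variants$variant
# "sao paulo" "illinois" "ontario" "northland" "southland" "ill."
```

In this toy run the county row is filtered out by type, the duplicated "sao paulo" variant collapses to one row, and "land" is dropped because it points to two different units in the same country.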
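# Load the dictionary shipped with the package
data(states_dictionary)
# Preview the name-variant and postal-code lookup tables
head(states_dictionary$states_name)
head(states_dictionary$states_code)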