corpus (version 0.10.1)

abbreviations: Abbreviations

Description

Lists of common abbreviations.

Usage

abbreviations_de
abbreviations_en
abbreviations_es
abbreviations_fr
abbreviations_it
abbreviations_pt
abbreviations_ru

Format

A character vector of unique abbreviations.

Details

Each abbreviations_ object (one per language, as listed under Usage) is a character vector of abbreviations: words or phrases containing full stops (periods). Because a full stop is also an ambiguous sentence terminator, these entries require special handling during sentence detection and tokenization.
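
As a rough illustration of why these lists matter, the sketch below inspects the English list and splits the same text into sentences with and without it. It assumes the corpus package is installed and relies on the sent_suppress argument of text_filter; the sample sentence is made up for the example.

```r
library(corpus)

# Peek at the first few entries of the English list.
head(abbreviations_en)

# Made-up sample text containing an abbreviation followed by a capital letter.
x <- "I saw Mr. Jones today. He left soon after."

# Split into sentences using the English list as sentence-break suppressions.
with_list <- text_split(x, units = "sentences",
                        filter = text_filter(sent_suppress = abbreviations_en))

# Split again with suppressions disabled, for comparison.
without_list <- text_split(x, units = "sentences",
                           filter = text_filter(sent_suppress = NULL))

print(with_list)
print(without_list)
```

Comparing the two results shows whether the full stop in an abbreviation is treated as the end of a sentence.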

The original lists were compiled by the Unicode Common Locale Data Repository. We have tailored the English list by adding single-letter abbreviations and making a few other additions.

The built-in abbreviation lists are reasonable defaults, but they may require further tailoring to suit your particular task.
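
One possible way to tailor a list is to append your own entries and pass the result to text_filter. The sketch below assumes the corpus package is installed; the entries in extra and the sample text are hypothetical and stand in for whatever your task needs.

```r
library(corpus)

# Hypothetical domain-specific additions; replace with your own.
extra <- c("Fig.", "Eq.")

# Append to the built-in English list, keeping entries unique.
my_abbrevs <- unique(c(abbreviations_en, extra))

# Use the tailored list for sentence-break suppression and for
# combining each abbreviation into a single token.
f <- text_filter(sent_suppress = my_abbrevs, combine = my_abbrevs)

# Made-up sample text.
x <- "Compare with Eq. A1 in the appendix. Then see Fig. B2."

print(text_split(x, units = "sentences", filter = f))
print(text_tokens(x, filter = f))
```

Entries can be removed in the same spirit, for example with setdiff(abbreviations_en, unwanted), where unwanted is a character vector you supply.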

See Also

text_filter.