ECIMCI_profiles: ECI/MCI N-Gram Profiles

Description

N-gram profile db for 26 languages based on the European Corpus Initiative Multilingual Corpus I.

Usage

ECIMCI_profiles
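As a minimal usage sketch, assuming the textcat package is installed, the db can be passed as the profiles argument p of textcat() in place of the package's default profiles. The two input sentences are hypothetical examples, not taken from the corpus.

```r
library("textcat")

## Hypothetical input texts (illustration only, not from the ECI/MCI corpus)
x <- c("This is a short English sentence.",
       "Dies ist ein kurzer deutscher Satz.")

## Categorize each text against the ECI/MCI profile db;
## the result is a character vector with one category id per text
res <- textcat(x, p = ECIMCI_profiles)
print(res)
```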

Details

This profile db was built by Johannes Rauch from the ECI/MCI corpus, using the default options employed by package textcat, with all text documents encoded in UTF-8.
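The db ships prebuilt, but the general procedure can be sketched on toy data with textcat_profile_db(), assuming that leaving its options argument empty applies textcat's default profile options. The training texts and ids below are hypothetical placeholders, not ECI/MCI data; enc2utf8() mirrors the UTF-8 encoding described above.

```r
library("textcat")

## Hypothetical training texts, one per category (placeholders only)
train <- enc2utf8(c("the cat sat on the mat and the dog ran away",
                    "die Katze sass auf der Matte und der Hund lief weg"))
ids <- c("eng", "ger")

## Build a small profile db; no options are given, so defaults are used
db <- textcat_profile_db(train, id = ids)

## Classify a new text against the toy db
res <- textcat("the dog and the cat", p = db)
print(res)
```

With only one short text per category such a db is far too small for reliable categorization; it only illustrates the shape of the procedure.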

The category ids used for the db are the respective IETF language tags (see language in package tau), using the ISO 639-2 Part B language subtags and, for Serbian, the script employed (i.e., "scc-Cyrl" and "scc-Latn" for Serbian written in Cyrillic and Latin script, respectively). All other languages in the db are written in Latin script.
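To inspect the category ids, including the two Serbian script variants described above, one can list the names of the db. This is a sketch assuming names() returns the category ids, as it does for textcat's other profile dbs.

```r
library("textcat")

## All category ids (IETF language tags) in the db
ids <- names(ECIMCI_profiles)
print(ids)

## Serbian profiles are distinguished by their script subtag
serbian <- grep("^scc-", ids, value = TRUE)
print(serbian)
```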