n-gram profile db for 26 languages based on the European Corpus
Initiative Multilingual Corpus I.
Usage
ECIMCI_profiles
Details
This profile db was built by Johannes Rauch from the ECI/MCI corpus
(http://www.elsnet.org/eci.html), using the default options of
package textcat, with all text documents encoded in UTF-8.
The category ids used for the db are the respective IETF language tags
(see language in package NLP). These use the ISO 639-2 Part B language
subtags and, for Serbian, additionally a script subtag: "scc-Cyrl" and
"scc-Latn" denote Serbian written in Cyrillic and Latin script,
respectively. All other languages in the profile db are written in
Latin script.
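The category ids described above can be inspected with names() and passed to textcat() for classification. What follows is a minimal sketch, assuming package textcat is installed (ECIMCI_profiles ships with it). The German example sentence is an illustrative input, and no expected result is shown because the returned id depends on the profiles.

```r
library("textcat")

## Category ids of the profile db: ISO 639-2 Part B subtags,
## plus a script subtag for the two Serbian profiles.
ids <- names(ECIMCI_profiles)
serbian <- grep("^scc-", ids, value = TRUE)
print(serbian)

## Classify a text against this profile db rather than the
## package's default profiles; the result is a category id.
res <- textcat("Dies ist ein kurzer Satz in deutscher Sprache.",
               p = ECIMCI_profiles)
print(res)
```

Because the db was built with textcat's default options, it can be passed as p without changing any other arguments.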
References
S. Armstrong-Warwick, H. S. Thompson, D. McKelvie and D. Petitpierre
(1994).
Data in Your Language: The ECI Multilingual Corpus 1.
In "Proceedings of the International Workshop on Sharable Natural
Language Resources" (Nara, Japan), pp. 97–106.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.44.950
Examples
## Languages in the ECI/MCI profile db:
names(ECIMCI_profiles)
## Key options used for the profile:
attr(ECIMCI_profiles, "options")[c("n", "size", "reduce", "useBytes")]