cld2

detect_language

detect_language_mixed

detect_language_multi

text

a string with text to classify, or a connection to read from

plain_text

if <code>FALSE</code>, the input is treated as HTML: tags are skipped and HTML entities are expanded

lang_code

if <code>TRUE</code>, return a language code instead of the language name
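
As a rough illustration of the <code>plain_text</code> and <code>lang_code</code> arguments, the sketch below classifies a small HTML fragment twice: once asking for a code and once for a name. The input string is made up for this example, and the calls are guarded so the snippet still runs when the package is not installed. No particular output is claimed.

```r
# A small HTML fragment (illustrative input, not from the package docs).
html <- "<p>Ce n&#39;est pas grave, on se reverra demain matin devant la gare.</p>"

if (requireNamespace("cld2", quietly = TRUE)) {
  # plain_text = FALSE: treat the input as HTML (skip tags, expand entities).
  code <- cld2::detect_language(html, plain_text = FALSE)

  # lang_code = FALSE: ask for the language name rather than the code.
  name <- cld2::detect_language(html, plain_text = FALSE, lang_code = FALSE)

  print(c(code = code, name = name))
}
```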

The function <code>detect_language()</code> is vectorised and guesses the language of each string
in <code>text</code>, or returns <code>NA</code> if the language could not be reliably determined. The function
<code>detect_language_multi()</code> is not vectorised and analyses the entire character vector as a
whole. Its output includes the top 3 detected languages with their relative proportions
and the total number of text bytes that were reliably classified.
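
A minimal sketch of the vectorised behaviour described above: each element of a character vector gets its own guess. The three sample strings are illustrative, and very short inputs may come back as <code>NA</code>. The call is guarded so the snippet does nothing when the package is unavailable.

```r
# Three short strings in different languages (illustrative input).
text <- c("To be or not to be?", "Ce n'est pas grave.", "Nou breekt mijn klomp!")

if (requireNamespace("cld2", quietly = TRUE)) {
  # Vectorised call: one result per element of `text`.
  langs <- cld2::detect_language(text)

  # An element is NA when no reliable guess could be made for that string.
  print(data.frame(text = text, language = langs))
}
```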

Bindings to Google's C++ library Compact Language Detector 2
(see <https://github.com/cld2owners/cld2#readme> for more information). Probabilistically
detects over 80 languages in plain text or HTML. For mixed-language input it returns the
top three detected languages and their approximate proportion of the total classified
text bytes (e.g. 80% English and 20% French out of 1000 bytes). There is also a 'cld3'
package on CRAN which uses a neural network model instead.
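
The mixed-language case can be sketched as follows. This assumes <code>detect_language_mixed()</code> is the exported name of the whole-document classifier; the two-language sample text is invented for the example. The result is inspected with <code>str()</code> rather than by field name, since the exact layout of the returned object is not spelled out here.

```r
# An illustrative two-language document: mostly English, then some French.
doc <- paste(
  "The quick brown fox jumps over the lazy dog and then runs back into the forest.",
  "It was a bright cold day in April, and the clocks were striking thirteen.",
  "Il fait beau aujourd'hui et nous allons nous promener au bord de la mer.",
  sep = "\n"
)

if (requireNamespace("cld2", quietly = TRUE)) {
  # Assumption: detect_language_mixed() classifies the input as one document.
  res <- cld2::detect_language_mixed(doc)

  # Show whatever the function returns (languages, proportions, byte count).
  str(res)
}
```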

Jeroen Ooms

Google's Compact Language Detector 2

Dirk Sites 



cld2: Compact Language Detector 2

Description

Usage

Arguments

Examples