This class guesses the language of the text using the language detector of the cld2 library. It creates the language property, which indicates the language of the text. Optionally, the language provided by Twitter can be used instead.
GuessLanguagePipe
GuessLanguagePipe$new(propertyName = "language", alwaysBeforeDeps = list("StoreFileExtPipe", "TargetAssigningPipe"), notAfterDeps = list())
Arguments:
propertyName: (character) name of the property associated with the Pipe.
alwaysBeforeDeps: (list) the alwaysBefore dependencies (Pipes that must be executed before this one).
notAfterDeps: (list) the notAfter dependencies (Pipes that cannot be executed after this one).
This class inherits from PipeGeneric and implements the pipe abstract function.
pipe: preprocesses the Instance to obtain the language of the data.
Usage:
pipe(instance, languageTwitter = TRUE)
Value: the Instance with the modifications that have occurred in the Pipe.
Arguments:
instance: (Instance) Instance to preprocess.
languageTwitter: (logical) indicates whether, for Instances of type twtid, the language returned by the Twitter API is used instead of applying the detector.
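The choice described by languageTwitter can be sketched in plain R. This is only an illustration, not the package's implementation: choose_language, fake_detector, and the twitter_lang argument are hypothetical names, and falling back to the detector when no API language is available is an assumption.

```r
# Hypothetical helper: decides which language value a pipe like this might store.
#   source_type     : type tag of the instance, e.g. "twtid"
#   text            : the text of the instance
#   twitter_lang    : language reported by the Twitter API, NA if unavailable
#   languageTwitter : mirrors the argument of pipe()
#   detector        : any function mapping a text to a language code
choose_language <- function(source_type, text, twitter_lang = NA_character_,
                            languageTwitter = TRUE, detector) {
  use_api <- languageTwitter &&
    identical(source_type, "twtid") &&
    !is.na(twitter_lang)
  if (use_api) {
    return(twitter_lang)
  }
  # Assumption: fall back to the detector when the API language is not used.
  detector(text)
}

# Stand-in detector so the sketch runs without extra packages.
fake_detector <- function(text) "en"

lang_api <- choose_language("twtid", "Hola a todos", twitter_lang = "es",
                            detector = fake_detector)
lang_detected <- choose_language("twtid", "Hola a todos", twitter_lang = "es",
                                 languageTwitter = FALSE,
                                 detector = fake_detector)
```

With languageTwitter = TRUE the API value is kept; with FALSE the detector decides, whatever the API reported.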
getLanguage: guesses the language of the data.
Usage:
getLanguage(data)
Value: the guessed language. Format: see ISO 639-3:2007.
Arguments:
data: (character) text whose language is to be guessed.
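As a rough idea of what a cld2-based guess looks like, the sketch below calls cld2::detect_language directly. The wrapper guess_language_sketch is a hypothetical name and is not the method of this class. The codes returned are whatever cld2 reports and are not checked here against the format mentioned above.

```r
# Hypothetical wrapper around the cld2 detector; not the package's getLanguage.
guess_language_sketch <- function(data) {
  if (!requireNamespace("cld2", quietly = TRUE)) {
    stop("Package 'cld2' is required for this sketch.")
  }
  # lang_code = TRUE asks cld2 for a language code rather than a full name.
  # The result is NA when cld2 cannot make a reliable guess.
  cld2::detect_language(data, plain_text = TRUE, lang_code = TRUE)
}

sample_text <- paste(
  "This is a short message written in plain English,",
  "long enough for a statistical detector to work with."
)

if (requireNamespace("cld2", quietly = TRUE)) {
  guessed <- guess_language_sketch(sample_text)
  print(guessed)
}
```

Very short texts often yield NA, so callers should be prepared for a missing result.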
To obtain the language of tweets, the Pipe first verifies that a JSON file with the tweet information has been stored in memory.
AbbreviationPipe, ContractionPipe, File2Pipe, FindEmojiPipe, FindEmoticonPipe, FindHashtagPipe, FindUrlPipe, FindUserNamePipe, GuessDatePipe, Instance, InterjectionPipe, MeasureLengthPipe, PipeGeneric, SlangPipe, StopWordPipe, StoreFileExtPipe, TargetAssigningPipe, TeeCSVPipe, ToLowerCasePipe