This class guesses the language of a text using the language detector of the cld2 library. It creates the language property, which indicates the language of the text. Optionally, the language provided by Twitter can be used instead.
GuessLanguagePipe
GuessLanguagePipe$new(propertyName = "language",
                      alwaysBeforeDeps = list("StoreFileExtPipe",
                                              "TargetAssigningPipe"),
                      notAfterDeps = list(),
                      languageTwitter = TRUE)
Arguments:
propertyName: (character) name of the property associated with the Pipe.
alwaysBeforeDeps: (list) the alwaysBefore dependencies (Pipes that must be executed before this one).
notAfterDeps: (list) the notAfter dependencies (Pipes that cannot be executed after this one).
languageTwitter: (logical) indicates whether, for Instances of type twtid, the language returned by the Twitter API is used (TRUE) or the detector is applied (FALSE).
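As an illustration of the constructor arguments above, the following minimal sketch builds a Pipe that always applies the detector, even to twtid Instances. It assumes the bdpar and cld2 packages are installed and is skipped otherwise; only the arguments listed above are used, and the variable name pipe is illustrative.

```r
# Minimal sketch: runs only when bdpar and cld2 are available.
if (requireNamespace("bdpar", quietly = TRUE) &&
    requireNamespace("cld2", quietly = TRUE)) {
  library(bdpar)
  # languageTwitter = FALSE: apply the cld2 detector instead of
  # using the language returned by the Twitter API.
  pipe <- GuessLanguagePipe$new(propertyName = "language",
                                languageTwitter = FALSE)
}
```

The remaining arguments keep their defaults, so the Pipe still expects StoreFileExtPipe and TargetAssigningPipe to have been executed before it.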
This class inherits from GenericPipe and implements the
pipe abstract function.
pipe:
preprocesses the Instance to obtain the language of the data.
getLanguage: guesses the language of the data.
Usage:
getLanguage(data)
Value: the guessed language. Format: see ISO 639-3:2007.
Arguments:
data: (character) text whose language is to be guessed.
languageTwitter: (logical) indicates whether, for Instances of type twtid, the language returned by the Twitter API is used (TRUE) or the detector is applied (FALSE).
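To give an idea of what guessing the language of a text with cld2 looks like, here is a minimal sketch. The helper guess_language is hypothetical and is not the implementation of getLanguage; it only wraps cld2::detect_language and assumes the cld2 package is installed, returning NA otherwise.

```r
# Hypothetical helper (not part of bdpar): guesses the language of one text.
guess_language <- function(data) {
  stopifnot(is.character(data), length(data) == 1)
  # Without cld2 no guess can be made.
  if (!requireNamespace("cld2", quietly = TRUE)) {
    return(NA_character_)
  }
  # lang_code = TRUE asks for a language code rather than a full name;
  # cld2 yields NA when it cannot make a reliable guess.
  cld2::detect_language(data, plain_text = TRUE, lang_code = TRUE)
}

guess_language("This sentence is clearly written in English.")
```

Very short or mixed-language texts may yield NA, so callers should handle a missing result.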
To obtain the language of tweets, the Pipe first checks whether a JSON file with the stored tweet information exists. In addition, the "cache.twitter.path" field of the bdpar.Options variable must be defined so that the Pipe knows where the tweet information is saved.
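The "cache.twitter.path" field could be defined along these lines. This is a sketch under assumptions: it presumes that bdpar is installed and that bdpar.Options exposes set(key, value) and get(key) methods, and the cache directory used here is purely illustrative.

```r
# Minimal sketch: runs only when bdpar is available.
if (requireNamespace("bdpar", quietly = TRUE)) {
  library(bdpar)
  # Illustrative location for cached tweet information (an assumption).
  cache_dir <- file.path(tempdir(), "twitter_cache")
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  # Assumed accessors of bdpar.Options: set(key, value) and get(key).
  bdpar.Options$set(key = "cache.twitter.path", value = cache_dir)
  bdpar.Options$get("cache.twitter.path")
}
```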
AbbreviationPipe, bdpar.Options,
ContractionPipe, File2Pipe,
FindEmojiPipe, FindEmoticonPipe,
FindHashtagPipe, FindUrlPipe,
FindUserNamePipe, GuessDatePipe,
Instance, InterjectionPipe,
MeasureLengthPipe, GenericPipe,
SlangPipe, StopWordPipe,
StoreFileExtPipe, TargetAssigningPipe,
TeeCSVPipe, ToLowerCasePipe