The StopWordPipe class is responsible for detecting the stop words present in the data field of each Instance. Identified stop words are stored in the properties of the Instance under the name given by propertyName ("stopWord" by default). If needed, the class can also remove the stop words inline.
StopWordPipe
StopWordPipe$new(propertyName = "stopWord", propertyLanguageName = "language", alwaysBeforeDeps = list("GuessLanguagePipe"), notAfterDeps = list("AbbreviationPipe"))
Arguments:
propertyName: (character) name of the property associated with the Pipe.
propertyLanguageName: (character) name of the language property.
alwaysBeforeDeps: (list) the alwaysBefore dependencies (Pipes that must be executed before this one).
notAfterDeps: (list) the notAfter dependencies (Pipes that cannot be executed after this one).
This class inherits from PipeGeneric and implements the pipe abstract function.
pipe: preprocesses the Instance to obtain/remove the stop words.
The stop words found by the pipe are added to the list of properties of the Instance. If the removeStopWords parameter is TRUE, the stop words are removed from the Instance data.
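The flow described for pipe can be sketched in base R as follows. This is only an illustration under stated assumptions: a plain list stands in for an Instance, the function name pipe_sketch and the whole-word, case-insensitive matching rule are hypothetical, and the package's actual implementation may differ.

```r
# Hypothetical sketch of the pipe flow; a plain list stands in for an Instance.
pipe_sketch <- function(instance, stopWords, removeStopWords = TRUE) {
  found <- character(0)
  for (word in stopWords) {
    # Whole-word match; \\Q...\\E makes PCRE treat the stop word literally.
    pattern <- paste0("(?<![[:alnum:]])\\Q", word, "\\E(?![[:alnum:]])")
    if (grepl(pattern, instance$data, perl = TRUE, ignore.case = TRUE)) {
      found <- c(found, word)
      if (removeStopWords) {
        instance$data <- gsub(pattern, "", instance$data,
                              perl = TRUE, ignore.case = TRUE)
      }
    }
  }
  if (removeStopWords) {
    # Collapse the whitespace left behind by the removed words.
    instance$data <- trimws(gsub("[[:space:]]+", " ", instance$data))
  }
  # Record the stop words that were found under the "stopWord" property.
  instance$properties$stopWord <- found
  instance
}

instance <- list(data = "This is an example of a short text",
                 properties = list())
result <- pipe_sketch(instance, stopWords = c("this", "is", "an", "of", "a"))
result$data                 # "example short text"
result$properties$stopWord  # "this" "is" "an" "of" "a"
```

With removeStopWords = FALSE the sketch leaves the data untouched and only records the stop words it found.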
findStopWord: checks if the stop word is in the data.
Usage:
findStopWord(data, stopWord)
Value: logical (TRUE or FALSE), indicating whether the stop word is in the data.
Arguments:
data: (character) text where stop words will be searched.
stopWord: (character) the stop word to find.
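As a rough illustration of the check described above, the following base-R sketch tests for a whole-word, case-insensitive match. The helper name find_stop_word_sketch and the regular expression are assumptions made for illustration; the matching rules actually used by findStopWord are not specified here and may differ.

```r
# Hypothetical stand-in for findStopWord(data, stopWord); not the package's code.
find_stop_word_sketch <- function(data, stopWord) {
  # The lookarounds require that the stop word is not glued to other
  # letters or digits; \\Q...\\E quotes the stop word literally (PCRE).
  pattern <- paste0("(?<![[:alnum:]])\\Q", stopWord, "\\E(?![[:alnum:]])")
  grepl(pattern, data, perl = TRUE, ignore.case = TRUE)
}

find_stop_word_sketch("The cat sat on the mat", "the")  # TRUE
find_stop_word_sketch("There is another theme", "the")  # FALSE
```

The second call returns FALSE because "the" only appears there as part of longer words.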
removeStopWord: removes the stop word from the data.
Usage:
removeStopWord(stopWord, data)
Value: the data with the stop word removed.
Arguments:
stopWord: (character) the stop word to remove.
data: (character) text from which the stop word will be removed.
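A minimal base-R sketch of such a removal is shown below, keeping the (stopWord, data) argument order of the usage above. The helper name remove_stop_word_sketch, the whole-word matching rule, and the whitespace clean-up are assumptions for illustration, not the package's implementation.

```r
# Hypothetical stand-in for removeStopWord(stopWord, data); not the package's code.
remove_stop_word_sketch <- function(stopWord, data) {
  # Match the stop word only as a whole word, ignoring case.
  pattern <- paste0("(?<![[:alnum:]])\\Q", stopWord, "\\E(?![[:alnum:]])")
  without <- gsub(pattern, "", data, perl = TRUE, ignore.case = TRUE)
  # Collapse repeated whitespace and trim the ends.
  trimws(gsub("[[:space:]]+", " ", without))
}

remove_stop_word_sketch("the", "The cat sat on the mat")  # "cat sat on mat"
```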
getPropertyLanguageName: gets the name of the language property.
Usage:
getPropertyLanguageName()
Value: the name of the language property.
getPathResourcesStopWords: gets the path of the stop words resources.
Usage:
getPathResourcesStopWords()
Value: the path of the stop words resources.
setPathResourcesStopWords: sets the path of stop words resources.
Usage:
setPathResourcesStopWords(path)
Arguments:
path: (character) the new value of the path of stop words resources.
Fields:
propertyLanguageName: (character) the name of the language property.
pathResourcesStopWords: (character) the path where the resources are located.
The StopWordPipe class requires resource files (in JSON format) containing the list of stop words. The name of each resource file must contain the language of the text, as given by the propertyLanguageName property (i.e. xxx.json, where xxx is the value of that property). The location of the resources should be defined in the resourcesPath section of the configuration file.
[resourcesPath]
resourcesStopWordsPath = <<resources_stopWords_path>>
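The naming convention above can be illustrated with a short base-R sketch that builds the expected resource file name from a language code. The helper name stop_words_resource_path, the temporary directory, and the placeholder file contents are assumptions for illustration; the internal layout of the real JSON files is not described here.

```r
# Hypothetical helper: build the path of the stop-word resource for a language.
stop_words_resource_path <- function(resourcesPath, language) {
  file.path(resourcesPath, paste0(language, ".json"))
}

# Temporary directory standing in for the configured resourcesStopWordsPath.
resources_dir <- tempfile("stopwords_")
dir.create(resources_dir)
# Placeholder content only; the real file layout is an assumption.
writeLines('["a", "an", "the"]', file.path(resources_dir, "en.json"))

path_en <- stop_words_resource_path(resources_dir, "en")
path_fr <- stop_words_resource_path(resources_dir, "fr")
file.exists(path_en)  # TRUE: en.json was created above
file.exists(path_fr)  # FALSE: no resource for this language
```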
AbbreviationPipe, ContractionPipe, File2Pipe, FindEmojiPipe, FindEmoticonPipe, FindHashtagPipe, FindUrlPipe, FindUserNamePipe, GuessDatePipe, GuessLanguagePipe, Instance, InterjectionPipe, MeasureLengthPipe, PipeGeneric, ResourceHandler, SlangPipe, StoreFileExtPipe, TargetAssigningPipe, TeeCSVPipe, ToLowerCasePipe