StopWordPipe class is responsible for detecting
the stop words present in the data field of each Instance.
The identified stop words are stored in the Instance property named by
propertyName (by default, "stopWord"). Moreover, if needed, it can
also remove the stop words from the data inline.
This class inherits from GenericPipe and implements its
abstract pipe function.
bdpar::GenericPipe -> StopWordPipe
new()
Creates a StopWordPipe object.
StopWordPipe$new(
propertyName = "stopWord",
propertyLanguageName = "language",
alwaysBeforeDeps = list("GuessLanguagePipe"),
notAfterDeps = list("AbbreviationPipe"),
removeStopWords = TRUE,
resourcesStopWordsPath = NULL
)
propertyName: A character value. Name of the property
associated with the GenericPipe.
propertyLanguageName: A character value. Name of the
language property.
alwaysBeforeDeps: A list value. The alwaysBefore
dependencies (GenericPipes that must be executed before
this one).
notAfterDeps: A list value. The notAfter dependencies
(GenericPipes that cannot be executed after this one).
removeStopWords: A logical value. Indicates whether
the stop words are removed from the data.
resourcesStopWordsPath: A character value. Path
of the resource files (in JSON format) containing the stop words.
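The alwaysBeforeDeps and notAfterDeps arguments constrain where the pipe may appear in a pipeline. The following is a rough illustration of those two rules, not bdpar's own validation code. The helper checkPipeOrder is hypothetical. It takes the pipe names in execution order and applies the literal reading of the argument descriptions above.

```r
# Hypothetical helper (not part of bdpar): checks whether a pipe order
# respects one pipe's alwaysBefore and notAfter dependencies.
checkPipeOrder <- function(order, pipe, alwaysBeforeDeps, notAfterDeps) {
  pos <- match(pipe, order)
  if (is.na(pos)) return(TRUE)  # the pipe is not used: nothing to check
  before <- order[seq_len(pos - 1)]
  after <- order[-seq_len(pos)]
  # Every alwaysBefore dependency must be executed earlier.
  okBefore <- all(unlist(alwaysBeforeDeps) %in% before)
  # No notAfter dependency may be executed later.
  okAfter <- !any(unlist(notAfterDeps) %in% after)
  okBefore && okAfter
}

deps <- list("GuessLanguagePipe")
notAfter <- list("AbbreviationPipe")
checkPipeOrder(c("GuessLanguagePipe", "StopWordPipe"),
               "StopWordPipe", deps, notAfter)  # TRUE
checkPipeOrder(c("StopWordPipe", "GuessLanguagePipe"),
               "StopWordPipe", deps, notAfter)  # FALSE
checkPipeOrder(c("GuessLanguagePipe", "StopWordPipe", "AbbreviationPipe"),
               "StopWordPipe", deps, notAfter)  # FALSE
```

With the default dependencies, the language must already have been guessed, and abbreviation handling must not run after stop words are processed.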
pipe()
Preprocesses the Instance to detect and, when removeStopWords
is TRUE, remove the stop words. The stop words found in the data
are added to the list of properties of the Instance.
StopWordPipe$pipe(instance)
instance: An Instance value. The Instance
to preprocess.
Returns: The Instance with the modifications that have
occurred in the pipe.
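Below is a minimal sketch of the flow pipe() describes: find the stop words, record them as a property, and optionally strip them from the data. It rests on several assumptions that do not come from bdpar:

- The Instance is modelled as a plain list with data and properties fields. The real Instance is an R6 object.
- The stop word list is passed in directly rather than loaded from a JSON resource.
- Matching uses naive whitespace tokenization and is case-insensitive.

stopWordPipeSketch is a hypothetical name.

```r
# Hypothetical sketch of pipe(); the Instance is a plain list here.
stopWordPipeSketch <- function(instance, stopWords,
                               propertyName = "stopWord",
                               removeStopWords = TRUE) {
  # Naive whitespace tokenization (assumption: punctuation is ignored).
  tokens <- strsplit(instance$data, "\\s+")[[1]]
  isStop <- tolower(tokens) %in% tolower(stopWords)
  # Record each distinct stop word found, in order of first appearance.
  instance$properties[[propertyName]] <- unique(tolower(tokens[isStop]))
  if (removeStopWords) {
    # Rebuild the text from the remaining tokens.
    instance$data <- paste(tokens[!isStop], collapse = " ")
  }
  instance
}

inst <- list(data = "The cat sat on the mat", properties = list())
out <- stopWordPipeSketch(inst, stopWords = c("the", "on", "a"))
out$data                 # "cat sat mat"
out$properties$stopWord  # "the" "on"
```

Setting removeStopWords = FALSE leaves the data untouched but still records the detected stop words.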
findStopWord()
Checks whether the stop word is present in the data.
StopWordPipe$findStopWord(data, stopWord)
data: A character value. The text where the stop word
will be searched.
stopWord: A character value. The
stop word to find.
Returns: A logical value indicating whether the
stop word is in the data.
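One way to implement a check like findStopWord() is a whole-word, case-insensitive regular expression. This is an assumption, not necessarily how bdpar does it. The word boundaries (\b) keep "at" from matching inside "cat". PCRE's \Q...\E quoting treats the stop word literally, even if it contains regex metacharacters. findStopWordSketch is a hypothetical name.

```r
# Hypothetical stand-in for findStopWord(data, stopWord).
findStopWordSketch <- function(data, stopWord) {
  # (?i) = case-insensitive; \b = word boundary; \Q...\E = literal text.
  pattern <- paste0("(?i)\\b\\Q", stopWord, "\\E\\b")
  grepl(pattern, data, perl = TRUE)
}

findStopWordSketch("The cat sat on the mat", "the")  # TRUE
findStopWordSketch("The cat sat on the mat", "at")   # FALSE: only inside words
```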
removeStopWord()
Removes the stop word from the data.
StopWordPipe$removeStopWord(stopWord, data)
stopWord: A character value. The
stop word to remove.
data: A character value. The text from which the stop word
will be removed.
Returns: The data with the stop word removed.
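Removal can follow the same whole-word, case-insensitive matching, plus a cleanup step for the spaces left behind. This is a sketch under those assumptions: removeStopWordSketch is hypothetical, and the whitespace cleanup is an illustrative choice rather than documented bdpar behaviour. It keeps the documented argument order, with stopWord first and data second.

```r
# Hypothetical stand-in for removeStopWord(stopWord, data).
removeStopWordSketch <- function(stopWord, data) {
  pattern <- paste0("(?i)\\b\\Q", stopWord, "\\E\\b")
  cleaned <- gsub(pattern, "", data, perl = TRUE)
  # Collapse the double spaces left behind and trim the ends.
  trimws(gsub("\\s{2,}", " ", cleaned, perl = TRUE))
}

removeStopWordSketch("the", "The cat sat on the mat")  # "cat sat on mat"
```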
getPropertyLanguageName()
Gets the name of the language property.
StopWordPipe$getPropertyLanguageName()
Returns: The name of the language property.
getResourcesStopWordsPath()
Gets the path of the stop word resources.
StopWordPipe$getResourcesStopWordsPath()
Returns: The path of the stop word resources.
setResourcesStopWordsPath()
Sets the path of the stop word resources.
StopWordPipe$setResourcesStopWordsPath(path)
path: A character value. The new path of the
stop word resources.
clone()
The objects of this class are cloneable with this method.
StopWordPipe$clone(deep = FALSE)
deep: Whether to make a deep clone.
StopWordPipe requires resource files (in JSON format)
containing the lists of stop words. Each resource file must be named
after the language of the text, as stored in the property indicated by
propertyLanguageName (i.e., xxx.json, where xxx is the value of that
property). The location of the resources should be
defined in the "resources.stopwords.path" field of the
bdpar.Options variable.
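The naming rule above means the pipe can locate the stop word list for an Instance from its language value alone. Below is a small sketch of that lookup. The helper stopWordResourceFile, the throwaway directory, and the language code "en" are all illustrative assumptions. In practice the directory would come from the "resources.stopwords.path" field or from resourcesStopWordsPath. Parsing the JSON contents is left out because the file layout is not described here.

```r
# Hypothetical helper (not part of bdpar): builds the expected resource
# file path for a language, following the "xxx.json" naming rule.
stopWordResourceFile <- function(resourcesPath, language) {
  file.path(resourcesPath, paste0(language, ".json"))
}

# Example with a throwaway directory and an assumed language code "en".
resourcesPath <- file.path(tempdir(), "stopwords")
dir.create(resourcesPath, showWarnings = FALSE)
invisible(file.create(stopWordResourceFile(resourcesPath, "en")))
file.exists(stopWordResourceFile(resourcesPath, "en"))  # TRUE
file.exists(stopWordResourceFile(resourcesPath, "es"))  # FALSE
```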
AbbreviationPipe, bdpar.Options,
ContractionPipe, File2Pipe,
FindEmojiPipe, FindEmoticonPipe,
FindHashtagPipe, FindUrlPipe,
FindUserNamePipe, GuessDatePipe,
GuessLanguagePipe, Instance,
InterjectionPipe, MeasureLengthPipe,
GenericPipe, ResourceHandler,
SlangPipe, StoreFileExtPipe,
TargetAssigningPipe, TeeCSVPipe,
ToLowerCasePipe