bdpar (version 1.0.1)

FindUrlPipe: Class to find and/or remove the URLs in the data field of an Instance

Description

This class is responsible for detecting the URLs in the data field of each Instance. Identified URLs are stored in the URLs field of the Instance. If required, it can also remove the URLs from the data.

Usage

FindUrlPipe

Arguments

Constructor

FindUrlPipe$new(propertyName = "URLs",
                alwaysBeforeDeps = list(),
                notAfterDeps = list())

  • Arguments:

    • propertyName: (character) name of the property associated with the Pipe.

    • alwaysBeforeDeps: (list) the alwaysBefore dependencies (Pipes that must be executed before this one).

    • notAfterDeps: (list) the notAfter dependencies (Pipes that cannot be executed after this one).

Inherit

This class inherits from PipeGeneric and implements the pipe abstract function.

Methods

  • pipe: preprocesses the Instance to obtain/remove the URLs.

    • Usage:

      pipe(instance,
           removeUrl = TRUE,
           URLPatterns = list(self$URLPattern, self$EmailPattern),
           namesURLPatterns = list("UrlPattern","EmailPattern"))

    • Value:

      the Instance with the modifications that have occurred in the Pipe.

    • Arguments:

      • instance: (Instance) Instance to preprocess.

      • removeUrl: (logical) indicates whether the URLs are removed.

      • URLPatterns: (list) the regular expressions used to find URLs.

      • namesURLPatterns: (list) the names of the regular expressions.

  • findUrl: finds the URLs in the data.

    • Usage: findUrl(pattern, data)

    • Value: list with URLs found.

    • Arguments:

      • pattern: (character) regex to find URLs.

      • data: (character) text in which to search for URLs.

  • removeUrl: removes the URLs in the data.

    • Usage: removeUrl(pattern, data)

    • Value: the data with URLs removed.

    • Arguments:

      • pattern: (character) regex to find URLs.

      • data: (character) text from which to remove the URLs.

  • putNamesURLPattern: assigns the URL pattern names to the results.

    • Usage: putNamesURLPattern(resultOfURLPatterns)

    • Value: resultOfURLPatterns with the names of the URL patterns assigned.

    • Arguments:

      • resultOfURLPatterns: (list) list with URLs found.

  • getURLPatterns: gets the URL patterns.

    • Usage: getURLPatterns()

    • Value: the URL patterns.

  • getNamesURLPatterns: gets the names of the URL patterns.

    • Usage: getNamesURLPatterns()

    • Value: the names of the URL patterns.

  • setNamesURLPatterns: sets the names of the URL patterns.

    • Usage: setNamesURLPatterns(namesURLPatterns)

    • Arguments:

      • namesURLPatterns: (character) the new names of the URL patterns.
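
The behaviour described for findUrl and removeUrl can be approximated in base R without the package. The sketch below is only an illustration: the helpers find_url_sketch and remove_url_sketch and the simplified url_pattern are hypothetical names introduced here, and the regular expression is an assumption rather than the URLPattern shipped with bdpar.

```r
# Simplified illustrative pattern (an assumption, not the package's URLPattern).
url_pattern <- "(https?|ftp)://[^[:space:]]+"

# Hypothetical helper: return every match of `pattern` in `data` as a list.
find_url_sketch <- function(pattern, data) {
  matches <- regmatches(data, gregexpr(pattern, data, perl = TRUE))
  as.list(unlist(matches))
}

# Hypothetical helper: delete every match, then collapse leftover whitespace.
remove_url_sketch <- function(pattern, data) {
  cleaned <- gsub(pattern, " ", data, perl = TRUE)
  trimws(gsub("[[:space:]]+", " ", cleaned))
}

text <- "Docs at https://example.org/guide and mirror at ftp://example.org/pub today"

found <- find_url_sketch(url_pattern, text)      # list of the URLs found
cleaned <- remove_url_sketch(url_pattern, text)  # text without the URLs

print(found)
print(cleaned)
```

Keeping detection and removal as separate steps mirrors the method list above: the matches can be stored first, and the text is only altered when removal is requested.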

Public fields

  • URLPattern: (character) regular expression to detect URLs.

  • EmailPattern: (character) regular expression to detect emails.

Private fields

  • URLPatterns: (list) regular expressions used to detect URLs.

  • namesURLPatterns: (list) names of regular expressions that are used to identify URLs.

Details

The regular expressions indicated in the URLPatterns variable are used to identify URLs.
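
As a rough illustration of how a list of patterns and a parallel list of names might work together, the base R sketch below applies each pattern to a text and labels the result with the matching name. The two regular expressions are simplified assumptions, not the values of the URLPattern and EmailPattern fields, and the variable names are hypothetical.

```r
# Simplified illustrative patterns (assumptions, not the package's own fields).
url_patterns <- list(
  "(https?|ftp)://[^[:space:]]+",
  "[[:alnum:]._-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,}"
)
pattern_names <- list("UrlPattern", "EmailPattern")

text <- "Write to admin@example.org or visit http://example.org/contact"

# Apply each pattern in turn, collecting its matches as a character vector.
results <- lapply(url_patterns, function(pattern) {
  unlist(regmatches(text, gregexpr(pattern, text, perl = TRUE)))
})

# Label each result with the name of the pattern that produced it.
names(results) <- unlist(pattern_names)

print(results)
```

Naming the results makes it possible to tell which pattern produced each match when several patterns are searched in one pass.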

See Also

AbbreviationPipe, ContractionPipe, File2Pipe, FindEmojiPipe, FindEmoticonPipe, FindHashtagPipe, FindUserNamePipe, GuessDatePipe, GuessLanguagePipe, Instance, InterjectionPipe, MeasureLengthPipe, PipeGeneric, SlangPipe, StopWordPipe, StoreFileExtPipe, TargetAssigningPipe, TeeCSVPipe, ToLowerCasePipe