A newer version (3.1.0) of this package is available.

bdpar (version 1.0.1)

Big Data Preprocessing Architecture

Description

Provides a tool to easily build customized data flows for pre-processing large volumes of information from different sources. To this end, 'bdpar' allows users to (i) easily use and create new functionalities and (ii) develop new data source extractors according to their needs. Additionally, the package provides a predefined default data flow to extract and pre-process the most relevant information (tokens, dates, ...) from some textual sources (SMS, email, tweets, YouTube comments).

Install

install.packages('bdpar')

Monthly Downloads

280

Version

1.0.1

License

GPL-3

Maintainer

Miguel Ferreiro-Díaz

Last Published

January 9th, 2020

Functions in bdpar (1.0.1)

FindEmojiPipe

Class to find and/or replace the emoji on the data field of an Instance
ContractionPipe

Class to find and/or replace the contractions on the data field of an Instance
File2Pipe

Class to obtain the source field of an Instance
Connections

Class to manage the connections with Twitter and YouTube
ExtractorTwtid

Class to handle tweets files with twtid extension
ExtractorEml

Class to handle email files with eml extension
Bdpar

Class to manage the preprocessing of the files throughout the flow of pipes
Instance

Abstract super class that handles the management of the Instances
FindUserNamePipe

Class to find and/or remove the users on the data field of an Instance
FindUrlPipe

Class to find and/or remove the URLs on the data field of an Instance
InterjectionPipe

Class to find and/or remove the interjections on the data field of an Instance
InstanceFactory

Class to handle the creation of Instance types
FindHashtagPipe

Class to find and/or remove the hashtags on the data field of an Instance
FindEmoticonPipe

Class to find and/or remove the emoticons on the data field of an Instance
PipeGeneric

Abstract super class that handles the management of the Pipes
StopWordPipe

Class to find and/or remove the stop words on the data field of an Instance
ResourceHandler

Class that handles different types of resources
StoreFileExtPipe

Class to get the file's extension field of an Instance
TypePipe

Abstract super class implementing the pipelining process.
ToLowerCasePipe

Class to convert the data field of an Instance to lower case
GuessDatePipe

Class to obtain the date field of an Instance
TargetAssigningPipe

Class to get the target field of the Instance
GuessLanguagePipe

Class to guess the language of an Instance
MeasureLengthPipe

Class to obtain the length of the data field of an Instance
TeeCSVPipe

Class to handle a CSV with the properties field of the preprocessed Instance
SerialPipe

Class implementing a default pipelining process.
SlangPipe

Class to find and/or replace the slang terms on the data field of an Instance
pipeline_execute

Initiates the pipelining process.
%>I%

bdpar customized forward-pipe operator
ExtractorSms

Class to handle SMS files with tsms extension
AbbreviationPipe

Class to find and/or replace the abbreviations on the data field of an Instance
ExtractorYtbid

Class to handle YouTube comment files with ytbid extension