⚠️ There's a newer version (1.3) of this package.

tm.plugin.webmining (version 1.1)

Retrieve structured, textual data from various web sources

Description

tm.plugin.webmining facilitates text retrieval from feed formats such as XML (RSS, ATOM) and JSON. Direct retrieval from HTML is also supported. Because most (news) feeds include only a small fraction of the original text, tm.plugin.webmining also retrieves and extracts the full text from the original source.

Install

install.packages('tm.plugin.webmining')

Monthly Downloads

15

Version

1.1

License

GPL-3

Repository

https://github.com/mannau/tm.plugin.webmining

Maintainer

Mario Annau

Last Published

May 11th, 2014

Functions in tm.plugin.webmining (1.1)

Buildup string for feedquery.

Get main content for corpus items, specified by links.

Get feed data from NYTimes Article Search (http://developer.nytimes.com/docs/read/article_search_api).

WebCorpus retrieved from Yahoo! News for the search term "Microsoft" through the YahooNewsSource. Length of retrieved corpus is 20.

ReutersNewsSource

Get feed data from Reuters News RSS feed channels. Reuters provides numerous feed

trimWhiteSpaces

Trim White Spaces from Text Document.

Read content from WebXMLSource/WebHTMLSource/WebJSONSource.

Wrapper/convenience function to ensure the correct encoding on different platforms.

GoogleFinanceSource

Get feed Meta Data from Google Finance.

Copy of RCurl:::getURL() including a little bugfix for the .encoding parameter.

Update WebXMLSource/WebHTMLSource/WebJSONSource

tm.plugin.webmining-package

Retrieve structured, textual data from various web sources

YahooFinanceSource

Get feed data from Yahoo! Finance.

Extract main content from TextDocuments.

GoogleNewsSource

Get feed data from Google News Search (http://news.google.com/).

Read Web Content and respective Link Content from feedurls.

Enclose Text Content in HTML tags

YahooNewsSource

Get feed data from Yahoo! News (http://news.yahoo.com/).

extractHTMLStrip

Simply strip HTML Tags from Document

Update/Extend WebCorpus with new feed items.

GoogleBlogSearchSource

Get feed data from Google Blog Search (http://www.google.com/blogsearch).

Retrieve Empty Corpus Elements through $postFUN.

WebCorpus constructor function.

YahooInplaySource

Get News from Yahoo Inplay.

Remove non-ASCII characters from Text.

extractContentDOM

Extract Main HTML Content from DOM