sparkwarc

by Javier Luraschi
Type: Package
License: Apache License 2.0
BugReports: https://github.com/javierluraschi/sparkwarc
Encoding: UTF-8
LazyData: true
RoxygenNote: 5.0.1
NeedsCompilation: no
Packaged: 2017-01-13 00:49:39 UTC; javierluraschi
Repository: CRAN
Date/Publication: 2017-01-13 06:42:24

Load WARC Files into Apache Spark

Load WARC (Web ARChive) files into Apache Spark using 'sparklyr'. This makes it possible to read files from the Common Crawl project.

Functions in sparkwarc

Name Description
cc_warc Provides WARC paths for commoncrawl.org
spark_read_warc Reads a WARC File into Apache Spark
