Usage
getLinkContent(corpus, links = sapply(corpus, meta, "Origin"),
  timeout.request = 30, chunksize = 20, verbose = getOption("verbose"),
  curlOpts = curlOptions(verbose = FALSE, followlocation = TRUE,
    maxconnects = 5, maxredirs = 10, timeout = timeout.request,
    connecttimeout = timeout.request, ssl.verifyhost = FALSE,
    ssl.verifypeer = FALSE, useragent = "R"),
  retry.empty = 3, sleep.time = 3, extractor = ArticleExtractor,
  .encoding = integer(), ...)
Arguments
corpus
object of class Corpus for which link content should be downloaded
links
character vector specifying the links to be used for download, defaults to
sapply(corpus, meta, "Origin")
timeout.request
timeout (in seconds) to be used
for connections/requests, defaults to 30
curlOpts
curl options to be passed to getURL
chunksize
size of the download chunks to be used for parallel retrieval, defaults to 20
verbose
Specifies if retrieval info should be
printed, defaults to getOption("verbose")
retry.empty
Specifies how many times sites that return empty
content should be retried, defaults to 3
sleep.time
Sleep time to be used between chunked
download, defaults to 3 (seconds)
extractor
Extractor to be used for content
extraction, defaults to ArticleExtractor (as given in Usage)
...
additional parameters to getURL
.encoding
encoding to be used for getURL, defaults to integer() (= autodetect)
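
The interplay of chunksize and sleep.time can be illustrated with a minimal base-R sketch. It is not the package's implementation: the URLs are made up, fetch_chunk is a hypothetical stand-in for the download step (getURL in the real function), and no network access takes place. It only shows one plausible way a vector of links could be split into chunks of at most chunksize elements, with a pause of sleep.time seconds after each chunk.

```r
# Made-up links for illustration; a real call would take them from the corpus.
links <- sprintf("http://example.com/article%02d", 1:45)
chunksize <- 20
sleep.time <- 0  # set to 3 to mirror the documented default

# Assign each link to a chunk of at most `chunksize` elements.
chunks <- split(links, ceiling(seq_along(links) / chunksize))

# Hypothetical stand-in for the download step; nothing is fetched here.
fetch_chunk <- function(urls) paste("content of", urls)

content <- character(0)
for (chunk in chunks) {
  content <- c(content, fetch_chunk(chunk))
  Sys.sleep(sleep.time)  # pause between chunked downloads
}

lengths(chunks)  # chunk sizes: 20, 20, 5
length(content)  # 45
```

With 45 links and the default chunksize of 20, this yields three chunks, so under this reading the defaults would add roughly 3 x 3 = 9 seconds of sleep to the total retrieval time.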