This function collects tweet data based on search terms and structures the data into a dataframe with the class names "datasource" and "twitter".
The twitter Standard search API sets a rate limit of 180 requests every 15 minutes. A maximum of 100 tweets can be collected per search request, meaning the maximum number of tweets per operation is 18,000 per 15 minutes. More tweets can be collected by using the retryOnRateLimit = TRUE parameter, which will cause the collection to pause if the rate limit is reached and resume when the rate limit resets (in approximately 15 minutes). Alternatively, the twitter API parameter since_id can be used in a later session to resume a twitter search collection from the last tweet previously collected, as tweet status IDs are sequential. The Standard API only returns tweets for the last 7 days.
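As a sketch, a collection could be resumed in a later session by passing since_id through to the underlying rtweet search (this assumes a dataframe prevData saved from an earlier Collect run with a status_id column, and that since_id is accepted via the ... arguments; both are illustrative assumptions, not confirmed API details):

```r
# Assumption: prevData is a dataframe from a previous Collect() session
# containing a status_id column of tweet IDs (stored as characters).
lastId <- prevData$status_id[which.max(as.numeric(prevData$status_id))]

# Resume the search from the last collected tweet; since_id is assumed
# to be passed through '...' to the underlying rtweet search function.
newData <- twitterAuth %>%
  Collect(searchTerm = "#auspol", numTweets = 100, since_id = lastId)
```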
All of the search query operators available through the twitter API can be used in the searchTerm field. For example, to search for tweets containing the term "love" or "hate" the "OR" operator can be used in the term field: searchTerm = "love OR hate". For more information refer to the twitter API documentation for query operators: https://developer.twitter.com/en/docs/tweets/search/guides/standard-operators.
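A few illustrative searchTerm values using Standard API query operators (a sketch only; the exact phrase and filter operators shown are documented in the twitter query operator guide linked above):

```r
# either of two terms
data1 <- twitterAuth %>% Collect(searchTerm = "love OR hate")

# exact phrase (quotes escaped within the R string)
data2 <- twitterAuth %>% Collect(searchTerm = "\"climate change\"")

# hashtag search
data3 <- twitterAuth %>% Collect(searchTerm = "#auspol")
```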
# S3 method for twitter
Collect(
credential,
searchTerm = "",
searchType = "recent",
numTweets = 100,
includeRetweets = TRUE,
retryOnRateLimit = FALSE,
writeToFile = FALSE,
verbose = FALSE,
...
)
A data.frame object with class names "datasource" and "twitter".
A credential object generated from Authenticate with class name "twitter".
Character string. Specifies a twitter search term. For example, "Australian politics" or the hashtag "#auspol".
Character string. Returns filtered tweets as per search type: recent, mixed or popular. Default type is recent.
Numeric. Specifies how many tweets to collect. Default is 100.
Logical. Specifies whether retweets should be included in the results. Default is TRUE.
Logical. If TRUE, the collection will pause when the rate limit is reached and resume when the rate limit resets. Default is FALSE.
Logical. Write collected data to file. Default is FALSE.
Logical. Output additional information about the data collection. Default is FALSE.
Arguments passed on to rtweet::search_tweets
geocode
Geographical limiter of the template "latitude,longitude,radius" e.g., geocode = "37.78,-122.40,1mi".
max_id
Character, returns results with an ID less than (that is, older than) or equal to `max_id`. Especially useful for large data returns that require multiple iterations interrupted by user time constraints. For searches exceeding 18,000 tweets, users are encouraged to take advantage of rtweet's internal automation procedures for waiting on rate limits by setting the retryonratelimit argument to TRUE. In some cases, due to processing time and rate limits, retrieving several million tweets can take several hours or even multiple days. In these cases, it would likely be useful to leverage retryonratelimit for sets of tweets and max_id to allow results to continue where previous efforts left off.
parse
Logical, indicating whether to return a parsed data.frame, if true, or a nested list, if false. By default, parse = TRUE saves users from the wreck of time and frustration associated with disentangling the nasty nested list returned from Twitter's API. As Twitter's APIs are subject to change, this argument would be especially useful when changes to Twitter's APIs affect performance of internal parsers. Setting parse = FALSE also ensures the maximum amount of possible information is returned. By default, the rtweet parse process returns nearly all bits of information returned from Twitter. However, users may occasionally encounter new or omitted variables. In these rare cases, the nested list object will be the only way to access these variables.
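These rtweet::search_tweets arguments are passed through Collect's ... parameter. A minimal sketch restricting the search geographically (the coordinates and radius here are illustrative values only):

```r
# Sketch: geocode is forwarded via '...' to rtweet::search_tweets,
# limiting results to tweets geotagged within 10 miles of the
# given latitude,longitude (illustrative coordinates for Sydney).
geoData <- twitterAuth %>%
  Collect(searchTerm = "#sydney", numTweets = 50,
          geocode = "-33.87,151.21,10mi")
```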
if (FALSE) {
# search and collect 100 recent tweets for the hashtag #auspol
myTwitterData <- twitterAuth %>%
Collect(searchTerm = "#auspol", searchType = "recent", numTweets = 100, verbose = TRUE,
includeRetweets = FALSE, retryOnRateLimit = TRUE, writeToFile = TRUE)
}