This function collects tweet data based on search terms and structures the data into a dataframe with the class names "datasource" and "twitter".
The twitter Standard search API sets a rate limit of 180 requests every 15 minutes. A maximum of 100 tweets can be collected per search request, meaning the maximum number of tweets per operation is 18,000 per 15 minutes. More tweets can be collected by using the retryOnRateLimit = TRUE parameter, which will cause the collection to pause if the rate limit is reached and resume when the rate limit resets (in approximately 15 minutes). Alternatively, the twitter API parameter since_id can be used in a later session to resume a twitter search collection from the last tweet previously collected, as tweet status IDs are sequential. The Standard API only returns tweets for the last 7 days.
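As a sketch, a collection could be resumed in a later session by passing since_id through to the underlying rtweet search (this assumes a dataframe prevData saved from an earlier Collect run with a status_id column, and that since_id is accepted via the ... arguments; both are illustrative assumptions, not confirmed API details):

```r
# Assumption: prevData is a dataframe from a previous Collect() session
# containing a status_id column of tweet IDs (stored as characters).
lastId <- prevData$status_id[which.max(as.numeric(prevData$status_id))]

# Resume the search from the last collected tweet; since_id is assumed
# to be passed through '...' to the underlying rtweet search function.
newData <- twitterAuth %>%
  Collect(searchTerm = "#auspol", numTweets = 100, since_id = lastId)
```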
All of the search query operators available through the twitter API can be used in the searchTerm field. For example, to search for tweets containing the term "love" or "hate" the "OR" operator can be used in the term field: searchTerm = "love OR hate". For more information refer to the twitter API documentation for query operators: https://developer.twitter.com/en/docs/tweets/search/guides/standard-operators.
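A few illustrative searchTerm values using Standard API query operators (a sketch only; the exact phrase and filter operators shown are documented in the twitter query operator guide linked above):

```r
# either of two terms
data1 <- twitterAuth %>% Collect(searchTerm = "love OR hate")

# exact phrase (quotes escaped within the R string)
data2 <- twitterAuth %>% Collect(searchTerm = "\"climate change\"")

# hashtag search
data3 <- twitterAuth %>% Collect(searchTerm = "#auspol")
```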
# S3 method for twitter
Collect(
credential,
searchTerm = "",
searchType = "recent",
numTweets = 100,
includeRetweets = TRUE,
retryOnRateLimit = FALSE,
writeToFile = FALSE,
verbose = FALSE,
...
)
A data.frame object with class names "datasource" and "twitter".
A credential object generated from Authenticate with class name "twitter".
Character string. Specifies a twitter search term. For example, "Australian politics" or the hashtag "#auspol".
Character string. Returns filtered tweets as per search type: recent, mixed or popular. Default type is recent.
Numeric. Specifies how many tweets to collect. Default is 100.
Logical. Specifies whether retweets should be included in the results. Default is TRUE.
Logical. If TRUE, the collection will pause when the rate limit is reached and resume when the rate limit resets. Default is FALSE.
Logical. Write collected data to file. Default is FALSE.
Logical. Output additional information about the data collection. Default is FALSE.
Arguments passed on to rtweet::search_tweets
geocode
Geographical limiter of the template "latitude,longitude,radius" e.g., geocode = "37.78,-122.40,1mi".
max_id
Character, returns results with an ID less than (that is, older than) or equal to `max_id`. Especially useful for large data returns that require multiple iterations interrupted by user time constraints. For searches exceeding 18,000 tweets, users are encouraged to take advantage of rtweet's internal automation procedures for waiting on rate limits by setting the retryonratelimit argument to TRUE. In some cases, due to processing time and rate limits, retrieving several million tweets can take several hours or even multiple days. In these cases, it would likely be useful to leverage retryonratelimit for sets of tweets and max_id to allow results to continue where previous efforts left off.
parse
Logical, indicating whether to return a parsed data.frame, if true, or a nested list, if false. By default, parse = TRUE saves users from the wreck of time and frustration associated with disentangling the nasty nested list returned from Twitter's API. As Twitter's APIs are subject to change, this argument would be especially useful when changes to Twitter's APIs affect performance of internal parsers. Setting parse = FALSE also ensures the maximum amount of possible information is returned. By default, the rtweet parse process returns nearly all bits of information returned from Twitter. However, users may occasionally encounter new or omitted variables. In these rare cases, the nested list object will be the only way to access these variables.
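These rtweet::search_tweets arguments are passed through Collect's ... parameter. A minimal sketch restricting the search geographically (the coordinates and radius here are illustrative values only):

```r
# Sketch: geocode is forwarded via '...' to rtweet::search_tweets,
# limiting results to tweets geotagged within 10 miles of the
# given latitude,longitude (illustrative coordinates for Sydney).
geoData <- twitterAuth %>%
  Collect(searchTerm = "#sydney", numTweets = 50,
          geocode = "-33.87,151.21,10mi")
```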
if (FALSE) {
# search and collect 100 recent tweets for the hashtag #auspol
myTwitterData <- twitterAuth %>%
Collect(searchTerm = "#auspol", searchType = "recent", numTweets = 100, verbose = TRUE,
includeRetweets = FALSE, retryOnRateLimit = TRUE, writeToFile = TRUE)
}