search_tweets: search_tweets

Description

Returns two data frames (tweets data and users data) using a provided search query.

Usage

search_tweets(q, n = 100, type = "mixed", max_id = NULL,
  include_rts = TRUE, parse = TRUE, clean_tweets = FALSE,
  as_double = FALSE, token = NULL, verbose = TRUE, ...)

Arguments

q

Character, search query of no more than 500 characters.

n

Numeric, specifying the total number of desired tweets to return. Defaults to 100. The maximum number of tweets returned from a single token is 18,000 per 15-minute rate-limit window. See Details for more information.

type

Character, specifying the type of search results to return. The default, type = "mixed", combines results from the other two valid values, type = "recent" and type = "popular".

max_id

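Character, specifying the status ID at which to resume returning results. Only tweets with IDs less than or equal to max_id (that is, tweets as old as or older than it) are returned, which makes this useful for paging back through older results.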
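Paging with max_id means finding the oldest (smallest) ID in the results already collected. The sketch below does this on character IDs, so no precision is lost. oldest_id() is a hypothetical helper, not part of the package, and it assumes IDs are plain digit strings without leading zeros.

```r
# Hypothetical helper: return the oldest (numerically smallest) status ID
# from a character vector without converting the IDs to double.
oldest_id <- function(ids) {
  # Digit strings without leading zeros sort numerically when ordered first
  # by length and then character by character; method = "radix" gives
  # locale-independent (C-locale) ordering of the strings.
  ids[order(nchar(ids), ids, method = "radix")][1]
}

ids <- c("9007199254740993", "850000000000000000", "9007199254740992")
oldest_id(ids)  # returns "9007199254740992"

# The next, older page could then be requested with something like:
# search_tweets(q, max_id = oldest_id(ids))
```

Because Twitter treats max_id as inclusive, the tweet with that exact ID is returned again on the next page. That overlap is one reason collected results need de-duplication.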

include_rts

Logical, indicating whether to include retweets in search results.

parse

Logical, indicating whether to return parsed data frames or the nested list object produced by fromJSON. The default, parse = TRUE, saves users the time (and frustration) of disentangling the objects returned by the Twitter API.

clean_tweets

Logical, indicating whether to remove non-ASCII characters from the text of tweets. Defaults to FALSE.
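One way to strip non-ASCII characters in base R is sketched below. It assumes the tweet text is UTF-8 encoded. strip_non_ascii() is a hypothetical helper, and the package's own cleaning step may be implemented differently.

```r
# Hypothetical helper: drop every non-ASCII character from UTF-8 text.
# iconv() replaces each byte it cannot convert to ASCII with `sub`;
# sub = "" therefore deletes those bytes entirely.
strip_non_ascii <- function(text) {
  iconv(text, from = "UTF-8", to = "ASCII", sub = "")
}

# "\u00e9" is an accented e and "\u2615" is a coffee-cup emoji.
strip_non_ascii("caf\u00e9 \u2615 ok")  # returns "caf  ok"
```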

as_double

Logical, indicating whether to store ID variables as double (numeric) class. The default, FALSE, stores ID variables as character vectors. Setting this to TRUE can provide a performance (speed and memory) boost, but it can also lead to issues when printing and saving, depending on the format.
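One concrete cause of those issues is precision. A double represents every integer exactly only up to 2^53, and tweet IDs are larger than that, so two distinct IDs can collapse onto the same double. The two IDs below are chosen for illustration and are not real tweets.

```r
# Two distinct IDs just above the exact-integer limit of doubles.
id_a <- "9007199254740992"  # 2^53
id_b <- "9007199254740993"  # 2^53 + 1

id_a == id_b                          # FALSE: distinct as character
as.numeric(id_a) == as.numeric(id_b)  # TRUE: identical once stored as double
```

Keeping IDs as character, the default, avoids this, at the cost of some speed and memory.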

token

OAuth token. By default, token = NULL fetches a non-exhausted token from an environment variable. Instructions on how to create tokens and set up an environment variable are in the tokens vignette (in R, run ?tokens in the console).

verbose

Logical, indicating whether or not to output processing/retrieval messages.

...

Further arguments passed on to make_url. All named arguments that do not match the arguments above (e.g., count, type) are built into the request. For example, to return only English-language tweets, use lang = "en". For more options, see Twitter's API documentation.
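The sketch below shows how extra named arguments can become request parameters. build_query() is hypothetical and is not the package's make_url(). It only shows the general idea: every named argument becomes a URL-encoded key=value pair.

```r
# Hypothetical sketch (not make_url()): combine the search query with any
# extra named arguments into a URL query string.
build_query <- function(q, ...) {
  params <- c(list(q = q), list(...))
  # Percent-encode each value, including reserved characters such as "#".
  values <- vapply(params, function(v) {
    utils::URLencode(as.character(v), reserved = TRUE)
  }, character(1))
  paste(names(params), values, sep = "=", collapse = "&")
}

build_query("rstats", lang = "en")  # returns "q=rstats&lang=en"
build_query("#rstats data")         # returns "q=%23rstats%20data"
```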

Value

List object with tweets and users each returned as a data frame.
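The base-R sketch below splits a small nested list, shaped loosely like raw search results where each status embeds its author, into a tweets data frame and a users data frame. The input fields (id_str, text, user, screen_name) follow Twitter's JSON naming. The output column names are illustrative assumptions, not the package's actual columns.

```r
# Illustrative nested input: each status carries its own nested user record.
statuses <- list(
  list(id_str = "2", text = "second",
       user = list(id_str = "u1", screen_name = "alice")),
  list(id_str = "1", text = "first",
       user = list(id_str = "u2", screen_name = "bob"))
)

# Tweets table: one row per status, keeping the author's ID as a join key.
tweets <- do.call(rbind, lapply(statuses, function(s) {
  data.frame(status_id = s$id_str, text = s$text, user_id = s$user$id_str,
             stringsAsFactors = FALSE)
}))

# Users table: one row per distinct user.
users <- do.call(rbind, lapply(statuses, function(s) {
  data.frame(user_id = s$user$id_str, screen_name = s$user$screen_name,
             stringsAsFactors = FALSE)
}))
users <- users[!duplicated(users$user_id), , drop = FALSE]
```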

Details

Twitter's API documentation recommends limiting searches to 10 keywords and operators. Complex queries may also produce API errors that prevent recovery of information related to the query. Note also that Twitter's search API does not index all Tweets: at any given time, the search index covers only the past 6 to 9 days of Tweets.
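A rough check of a query against these limits (at most 500 characters and about 10 keywords and operators) could look like the sketch below. check_query() is hypothetical, and counting whitespace-separated terms is only a heuristic. Twitter may count keywords and operators differently.

```r
# Hypothetical pre-flight check; whitespace splitting is a heuristic
# stand-in for however Twitter actually counts keywords and operators.
check_query <- function(q, max_chars = 500, max_terms = 10) {
  terms <- strsplit(trimws(q), "\\s+")[[1]]
  c(chars_ok = nchar(q) <= max_chars, terms_ok = length(terms) <= max_terms)
}

check_query("rstats OR python")  # both TRUE
```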

The number of tweets returned will often be less than the number the user specified. This can happen because (a) the search query did not return many results (the search pool is already a thinned-out subset of all tweets) or (b) you hit the rate limit for a given token. Even when the query has many hits and the rate limit allows the maximum of 18,000, the number of tweets returned may be slightly lower because the function filters out duplicates (e.g., 18,000 tweets were actually returned, but 30 of them were removed as repeats).
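The arithmetic and the de-duplication step can be sketched as follows. The sketch assumes the standard search endpoint's limit of 100 tweets per request. 18,000 tweets therefore corresponds to 180 requests in one rate-limit window. The de-duplication shown, keeping the first copy of each status ID, is an assumption and may differ from the package's internal logic.

```r
# Requests needed to collect n tweets at up to 100 tweets per request.
requests_needed <- function(n, per_request = 100) ceiling(n / per_request)
requests_needed(18000)  # returns 180

# Consecutive pages can overlap (max_id is inclusive), so the same status
# may appear twice; keep only the first occurrence of each status ID.
page1 <- data.frame(status_id = c("30", "29", "28"), stringsAsFactors = FALSE)
page2 <- data.frame(status_id = c("28", "27", "26"), stringsAsFactors = FALSE)
combined <- rbind(page1, page2)
deduped <- combined[!duplicated(combined$status_id), , drop = FALSE]
nrow(deduped)  # returns 5
```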

Examples


# NOT RUN {
# search for 1000 tweets mentioning Hillary Clinton
hrc <- search_tweets(q = "hillaryclinton", n = 1000)

# data frame where each observation (row) is a different tweet
hrc

# users data is also retrieved; access it via users_data()
users_data(hrc)

# search for 1000 tweets in English
djt <- search_tweets(q = "realdonaldtrump", n = 1000, lang = "en")
djt # prints tweets data preview
users_data(djt) # prints users data preview
# }
