rtweet (version 0.3.6)

stream_tweets: Collect a live stream of Twitter data

Description

Returns public statuses via one of three methods described below. By design, this function determines which method to use by inspecting the query (q) argument (see the sketches following the list).

  • 1. Filtering via a search-like query (up to 400 keywords)

  • 2. Tracking via vector of user ids (up to 5000 user_ids)

  • 3. Location via geo coordinates (1-360 degree location boxes)
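
For illustration, here are hedged sketches of each of the three query styles; the keywords, user IDs, and bounding box coordinates below are made-up placeholders:

# 1. filter via a search-like query (keywords)
rt <- stream_tweets(q = "rstats,data science", timeout = 30)

# 2. track a comma separated string of user IDs
rt <- stream_tweets(q = "2973406683,628175052", timeout = 30)

# 3. filter via a bounding box of geo coordinates
#    (sw longitude, sw latitude, ne longitude, ne latitude)
rt <- stream_tweets(q = c(-125, 26, -65, 49), timeout = 30)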

Usage

stream_tweets(q = "", timeout = 30, parse = TRUE, clean_tweets = TRUE,
  as_double = FALSE, token = NULL, file_name = NULL, gzip = FALSE,
  verbose = TRUE, ...)

Arguments

q

Character vector with desired phrases and keywords used to filter tweets, a comma separated list of user IDs to track, or a set of bounding boxes to track. If left empty (the default, q = ""), the stream will return a random sample of all tweets.

timeout

Numeric scalar specifying the amount of time, in seconds, to leave the connection open while streaming/capturing tweets. Defaults to 30 seconds.

parse

Logical indicating whether to return parsed data. By default, parse = TRUE, and this function does the parsing for you. However, for larger streams, or for automated scripts designed to collect data continuously, this should be set to FALSE, as parsing can eat up processing resources and time. For other uses, setting parse to TRUE saves you from having to sort and parse the messy list structure returned by Twitter. (Note: if you set parse to FALSE, you can use the parse_stream function to parse the JSON file at a later point in time.)

clean_tweets

Logical indicating whether to remove non-ASCII characters from the text of tweets. Defaults to TRUE.

as_double

Logical indicating whether to handle ID variables as double (numeric) class. By default, this is set to FALSE, meaning ID variables are treated as character vectors. Setting this to TRUE can provide a performance (speed and memory) boost, but it can also lead to issues when printing and saving, depending on the format.
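
For example, a minimal sketch requesting numeric IDs:

# treat ID variables as doubles rather than character vectors
rt <- stream_tweets("rstats", timeout = 30, as_double = TRUE)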

token

OAuth token. By default, token = NULL fetches a non-exhausted token from an environment variable. Find instructions on how to create tokens and set up an environment variable in the tokens vignette (in R, run ?tokens in the console).
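
A minimal sketch of passing a token explicitly, assuming an app and keys created as described in the tokens vignette (the app name and key/secret values are placeholders):

twitter_token <- create_token(
  app = "my_app_name",
  consumer_key = "XXXXXXXXXXXXXXXXX",
  consumer_secret = "XXXXXXXXXXXXXXXXX")
rt <- stream_tweets("rstats", timeout = 30, token = twitter_token)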

file_name

Character with name of file. By default, a temporary file is created, tweets are parsed and returned to the parent environment, and the temporary file is deleted.

gzip

Logical indicating whether to request gzip compressed stream data. Defaults to FALSE. In testing, gzip requires less bandwidth but also returns slightly fewer tweets. In theory, the gzip option should make the connection more reliable: by using less bandwidth, there is less chance Twitter cuts you off for falling behind.
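
For example:

# request gzip compressed stream data to use less bandwidth
rt <- stream_tweets("rstats", timeout = 30, gzip = TRUE)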

verbose

Logical indicating whether to display output processing/retrieval messages.

...

Insert magical parameters, spell, or potion here.

Value

Tweets data returned as a data frame, with users data stored as an attribute.

See Also

https://stream.twitter.com/1.1/statuses/filter.json

Other tweets: get_favorites, get_timeline, lookup_statuses, search_tweets, tweets_data

Examples

# NOT RUN {
# stream tweets mentioning "election" for 90 seconds
e <- stream_tweets("election", timeout = 90)

# data frame where each observation (row) is a different tweet
e

# users data also retrieved. can access it via users_data()
users_data(e)

# stream tweets mentioning realdonaldtrump for 30 seconds
djt <- stream_tweets("realdonaldtrump", timeout = 30)
djt # prints tweets data preview
users_data(djt) # prints users data preview

# store large amount of tweets in files using continuous streams
# by default (q = ""), stream_tweets() returns a random sample of all tweets
stream_tweets(timeout = (60 * 10), parse = FALSE, file_name = "tweets1")
stream_tweets(timeout = (60 * 10), parse = FALSE, file_name = "tweets2")

# parse tweets at a later time using parse_stream function
tw1 <- parse_stream("tweets1.json")
tw1

tw2 <- parse_stream("tweets2.json")
tw2
# }
