rtweet (version 0.6.7)

stream_tweets: Collect a live stream of Twitter data.

Description

Returns public statuses via one of the following four methods:

  • 1. Sampling a small random sample of all publicly available tweets

  • 2. Filtering via a search-like query (up to 400 keywords)

  • 3. Tracking via vector of user ids (up to 5000 user_ids)

  • 4. Location via geo coordinates (1-360 degree location boxes)

The stream_tweets2 variant streams with a hardwired reconnection method to ensure timeout integrity.

Usage

stream_tweets(q = "", timeout = 30, parse = TRUE, token = NULL,
  file_name = NULL, verbose = TRUE, ...)

stream_tweets2(..., dir = NULL, append = FALSE)

Arguments

q

Query used to select and customize the streaming collection method. There are four possible methods. (1) The default, q = "", returns a small random sample of all publicly available Twitter statuses. (2) To filter by keyword, provide a comma separated character string with the desired phrase(s) and keyword(s). (3) Track users by providing a comma separated list of user IDs or screen names. (4) Use four latitude/longitude bounding box points to stream by geo location. These must be provided as a vector of length 4, in the order west longitude, south latitude, east longitude, north latitude, e.g., c(-125, 26, -65, 49).
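A minimal sketch of each method (query values are illustrative; assumes a saved token is discoverable via environment variables):

## (1) random sample of all publicly available tweets
rt <- stream_tweets(q = "")

## (2) filter via comma separated keywords/phrases
rt <- stream_tweets(q = "rstats,data science")

## (3) track a comma separated list of user IDs or screen names
rt <- stream_tweets(q = "hadleywickham,kearneymw")

## (4) geo location via a length-4 bounding box
rt <- stream_tweets(q = c(-125, 26, -65, 49))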

timeout

Numeric scalar specifying the amount of time, in seconds, to leave the connection open while streaming/capturing tweets. By default, this is set to 30 seconds. To stream indefinitely, use timeout = Inf, or timeout = FALSE, which also ensures the JSON file is not deleted upon completion.
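For example (keyword is illustrative):

## stream for five minutes
stream_tweets("rstats", timeout = 60 * 5)

## stream indefinitely (interrupt manually to stop)
stream_tweets("rstats", timeout = Inf)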

parse

Logical, indicating whether to return parsed data. By default, parse = TRUE and this function does the parsing for you, saving you from having to sort through the messy list structure returned by Twitter. However, for larger streams, or for automated scripts designed to continuously collect data, this should be set to FALSE, as the parsing process can consume significant processing resources and time. (Note: if you set parse to FALSE, you can use the parse_stream function to parse the JSON file at a later point in time.)
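A sketch of deferred parsing (keyword and file name are illustrative):

## collect for an hour without parsing, saving raw JSON to disk
stream_tweets("rstats", timeout = 60 * 60, parse = FALSE, file_name = "rstats")

## parse the saved file later
rt <- parse_stream("rstats.json")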

token

Every user should have their own OAuth (Twitter API) token. By default (token = NULL), this function looks for the path to a saved Twitter token via environment variables (which is what `create_token()` sets up by default during initial token creation). For instructions on how to create a Twitter token, see the tokens vignette, i.e., `vignette("auth", package = "rtweet")`, or see ?tokens.
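A minimal token setup sketch (the app name and keys below are placeholders):

token <- create_token(
  app = "my_twitter_app",
  consumer_key = "XXXXXXXXXXXXXXXXXXXXXXXXX",
  consumer_secret = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
)

## pass the token explicitly, or omit it to use the saved token
stream_tweets("rstats", timeout = 30, token = token)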

file_name

Character with name of file. By default, a temporary file is created, tweets are parsed and returned to the parent environment, and the temporary file is deleted.

verbose

Logical, indicating whether or not to include output processing/retrieval messages.

...

Other arguments passed along to the stream query, e.g., filter for tweets by language with language = "en".
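For example, to restrict a keyword stream to English-language tweets:

stream_tweets("election", timeout = 30, language = "en")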

dir

Name of directory in which json files should be written. The default, NULL, will create a timestamped "stream" folder in the current working directory. If a dir name is provided that does not already exist, one will be created.

append

Logical indicating whether to append to or overwrite file_name if the file already exists. Defaults to FALSE, meaning this function will overwrite the preexisting file_name (in other words, it will delete any old file with the same name as file_name). Set append = TRUE to add the data as new lines to the pre-existing file instead.
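A sketch of a continuous collection with stream_tweets2() (directory name is illustrative); the query, timeout, and other arguments are passed through ... to stream_tweets():

## write json files to a dedicated folder, appending to
## pre-existing files rather than overwriting them
stream_tweets2(
  q = "rstats",
  timeout = 60 * 5,
  dir = "stream-data",
  append = TRUE
)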

Value

Tweets data returned as data frame with users data as attribute.

Data are returned in the same structure as that of the original search_tweets function.

See Also

https://stream.twitter.com/1.1/statuses/filter.json

Other stream tweets: parse_stream

Examples

## Not run:
## stream tweets mentioning "election" for 90 seconds
e <- stream_tweets("election", timeout = 90)

## data frame where each observation (row) is a different tweet
e

## plot tweet frequency
ts_plot(e, "secs")

## stream tweets mentioning realdonaldtrump for 30 seconds
djt <- stream_tweets("realdonaldtrump", timeout = 30)

## preview tweets data
djt

## get user IDs of people who mentioned Trump
usrs <- users_data(djt)

## lookup users data
usrdat <- lookup_users(unique(usrs$user_id))

## preview users data
usrdat

## store large amount of tweets in files using continuous streams
## by default, stream_tweets() returns a random sample of all tweets
## leave the query field blank to use this random-sample method
stream_tweets(
  timeout = (60 * 10),
  parse = FALSE,
  file_name = "tweets1"
)
stream_tweets(
  timeout = (60 * 10),
  parse = FALSE,
  file_name = "tweets2"
)

## parse tweets at a later time using parse_stream function
tw1 <- parse_stream("tweets1.json")
tw1

tw2 <- parse_stream("tweets2.json")
tw2

## streaming tweets by specifying lat/long coordinates

## stream continental US tweets for 5 minutes
usa <- stream_tweets(
  c(-125, 26, -65, 49),
  timeout = 300
)

## use lookup_coords() for a shortcut version of the above code
usa <- stream_tweets(
  lookup_coords("usa"),
  timeout = 300
)

## stream world tweets for 5 mins, save to JSON file
## shortcut coords note: lookup_coords("world")
world.old <- stream_tweets(
  c(-180, -90, 180, 90),
  timeout = (60 * 5),
  parse = FALSE,
  file_name = "world-tweets.json"
)

## read in JSON file
rtworld <- parse_stream("world-tweets.json")

## world data set with lat lng coords variables
x <- lat_lng(rtworld)

## End(Not run)
