parseTweets: Converts tweets in JSON format to a data frame.

Description

This function parses tweets downloaded using filterStream, sampleStream or userStream and returns a data frame. If a tweet contains text of up to 280 characters, the complete text is returned rather than only the first 140 characters.

Usage

parseTweets(tweets, simplify = FALSE, verbose = TRUE, legacy = FALSE)

Arguments

tweets

A character string naming the file where tweets are stored or the name of the object in memory where the tweets were saved as strings.

simplify

logical, default is FALSE. If TRUE, the function returns a data frame with only tweet and user fields (i.e., no geographic information or url entities).

verbose

logical, default is TRUE. If TRUE, the number of tweets that have been parsed is printed in the console.

legacy

logical, default is FALSE. If TRUE, tweets are read using the old method (reading lines into memory and parsing line by line). Try legacy=TRUE if the default options produce errors. Note that legacy mode will only return up to 140 characters per tweet.
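
The fallback suggested for the legacy argument can be wrapped in a small helper. This is a minimal sketch, not part of the package: parse_with_fallback is a hypothetical name, and the parser is passed in as an argument so the helper itself does not depend on any package being installed.

```r
# Hypothetical helper (not part of streamR): try the default parser first,
# and retry with legacy = TRUE only if the first attempt raises an error.
parse_with_fallback <- function(tweets, parser) {
  tryCatch(
    parser(tweets),
    error = function(e) {
      message("Default parsing failed (", conditionMessage(e),
              "); retrying with legacy = TRUE")
      parser(tweets, legacy = TRUE)
    }
  )
}

# Usage, assuming the streamR package is installed and the file exists:
# tweets.df <- parse_with_fallback("my_timeline.json", streamR::parseTweets)
```

Keep in mind that a result obtained through the fallback is subject to the 140-character limit of legacy mode.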

Details

parseTweets parses tweets downloaded using the filterStream, sampleStream or userStream functions and returns a data frame where each row corresponds to one tweet and each column represents a different field for each tweet (id, text, created_at, etc.).

The total number of tweets that are parsed might be lower than the number of lines in the file or object that contains the tweets because blank lines, deletion notices, and incomplete tweets are ignored.
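
To see why the row count can fall short of the line count, the three kinds of skipped lines can be told apart with a rough base R check. This is a heuristic sketch under stated assumptions, not the method parseTweets actually uses: the classify_line helper, the sample strings, and the pattern tests are all made up for illustration.

```r
# Hypothetical helper: label one raw line from a file of streamed tweets.
classify_line <- function(line) {
  trimmed <- trimws(line)
  if (!nzchar(trimmed)) return("blank")
  # Assumption: deletion notices are JSON objects whose first key is "delete"
  if (grepl('^\\{[[:space:]]*"delete"[[:space:]]*:', trimmed)) return("deletion")
  # Assumption: a line cut off mid-write does not end with a closing brace
  if (!grepl('\\}$', trimmed)) return("incomplete")
  "tweet"
}

# Made-up sample input: one tweet, one blank line, one deletion notice,
# and one truncated tweet.
lines <- c(
  '{"created_at":"Mon Jan 01 00:00:00 +0000 2018","id_str":"1","text":"hello"}',
  '',
  '{"delete":{"status":{"id_str":"2","user_id_str":"3"}}}',
  '{"created_at":"Mon Jan 01 00:00:01 +0000 2018","id_str":"4","text":"cut o'
)

labels <- vapply(lines, classify_line, character(1), USE.NAMES = FALSE)

# Tabulate the line types; only lines labelled "tweet" would become rows
print(table(labels))
n_tweets <- sum(labels == "tweet")
```

Comparing length(lines) with n_tweets gives a quick estimate of how many lines a parser would be expected to skip.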

To parse the JSON into a list instead of a data frame, see readTweets. That function can be significantly faster for large files when only a few fields are required.
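
As a sketch of that list-based route, the code below extracts a single field without building the full data frame. It assumes the streamR package and its bundled example_tweets dataset are installed; the variable names are illustrative, and nothing runs if the package is missing.

```r
# Sketch only: requires streamR; skipped silently when it is not installed.
only_text <- NULL
if (requireNamespace("streamR", quietly = TRUE)) {
  data("example_tweets", package = "streamR")
  # readTweets returns a list with one element per tweet
  tweets_list <- streamR::readTweets(example_tweets, verbose = FALSE)
  # Pull out just the "text" field from each tweet
  only_text <- unlist(lapply(tweets_list, `[[`, "text"))
}
```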

Note also that the retweet_count field contains the number of times a given tweet was retweeted at the time it was captured from the API, or for automatic retweets the number of times the original tweet was retweeted.

Examples

# NOT RUN {
## The dataset example_tweets contains 10 public statuses published
## by @twitterapi in plain text format. The code below converts the object
## into a data frame that can be manipulated by other functions.

data(example_tweets)
tweets.df <- parseTweets(example_tweets, simplify=TRUE, legacy=TRUE)

# }
# NOT RUN {
## A more complete example that shows how to capture a user's home timeline
## for one hour, authenticating via OAuth, and then parse the tweets
## into a data frame.

 library(ROAuth)
 reqURL <- "https://api.twitter.com/oauth/request_token"
 accessURL <- "https://api.twitter.com/oauth/access_token"
 authURL <- "https://api.twitter.com/oauth/authorize"
 consumerKey <- "xxxxxyyyyyzzzzzz"
 consumerSecret <- "xxxxxxyyyyyzzzzzzz111111222222"
 my_oauth <- OAuthFactory$new(consumerKey=consumerKey,
                              consumerSecret=consumerSecret,
                              requestURL=reqURL,
                              accessURL=accessURL,
                              authURL=authURL)
 my_oauth$handshake()
 userStream( file="my_timeline.json", with="followings",
         timeout=3600, oauth=my_oauth )
 tweets.df <- parseTweets("my_timeline.json")
# }