smappR (version 0.5)

extract.retweets: Connect to Mongo database and extract retweets that match conditions specified in the arguments.

Description

extract.retweets opens a connection to the Mongo database on the lab computer and returns either all retweets or only the retweets that mention a specific keyword. In combination with summary.retweets, this is a quick way to display the most retweeted tweets over a certain period of time.

Usage

extract.retweets(set, string = NULL, min = 10, from = NULL, to = NULL, verbose = TRUE)

Arguments

set
string, name of the collection of tweets in the Mongo database to query.
string
string or vector of strings, set to NULL by default (will return all retweets). If it is a string, it will return retweets that contain that string. If it is a vector of strings, it will return all retweets that contain at least one of them.
min
numeric, set to 10 by default (will return all retweets whose retweet count is at least 10). In large datasets, choose a high number to speed up the query.
from
date, in string format. If different from NULL, will consider only tweets after that date. Note that using this field requires that the tweets have a field in ISODate format called timestamp. All times are GMT.
to
date, in string format. If different from NULL, will consider only tweets before that date. Note that using this field requires that the tweets have a field in ISODate format called timestamp. All times are GMT.
verbose
logical, default is TRUE, which generates some output to the R console with information about the count of tweets.
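
The way the string, min, from and to arguments combine can be illustrated without a Mongo connection. The following is a minimal sketch in base R, not the package's implementation: the data frame, its column names (text, retweet_count, timestamp) and the function filter_retweets_sketch are hypothetical, and both the inclusive date boundaries and the case-insensitive matching are assumptions.

```r
# Hypothetical in-memory stand-in for a tweet collection.
# Column names are assumptions made for this illustration only.
tweets <- data.frame(
  text = c("RT @a: occupygezi protest", "RT @b: hello world",
           "RT @c: turkey news", "RT @d: occupygezi again"),
  retweet_count = c(150, 5000, 20, 8),
  timestamp = as.POSIXct(c("2013-06-01 10:00:00", "2013-06-02 12:00:00",
                           "2013-06-03 09:00:00", "2013-06-04 18:00:00"),
                         tz = "GMT"),
  stringsAsFactors = FALSE
)

# Hypothetical helper that mimics the documented argument semantics.
filter_retweets_sketch <- function(df, string = NULL, min = 10,
                                   from = NULL, to = NULL) {
  # keep rows whose retweet count is at least `min`
  keep <- df$retweet_count >= min
  if (!is.null(string)) {
    # a vector of strings matches if at least one of them is present;
    # the strings are treated as regular expressions here (assumption)
    pattern <- paste(string, collapse = "|")
    keep <- keep & grepl(pattern, df$text, ignore.case = TRUE)
  }
  # dates are given as strings and interpreted as GMT;
  # boundaries are treated as inclusive (assumption)
  if (!is.null(from)) keep <- keep & df$timestamp >= as.POSIXct(from, tz = "GMT")
  if (!is.null(to))   keep <- keep & df$timestamp <= as.POSIXct(to, tz = "GMT")
  df[keep, , drop = FALSE]
}

# retweets mentioning either keyword, retweeted at least 10 times
by_keyword <- filter_retweets_sketch(tweets, string = c("occupygezi", "turkey"))

# retweets with at least 10 retweets, sent on or after 2 June 2013
by_date <- filter_retweets_sketch(tweets, from = "2013-06-02")
```

In this toy data, by_keyword keeps the first and third rows (the fourth matches a keyword but falls below min), and by_date keeps the second and third rows.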

Details

Note that this function will only return retweets that are made using the built-in retweeting system; that is, 'manual' retweets made by copying and pasting are not included. Also note that total retweet counts are based on Twitter's internal tally, and do not reflect the number of retweets in the database. In other words, the most popular retweet at a given moment may be a tweet that was originally sent days earlier but was retweeted during the period in which tweets were captured.
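
The difference between Twitter's tally and the number of retweets actually stored can be seen in a small base R sketch. The data frame and its columns (original_id, retweet_count) are hypothetical and only stand in for captured retweets; they are not the structure used by the package.

```r
# Hypothetical captured retweets: each row is one retweet stored in the
# database, with the id of the original tweet and Twitter's running
# retweet tally at the time of capture.
captured <- data.frame(
  original_id   = c("t1", "t1", "t2", "t2", "t2"),
  retweet_count = c(4998, 5000, 3, 4, 5),
  stringsAsFactors = FALSE
)

# number of retweets of each original tweet present in the database
in_db <- table(captured$original_id)

# highest tally reported by Twitter for each original tweet
tally <- tapply(captured$retweet_count, captured$original_id, max)

comparison <- data.frame(
  original_id     = names(tally),
  twitter_tally   = as.numeric(tally),
  captured_copies = as.integer(in_db[names(tally)]),
  stringsAsFactors = FALSE
)

# order by Twitter's tally, which is what a "most retweeted" ranking uses
comparison <- comparison[order(-comparison$twitter_tally), ]
```

Here the tweet t1 ranks first with a tally of 5000 even though only two of its retweets were captured, while t2 has three captured retweets but a tally of 5.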

Examples

## Not run: 
# ## connect to the Mongo database
#  mongo <- mongo.create("SMAPP_HOST:PORT", db="DATABASE")
#  mongo.authenticate(mongo, username="USERNAME", password="PASSWORD", db="DATABASE")
#  set <- "DATABASE.COLLECTION"
# 
# ## extract all retweets that were retweeted at least 2000 times
#  rts <- extract.retweets(set, min=2000)
# 
# ## show top 10 retweets from previous query
#  summary(rts, n=10)
# 
# ## extract all retweets that mentioned "occupygezi" and were retweeted at least 100 times
#  rts <- extract.retweets(set, string="occupygezi", min=100)
# 
# ## show top 10 retweets from previous query
#  summary(rts, n=10)
# ## End(Not run)