smappR (version 0.5)

extract.hashtags: Connect to Mongo database and extract hashtags from each tweet.

Description

extract.hashtags opens a connection to the Mongo database on the lab computer and returns a list of the hashtags used in all tweets, or only in tweets that contain a given keyword. In combination with summary.retweets, this is a quick way to display the top hashtags used in a collection of tweets.
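
As a rough illustration of the extract-and-count idea described above, the following base R sketch works on a local character vector rather than a Mongo collection. It is not the smappR implementation: the sample tweets, the hashtag pattern, and the choice to lower-case hashtags before counting are all assumptions made for the example.

```r
# Hypothetical sample tweets, invented for this example.
tweets <- c(
  "Protest at the park #occupygezi #Turkey",
  "Watching the news #occupygezi",
  "No hashtags in this one",
  "Updates from Istanbul #turkey #direngezi"
)

# Assumed hashtag pattern: '#' followed by letters, digits, or underscores.
# gregexpr() finds every match per tweet; regmatches() returns them as a list.
matches <- regmatches(tweets, gregexpr("#[[:alnum:]_]+", tweets))

# Flatten the list and lower-case (an assumption) so that "#Turkey" and
# "#turkey" are counted as the same hashtag.
hashtags <- tolower(unlist(matches))

# Tabulate and sort so the most frequent hashtags come first.
top <- sort(table(hashtags), decreasing = TRUE)

# Show the two most frequent hashtags.
print(head(top, n = 2))
```

Tweets without hashtags contribute nothing to the counts, because regmatches returns an empty character vector for them.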

Usage

extract.hashtags(set, text = NULL, string = NULL, from = NULL, to = NULL, verbose = TRUE)

Arguments

set
string, name of the collection of tweets in the Mongo database to query.
text
vector of tweet text. To be used when all other arguments are NULL. The function will then extract and count the hashtags in text.
string
string or vector of strings, set to NULL by default (will return hashtags for all tweets). If it is a string, it will return all hashtags that were used in tweets containing that string. If it is a vector of strings, it will return all hashtags that were used in tweets containing at least one of the strings.
from
date, in string format. If different from NULL, will consider only tweets after that date. Note that using this field requires that the tweets have a field in ISODate format called timestamp. All times are GMT.
to
date, in string format. If different from NULL, will consider only tweets before that date. Note that using this field requires that the tweets have a field in ISODate format called timestamp. All times are GMT.
verbose
logical, default is TRUE, which generates some output to the R console with information about the count of tweets.
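
The way string, from, and to narrow down the set of tweets can be sketched locally with a data frame. This is only an approximation of the documented behavior, not the query smappR sends to Mongo: filter_tweets is a hypothetical helper, the sample data is invented, matching is assumed to be literal and case-insensitive, and the date bounds are assumed to be inclusive.

```r
# Hypothetical sample data with a GMT timestamp column, as the from/to
# arguments require.
tweets <- data.frame(
  text = c(
    "Protest at the park #occupygezi #Turkey",
    "Watching the news #occupygezi",
    "Football tonight #besiktas",
    "Updates from Istanbul #direngezi"
  ),
  timestamp = as.POSIXct(
    c("2013-06-01 10:00:00", "2013-06-03 12:00:00",
      "2013-06-03 18:00:00", "2013-06-05 09:00:00"),
    tz = "GMT"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the string/from/to arguments.
filter_tweets <- function(df, string = NULL, from = NULL, to = NULL) {
  keep <- rep(TRUE, nrow(df))
  if (!is.null(string)) {
    # Keep tweets containing at least one of the strings
    # (assumed: literal, case-insensitive matching).
    hits <- lapply(string, function(s) {
      grepl(tolower(s), tolower(df$text), fixed = TRUE)
    })
    keep <- keep & Reduce(`|`, hits)
  }
  if (!is.null(from)) {
    # Lower bound, interpreted in GMT (assumed inclusive).
    keep <- keep & df$timestamp >= as.POSIXct(from, tz = "GMT")
  }
  if (!is.null(to)) {
    # Upper bound, interpreted in GMT (assumed inclusive).
    keep <- keep & df$timestamp <= as.POSIXct(to, tz = "GMT")
  }
  df[keep, , drop = FALSE]
}

one_string  <- filter_tweets(tweets, string = "occupygezi", from = "2013-06-02")
any_string  <- filter_tweets(tweets, string = c("occupygezi", "direngezi"))
date_window <- filter_tweets(tweets, from = "2013-06-02", to = "2013-06-04")
```

Here one_string keeps only the second tweet, any_string keeps every tweet mentioning either keyword, and date_window keeps the two tweets from 3 June.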

Examples

## Not run: 
# ## connect to the Mongo database
#  mongo <- mongo.create("SMAPP_HOST:PORT", db="DATABASE")
#  mongo.authenticate(mongo, username="USERNAME", password="PASSWORD", db="DATABASE")
#  set <- "DATABASE.COLLECTION"
# 
# ## extract all hashtags in a collection of tweets
#  ht <- extract.hashtags(set)
# 
# ## show top 10 hashtags
#  summary(ht, n=10)
# 
# ## extract all hashtags that are used in tweets that mention "occupygezi"
#  ht <- extract.hashtags(set, string="occupygezi")
# 
# ## show top 10 hashtags in tweets mentioning "occupygezi"
#  summary(ht, n=10)
# ## End(Not run)