ts_plot

Description

Plots the frequency of tweets as a time series or, if multiple filters (text-based criteria used to subset the data) are specified, as multiple time series.

Usage

ts_plot(rt, by = "days", dtname = "created_at", txt = "text",
  na.omit = TRUE, filter = NULL, key = NULL, trim = FALSE, lwd = 1.5,
  linetype = FALSE, cols = NULL, theme = "light", main = NULL,
  subtitle = NULL, adj = TRUE, xlab = "Time", ylab = "Freq",
  box = FALSE, axes = TRUE, legend.title = NULL, ticks = 0, cex = 1,
  cex.main, cex.sub, cex.lab, cex.axis, cex.legend, mar, font.main = 1,
  xtime = NULL, plot = TRUE, ...)

Arguments

rt

Tweets or users data frame. Technically, this argument accepts any recursive object (i.e., a list or data frame) containing a named date-time (POSIXt) element or column. By default, ts_plot assumes the date-time variable is named "created_at", the default date-time label in tweets data. However, the function should work with any data source, provided (a) the date-time variable is of class POSIXt and (b) it has the appropriate name ("created_at", or else the name supplied via the dtname argument).
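
A minimal base R sketch of a non-Twitter data frame that meets both requirements. The `events` object and its values are made up for illustration, and the final ts_plot() call is left commented out because it needs rtweet.

```r
# Hypothetical non-Twitter data: one row per event, with a POSIXct
# column named "created_at" so the default dtname applies.
events <- data.frame(
  created_at = as.POSIXct(c("2017-03-01 10:00:00",
                            "2017-03-01 10:20:00",
                            "2017-03-02 09:15:00"), tz = "UTC"),
  text = c("first", "second", "third"),
  stringsAsFactors = FALSE
)

# (a) the date-time variable inherits from POSIXt
inherits(events$created_at, "POSIXt")
# (b) it carries the expected name
"created_at" %in% names(events)

## ts_plot(events, by = "days")
```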

by

Unit of time (e.g., "secs", "mins", "days", "weeks", "months", "years") by which to aggregate observations. Defaults to "days". For high-frequency data sets that span only minutes or hours, this default is likely to produce either an error or a truly disappointing plot; in these cases, try smaller units of time. Conversely, high-frequency data sets that span a long period may be hard to read at the default unit; in these cases, try larger units, e.g., "weeks" or "months". The unit may also be preceded by a numeric quantifier. Without one, a single (1) unit of time is assumed; to aggregate by a different number (or fraction) of units, specify it explicitly, e.g., by = "2 weeks", by = "30 secs", or by = ".333 days".
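
To see what aggregating by a unit of time means, here is a base R analogue using cut() on POSIXct values. It is not rtweet's internal code, and the timestamps are made up. Note that cut() only accepts whole-number multiples such as "30 secs", so a fractional spec like ".333 days" is outside this sketch.

```r
# Base R analogue of by-unit aggregation (not rtweet's internal code):
# cut() bins POSIXct times into intervals, table() counts each bin.
times <- as.POSIXct("2017-03-01 10:00:00", tz = "UTC") +
  c(0, 10, 40, 65, 70, 200)   # offsets in seconds

per_min <- table(cut(times, breaks = "mins"))     # 1-minute bins
per_30s <- table(cut(times, breaks = "30 secs"))  # 30-second bins
as.vector(per_min)   # 3 2 0 1
```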

dtname

Name of the date-time (POSIXt) column (if a data frame) or element (if a list). Defaults to "created_at", the default timestamp variable in tweets data. The function also works with non-Twitter data, provided the data object includes a date-time variable whose name matches the value supplied to dtname.
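
Because R's `[[` indexes lists and data frames the same way, one name is enough to locate the date-time variable in either kind of object. The sketch below uses a hypothetical helper, `get_dt()`, which is not part of rtweet, to show the lookup and class check with a made-up variable called "timestamp".

```r
# Hypothetical helper (not part of rtweet): pull the date-time variable
# named by `dtname` out of a data frame or list and check its class.
get_dt <- function(data, dtname = "created_at") {
  if (!dtname %in% names(data)) {
    stop("no variable named '", dtname, "' found", call. = FALSE)
  }
  dt <- data[[dtname]]          # `[[` works for lists and data frames
  if (!inherits(dt, "POSIXt")) {
    stop("'", dtname, "' must inherit from POSIXt", call. = FALSE)
  }
  dt
}

start    <- as.POSIXct("2017-03-01 12:00:00", tz = "UTC")
log_df   <- data.frame(timestamp = start + 0:2)   # data frame input
log_list <- list(timestamp = start)               # list input

get_dt(log_df, dtname = "timestamp")
get_dt(log_list, dtname = "timestamp")
```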

txt

Name of the variable (column or list element) to which filter is applied. Defaults to "text".

na.omit

Logical indicating whether to omit rows with missing (NA) values in the dtname variable. Defaults to TRUE. If FALSE and the date-time variable contains missing values, an error is returned.
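
A sketch of the row filtering that na.omit = TRUE presumably amounts to: dropping rows whose date-time value is missing before aggregation. The data are made up, and this is not rtweet's own implementation.

```r
# Made-up tweets with one missing timestamp.
dtname <- "created_at"
tweets_na <- data.frame(
  created_at = as.POSIXct(c("2017-03-01 10:00:00", NA,
                            "2017-03-01 11:00:00"), tz = "UTC"),
  text = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Keep only rows with a non-missing date-time value.
complete <- tweets_na[!is.na(tweets_na[[dtname]]), , drop = FALSE]
nrow(complete)   # 2
```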

filter

Vector of regular expressions with which to filter data (creating multiple time series).
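
The idea behind filter (and key) can be sketched in base R: each regular expression selects the rows whose txt variable matches it, giving one subset, and hence one time series, per expression. This is an illustration with made-up tweets, not rtweet's code. Matching here follows grepl()'s case-sensitive default; whether ts_plot matches case-insensitively is not stated above.

```r
# Made-up, lowercase tweets; the last one matches neither filter.
tweets <- data.frame(
  text = c("gop plan", "liberal rally", "democrat vote", "weather"),
  stringsAsFactors = FALSE
)
txt     <- "text"
filters <- c("democrat|liberal|libs", "republican|conservativ|gop")
keys    <- c("Democrats", "Republicans")   # pretty labels, as with `key`

# One logical vector per expression: TRUE where the row matches.
matches <- lapply(filters, function(p) grepl(p, tweets[[txt]]))
names(matches) <- keys
counts <- sapply(matches, sum)
counts   # Democrats 2, Republicans 1
```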

key

Optional, pretty labels for the filters. Defaults to the filter expressions themselves.

trim

Logical indicating whether to trim the extreme (first and last) intervals, which often show artificially low frequencies, typically because the data cover them only partially. Defaults to FALSE.
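
A sketch of why trimming helps, assuming "extreme intervals" means the first and last bins: with a steady rate of four observations per minute that starts and stops mid-minute, the end bins come out artificially low. The data are made up and the binning uses base R's cut(), not rtweet's code.

```r
# Steady rate: one observation every 15 seconds, starting at 10:00:45.
times <- as.POSIXct("2017-03-01 10:00:45", tz = "UTC") + seq(0, 150, by = 15)

counts  <- table(cut(times, breaks = "mins"))   # 1 4 4 2
trimmed <- counts[-c(1, length(counts))]        # drop first and last bins
as.vector(trimmed)                              # 4 4
```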

lwd

Width of the time series line(s). Defaults to 1.5.

linetype

Logical indicating whether lines should be distinguished by line type. Defaults to FALSE.

cols

Colors for the filters. Leave NULL for the default color scheme.

theme

Either an integer (0-8) or a character string specifying the plot theme. Options include "light", "inverse", "dark", "nerd", "gray", "spacegray", "minimal", and "apa" (my attempt at making an APA-consistent graphic).

main

Optional, title of the plot. By default, the title is printed on top of the plot and it is left-justified (ggplot2 style). To alter justification, see adj.

subtitle

Optional, text for plot subtitle. Inherits justification method from main.

adj

Logical indicating whether to left-justify the main plot title. Defaults to TRUE. To specify the horizontal location of the title more precisely, provide a numeric value between 0 (left) and 1 (right).

xlab

Optional text for the x-axis title. Defaults to "Time".

ylab

Optional text for the y-axis title. Defaults to "Freq".

box

Logical indicating whether to draw a box around the plot area. Defaults to FALSE.

axes

Logical indicating whether to draw axes. Defaults to TRUE. Set this to FALSE to supply your own axes using the base graphics axis function.
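
A minimal base graphics sketch of the "draw your own axes" workflow, using plot() directly with made-up data rather than ts_plot. It suppresses the default axes and then adds them with axis(), labelling the x-axis by hour.

```r
# Made-up hourly series.
time <- as.POSIXct("2017-03-01 00:00:00", tz = "UTC") + 3600 * 0:5
freq <- c(3, 5, 4, 8, 6, 7)

# Draw the line without axes, then add custom ones.
plot(time, freq, type = "l", axes = FALSE, xlab = "Time", ylab = "Freq")
axis(1, at = as.numeric(time), labels = format(time, "%H:%M"))  # hourly labels
axis(2, las = 1)   # horizontal y-axis labels
```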

legend.title

Optional title for the legend. Defaults to NULL (no title).

ticks

Numeric specifying the width of tick marks. Defaults to 0 (no tick marks). If you'd like tick marks, try setting this value to 1.25.

cex

Global cex (character expansion) setting. Defaults to 1.0.

cex.main

Size of the plot title (if a title is provided via the main argument).

cex.sub

Size of the subtitle.

cex.lab

Size of the axis labels.

cex.axis

Size of the axis text.

cex.legend

Size of the legend text.

mar

Margins in number of lines.

font.main

Font style of the main title, if provided. Defaults to 1, i.e., normal (non-bold) font, overriding R's bold default, which I think is a little too aggressive. If you disagree with me, you can make the title bold by setting this value to 2.

xtime

Format for the date-time labels on the x-axis. Accepts any format string documented for strptime, e.g., xtime = "%F %H:%M".
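
The format codes are the ones listed on R's ?strptime help page, so base R's format() can preview a label before it goes on the axis. In these codes, %M is minutes and %S is seconds. The timestamp below is made up, and the month-name output depends on the locale.

```r
stamp <- as.POSIXct("2017-03-01 14:05:09", tz = "UTC")

format(stamp, "%F %H:%M")   # "2017-03-01 14:05"
format(stamp, "%H:%M:%S")   # "14:05:09"
format(stamp, "%d %b")      # "01 Mar" in an English locale
```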

plot

Deprecated. Use ts_filter to create a time series-like data frame.

…

Arguments passed to the base graphics plot function.

Examples


# NOT RUN {
## stream tweets mentioning trump for 30 mins
## (timeout is in seconds)
rt <- stream_tweets(
    q = "realdonaldtrump",
    timeout = (60 * 30))

## plot tweet data aggregated by minute
ts_plot(rt, by = "mins")

## use a different time increment, line width, and theme
ts_plot(rt, by = "30 secs", lwd = .75, theme = "inverse")

## filter data using regular expressions and
## plot each corresponding time series
ts_plot(rt, by = "mins",
        theme = "gray",
        main = "Partisanship in tweets about Trump",
        filter = c("democrat|liberal|libs",
                   "republican|conservativ|gop"),
        key = c("Democrats", "Republicans"))

## ts_plot also accepts data frames created via ts_filter
rt.ts <- ts_filter(
    rt, "mins",
    filter = c("democrat|liberal|libs",
               "republican|conservativ|gop"),
    key = c("Democrats", "Republicans"))
## printing should yield around 30 rows per filter (give or take)
## since the stream lasted 30 mins and was aggregated by minute
rt.ts

## Pass data frame created by ts_filter to ts_plot
ts_plot(rt.ts, theme = "spacegray")

## the returned data frame from ts_filter also fits the
## tidyverse and includes three columns
## Column 1 - time Date-time obj of [median] time intervals
## Column 2 - freq Integer (class double) frequency counts
## Column 3 - filter Keys of different time series filters
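
## Because the result is one long data frame with time, freq,
## and filter columns, base R tools can summarise each series.
## The data below are made up, not real ts_filter() output.
rt_ts <- data.frame(
  time   = rep(as.POSIXct("2017-03-01 10:00:30", tz = "UTC") + 60 * 0:2,
               times = 2),
  freq   = c(4, 6, 5, 2, 3, 1),
  filter = rep(c("Democrats", "Republicans"), each = 3),
  stringsAsFactors = FALSE
)

## total frequency per time series
totals <- aggregate(freq ~ filter, data = rt_ts, FUN = sum)
totals

## peak frequency per time series
peaks <- tapply(rt_ts$freq, rt_ts$filter, max)
peaks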

## This makes it easy to pass the data along to ggplot
## but my themes are cooler anyway so why bother?
## library(ggplot2)
## ggplot(rt.ts, aes(x = time, y = freq, color = filter)) +
##     geom_line()
# }
