Converts text-level observations to a time-aggregated frequency data frame with optional filtered dummy variable(s).
ts_filter(rt, by = "days", dtname = "created_at", txt = "text",
filter = NULL, key = NULL, na.omit = TRUE, trim = FALSE)
Tweets or users data frame. Technically, this argument
will accept any recursive object (i.e., list or data frame)
containing a named date-time (POSIXt) element or column. By
default, ts_plot assumes the date-time variable is labeled
"created_at", which is the default date-time label used in
tweets data. However, this function should work with any data
source, assuming (a) the date-time variable meets the POSIXt
class requirement and (b) it is given the appropriate name (if
not "created_at", then a label specified with the dtname
argument).
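The two requirements above can be checked before calling the function. The sketch below uses base R only; `has_posixt` is a hypothetical helper written for illustration and is not part of the package.

```r
# Hypothetical helper: TRUE when x is a recursive object (list or data frame)
# holding a POSIXt element or column under the given name.
has_posixt <- function(x, dtname = "created_at") {
  is.recursive(x) &&
    dtname %in% names(x) &&
    inherits(x[[dtname]], "POSIXt")
}

# Illustrative data frame using the default date-time label.
tweets <- data.frame(
  created_at = as.POSIXct(c("2017-01-01 10:00:00", "2017-01-02 11:30:00"),
                          tz = "UTC"),
  text = c("first post", "second post"),
  stringsAsFactors = FALSE
)

# Illustrative list using a different date-time label.
events <- list(
  timestamp = as.POSIXct("2017-01-01 10:00:00", tz = "UTC"),
  text = "a post"
)

ok_default <- has_posixt(tweets)                    # default label present
ok_missing <- has_posixt(events)                    # no "created_at" element
ok_renamed <- has_posixt(events, dtname = "timestamp")
```

Here the list passes only once its own label is supplied, which mirrors the role of the dtname argument.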
Unit of time, e.g., secs, days, weeks, months, years, by which
to aggregate observations. By default, ts_plot tries to
aggregate time by "days", but for high-frequency data sets that
span only a matter of minutes or hours, this is likely to
produce either an error or a truly disappointing plot. In these
cases, users are encouraged to explore smaller units of time.
Conversely, high-frequency data sets that span a long duration
may be difficult to read at the default unit of time. In these
cases, users should try larger units of time, e.g., "weeks" or
"months". This parameter also accepts numeric quantifiers in
addition to units of time. By default, the provided unit of time
is assumed to specify whole (1) units of time. It is possible to
tweak this unit by specifying the number (or fraction) of time
units, e.g., by = "2 weeks", by = "30 secs", by = ".333 days".
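To make the effect of the time unit concrete, here is a minimal base-R sketch of binning timestamps and counting observations per bin. It illustrates the idea only and is not the package's implementation; the timestamps are made up, and the manual binning for fractional units is an assumption about one way to handle them, since base `cut()` expects a whole-number multiplier.

```r
# Made-up timestamps spanning four calendar days.
times <- as.POSIXct(
  c("2017-01-01 00:10:00", "2017-01-01 13:00:00",
    "2017-01-02 08:00:00", "2017-01-04 09:00:00"),
  tz = "UTC"
)

# Whole units: one bin per day, including days with no observations.
by_day <- table(cut(times, breaks = "days"))

# Quantified units: two-day bins.
by_2days <- table(cut(times, breaks = "2 days"))

# Fractional units (here half a day): convert the width to seconds and
# assign each timestamp to a bin counted from the first observation.
width_secs <- 0.5 * 86400
offset <- as.numeric(times) - as.numeric(min(times))
by_half_day <- table(floor(offset / width_secs))
```

Comparing `by_day` with `by_2days` shows how a larger unit smooths out empty intervals, while the half-day bins spread the same observations more thinly.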
Name of date-time (POSIXt) column (if data frame)
or element (if list). Defaults to "created_at", the default
label supplied as a timestamp variable for tweets data. This
function is exportable to non-Twitter data, assuming the
intended data object includes a date-time variable with the
same label that's supplied to the dtname parameter.
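For non-Twitter data, timestamps often arrive as character strings and need converting first. The sketch below prepares a hypothetical log data frame (the column names `timestamp` and `message` are invented for illustration); the final commented line shows how the call would then look, using only the arguments listed in the usage above.

```r
# Hypothetical non-Twitter data with character timestamps.
logs <- data.frame(
  timestamp = c("2017-03-01 08:15:00", "2017-03-01 09:45:00"),
  message = c("server started", "request failed"),
  stringsAsFactors = FALSE
)

# Convert to POSIXct so the column satisfies the POSIXt requirement.
logs$timestamp <- as.POSIXct(logs$timestamp, tz = "UTC")
is_posixt <- inherits(logs$timestamp, "POSIXt")

# With the package loaded, both column names would be passed explicitly:
# ts_filter(logs, dtname = "timestamp", txt = "message")
```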
Name of distinguishing variable in data frame or list to which filter is applied. Defaults to "text".
Vector of regular expressions with which to filter data (creating multiple time series).
Optional. Provides pretty labels for filters. Defaults to the actual filters.
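The relationship between the filter expressions, the resulting dummy variables, and their labels can be sketched in base R. This is an illustration under assumed example data, not the package's internal code: each regular expression yields one logical column, and the labels replace the raw patterns as column names.

```r
# Made-up text observations.
text <- c("learning rstats today", "python and rstats",
          "tidyverse tips", "lunch time")

# Regular expressions, one per desired time series.
filter <- c("rstats|tidyverse", "python")

# Pretty labels; without them the patterns themselves would be the names.
key <- c("R", "Python")

# One logical (dummy) column per regular expression.
dummies <- vapply(filter, function(p) grepl(p, text), logical(length(text)))
colnames(dummies) <- key

# Number of matching observations per filter.
matches <- colSums(dummies)
```

Note that one observation can match several filters, so the resulting series are not mutually exclusive.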
Logical indicating whether to omit rows with missing (NA) values for the dtname variable. Defaults to TRUE. If FALSE and data contains missing values for the date-time variable, an error will be returned to the user.
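Dropping rows with a missing date-time value amounts to something like the following base-R sketch. The data are made up, and the subsetting shown is an assumption about the general approach rather than the package's exact code.

```r
# Made-up data with one missing timestamp.
obs <- data.frame(
  created_at = as.POSIXct(c("2017-01-01 10:00:00", NA, "2017-01-02 09:00:00"),
                          tz = "UTC"),
  text = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Keep only rows whose date-time value is present.
keep <- !is.na(obs[["created_at"]])
obs_clean <- obs[keep, , drop = FALSE]
```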
Logical indicating whether to trim extreme intervals, which often capture artificially low frequencies. Defaults to FALSE.
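One plausible reading of trimming is that the first and last intervals are dropped, since data collection rarely starts or ends exactly on an interval boundary. The base-R sketch below works under that assumption, which the text above does not confirm, and uses made-up timestamps.

```r
# Made-up timestamps: collection starts late on day 1 and ends early on day 4.
times <- as.POSIXct(
  c("2017-01-01 23:50:00", "2017-01-02 01:00:00", "2017-01-02 12:00:00",
    "2017-01-02 20:00:00", "2017-01-03 09:00:00", "2017-01-03 18:00:00",
    "2017-01-04 00:05:00"),
  tz = "UTC"
)

# Daily counts; the first and last days are only partly observed.
counts <- table(cut(times, breaks = "days"))

# Assumed trimming rule: drop the first and last intervals.
trimmed <- counts[-c(1, length(counts))]
```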