ts_filter: ts_filter

Description

Converts text-level observations to a time-aggregated frequency data frame with optional filtered dummy variable(s).

Usage

ts_filter(rt, by = "days", dtname = "created_at", txt = "text",
  filter = NULL, key = NULL, na.omit = TRUE, trim = FALSE)

Arguments

rt

Tweets or users data frame. Technically, this argument will accept any recursive object (i.e., list or data frame) containing a named date-time (POSIXt) element or column. By default, this function assumes the date-time variable is labeled "created_at", which is the default date-time label used in tweets data. However, this function should work with any data source, provided that (a) the date-time variable is of class POSIXt and (b) the date-time variable has the appropriate name (if not "created_at", then the label specified with the dtname argument).
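The class and naming requirements above can be checked before passing non-Twitter data to the function. The following is a minimal base R sketch; the data frame `events` and its column `posted` are hypothetical names used only for illustration.

```r
# Hypothetical non-Twitter data with character timestamps.
events <- data.frame(
  posted = c("2017-03-01 09:15:00", "2017-03-01 17:40:00", "2017-03-02 08:05:00"),
  text = c("first note", "second note", "third note"),
  stringsAsFactors = FALSE
)

# Convert the timestamps to POSIXct so the column inherits from POSIXt.
events$posted <- as.POSIXct(events$posted, tz = "UTC")

# TRUE when the column meets the date-time class requirement.
is_datetime <- inherits(events$posted, "POSIXt")
```

With data shaped like this, the column label would be supplied as dtname = "posted".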

by

Unit of time, e.g., secs, days, weeks, months, years, by which to aggregate observations. By default, observations are aggregated by "days", but for high-frequency data sets that span only minutes or hours, this is likely to produce either an error or an uninformative result. In these cases, users are encouraged to explore smaller units of time. Conversely, high-frequency data sets that span a long duration may be difficult to read given the default unit of time. In these cases, users should try larger units of time, e.g., "weeks" or "months". This parameter also accepts numeric quantifiers in addition to units of time. By default, the provided unit of time is assumed to specify whole (1) units of time. It is possible to adjust this by specifying the number (or fraction) of time units, e.g., by = "2 weeks", by = "30 secs", by = ".333 days".
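To illustrate how the choice of unit changes the aggregation, here is a minimal base R sketch that uses cut() and table() on hypothetical timestamps. It only demonstrates the idea of time aggregation and is not the function's own implementation; note that base R's cut() accepts integer multiples of a unit but not fractions.

```r
# Hypothetical data: six timestamps spread over two days (UTC).
created_at <- as.POSIXct(
  c("2017-01-01 00:10:00", "2017-01-01 08:30:00", "2017-01-01 23:59:00",
    "2017-01-02 01:00:00", "2017-01-02 12:00:00", "2017-01-02 12:05:00"),
  tz = "UTC"
)

# Aggregate by whole days: one count per calendar day.
by_day <- table(cut(created_at, breaks = "day"))

# Aggregate by a multiple of a smaller unit: one count per 12-hour interval.
by_12h <- table(cut(created_at, breaks = "12 hours"))
```

Printing by_day and by_12h shows that the smaller unit yields more intervals with fewer observations in each.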

dtname

Name of date-time (POSIXt) column (if data frame) or element (if list). Defaults to "created_at", the default label of the timestamp variable in tweets data. This function can be applied to non-Twitter data, provided the data object includes a date-time variable whose label matches the value supplied to the dtname parameter.

txt

Name of distinguishing variable in data frame or list to which filter is applied. Defaults to "text".

filter

Vector of regular expressions with which to filter data (creating multiple time series).

key

Optional. Pretty labels for the filters. Defaults to the actual filters.
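The relationship between filter and key can be sketched in base R with grepl(). This is an illustration under assumptions, not the function's own code: the sample text, patterns, and labels are hypothetical, and details such as case sensitivity may differ in the actual function.

```r
# Hypothetical text observations.
text <- c("I love rstats", "python is fine", "rstats and python", "nothing here")

# One regular expression per desired time series.
filter <- c("rstats", "python")

# Pretty labels, one per filter.
key <- c("R", "Python")

# Build one logical dummy variable per filter (rows = observations).
dummies <- vapply(filter, function(p) grepl(p, text), logical(length(text)))

# Replace the raw patterns with the pretty labels.
colnames(dummies) <- key

# Number of matching observations per filter.
matches <- colSums(dummies)
```

Each column of dummies marks the observations that match one filter, which is what allows a separate time series to be counted for each.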

na.omit

Logical indicating whether to omit rows with missing (NA) values for the dtname variable. Defaults to TRUE. If FALSE and data contains missing values for the date-time variable, an error will be returned to the user.
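Omitting rows with a missing date-time value amounts to the following base R sketch. The data frame is hypothetical, and the snippet shows only the general idea rather than the function's internals.

```r
# Hypothetical data with one missing timestamp.
rt <- data.frame(
  created_at = as.POSIXct(
    c("2017-01-01 10:00:00", NA, "2017-01-02 11:00:00"),
    tz = "UTC"
  ),
  text = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Name of the date-time column, as it would be passed to dtname.
dtname <- "created_at"

# Keep only the rows whose date-time value is not NA.
rt_complete <- rt[!is.na(rt[[dtname]]), ]
```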

trim

Logical indicating whether to trim extreme intervals, which often capture artificially low frequencies. Defaults to FALSE.
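As a rough sketch of why trimming can help, the hypothetical counts below have partially observed first and last intervals. The snippet assumes that "extreme intervals" means the first and last rows of the aggregated data; that interpretation is an assumption, not something confirmed here.

```r
# Hypothetical aggregated counts; the first and last days are only partly covered.
counts <- data.frame(
  time = as.POSIXct(
    c("2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04"),
    tz = "UTC"
  ),
  freq = c(4L, 52L, 47L, 3L)
)

# Drop the first and last intervals (assumed meaning of trimming).
trimmed <- counts[-c(1, nrow(counts)), ]
```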