- data
A data frame containing transaction records.
- id_col
A character string specifying the column name that identifies each application (e.g., "application_id").
- description_col
A character string specifying the column name that contains the transaction descriptions.
Note that this column may contain NA values.
- amount_col
Optional. A character string specifying the column name that contains transaction amounts.
If provided, the function aggregates a value for each keyword (default ops = list(amount = sum)).
If omitted (NULL), the function aggregates keyword occurrence counts (default ops = list(count = sum)).
- time_col
Optional. A character string specifying the column name that contains the transaction date or timestamp.
Required when period is a numeric vector, because the data is then filtered by observation window.
- observation_window_start_col
Optional. A character string indicating the column name with the observation window start date.
If period is not "all" and is not numeric, this column is used in aggregate_applications.
- scrape_date_col
Optional. A character string indicating the column name with the scrape date.
If period is not "all" and is not numeric, this column is used in aggregate_applications.
- ops
A named list of functions used to compute summary features on the aggregated values.
If amount_col is provided and ops is NULL, the default is list(amount = sum).
If amount_col is NULL and ops is NULL, the default is list(count = sum).
- period
Either a character string or a numeric vector controlling time aggregation.
The default is "all", meaning no time segmentation. If a numeric vector is provided (e.g., c(30, 3)),
the first element is the cycle length in days and the second element is the number of consecutive cycles.
In that case, only transactions dated between scrape_date - (period[1] * period[2]) and scrape_date
are considered. With c(30, 3), that is the 90 days up to the scrape date.
- separate_direction
Logical. If TRUE (the default when amount_col is provided),
a "direction" column is added that separates incoming and outgoing transactions based on the sign
of the amount.
- group_cols
Optional. A character vector of additional grouping columns to use during aggregation.
If separate_direction is TRUE, the "direction" grouping is added automatically.
- min_freq
Numeric. The minimum frequency a token must have to be included in the keyword extraction.
Default is 1.
- use_matrix
Logical. Passed to extract_keyword_features; if TRUE (the default) a sparse matrix is used.
- convert_to_df
Logical. Passed to extract_keyword_features; if TRUE (the default) the sparse matrix
is converted to a data.frame, facilitating binding with other data.
- period_agg
A function used to aggregate values within each period (see aggregate_applications).
Default is sum.
- period_missing_inputs
A numeric value to replace missing aggregated values. Default is 0.
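
The numeric period rule described under period can be sketched in R as follows. This is a minimal illustration under stated assumptions, not the function's actual implementation: the helper name filter_by_period and the toy column names are hypothetical, and treating both window endpoints as inclusive is an assumption the text above does not confirm.

```r
# Hypothetical helper illustrating the numeric `period` window.
# It is not part of the documented API.
filter_by_period <- function(data, time_col, scrape_date_col, period) {
  stopifnot(is.numeric(period), length(period) == 2)

  # Window length in days: cycle length times number of cycles.
  window_days <- period[1] * period[2]

  txn_date     <- as.Date(data[[time_col]])
  scrape_date  <- as.Date(data[[scrape_date_col]])
  window_start <- scrape_date - window_days

  # Assumption: both endpoints are inclusive.
  keep <- txn_date >= window_start & txn_date <= scrape_date
  keep[is.na(keep)] <- FALSE  # rows with missing dates are dropped

  data[keep, , drop = FALSE]
}

# Toy data; column names are illustrative only.
transactions <- data.frame(
  application_id = c("A", "A", "A", "B"),
  txn_date       = as.Date(c("2024-01-01", "2024-03-15", "2024-04-01", "2023-12-01")),
  scrape_date    = as.Date(rep("2024-04-01", 4)),
  amount         = c(-20, 150, -5, 40)
)

# Three 30-day cycles: keep transactions from the 90 days up to the scrape date.
recent <- filter_by_period(transactions, "txn_date", "scrape_date", period = c(30, 3))
```

With this toy data the window starts on 2024-01-02, so only the transactions dated 2024-03-15 and 2024-04-01 remain in recent.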