- data
A data frame containing transaction records.
- id_col
A character string specifying the column name that identifies each application (e.g., "application_id").
- description_col
A character string specifying the column name that contains the transaction descriptions.
Note that this column may contain NA values.
- amount_col
Optional. A character string specifying the column name that contains transaction amounts.
If provided, the function aggregates the transaction amounts associated with each keyword (default ops = list(amount = sum)).
If omitted (NULL), the function aggregates keyword occurrence counts (default ops = list(count = sum)).
- time_col
Optional. A character string specifying the column name that contains the transaction date (or timestamp).
Required when period is a numeric vector, because the data are then filtered to the observation window.
- observation_window_start_col
Optional. A character string specifying the column name that contains the observation window start date.
Used by aggregate_applications when period is neither "all" nor numeric.
- scrape_date_col
Optional. A character string specifying the column name that contains the scrape date.
Used by aggregate_applications when period is neither "all" nor numeric.
- ops
A named list of functions used to compute summary features from the aggregated values.
If ops is NULL, the default is list(amount = sum) when amount_col is provided and list(count = sum) when amount_col is NULL.
- period
Either a character string or a numeric vector controlling time aggregation.
The default is "all", meaning no time segmentation. If a numeric vector is provided (e.g., c(30, 3)),
it defines a cycle length in days (first element) and a number of consecutive cycles (second element). In that case,
only transactions dated between scrape_date - (period[1] * period[2]) and scrape_date are considered.
- separate_direction
Logical. If TRUE (the default when amount_col is provided),
a "direction" column is added that separates incoming and outgoing transactions based on the sign of the amount.
- group_cols
Optional. A character vector of additional grouping columns to use during aggregation.
If separate_direction is TRUE, "direction" is added to the grouping columns automatically.
- min_freq
Numeric. The minimum frequency a token must have to be included in the keyword extraction.
Default is 1.
- use_matrix
Logical. Passed to extract_keyword_features; if TRUE (the default), a sparse matrix is used.
- convert_to_df
Logical. Passed to extract_keyword_features; if TRUE (the default), the sparse matrix is converted to a data.frame, which makes it easier to bind with other data.
- period_agg
A function used to aggregate values within each period (see aggregate_applications).
Default is sum.
- period_missing_inputs
A numeric value used to replace missing aggregated values. Default is 0.
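The numeric period window and the direction split described above can be illustrated with a minimal base-R sketch. The helpers filter_window() and add_direction() and the column names date, scrape_date and amount are hypothetical and are not part of this package's API. The sketch assumes inclusive window bounds and treats negative amounts as outgoing and all other amounts as incoming; the package may label or handle zero amounts differently.

```r
# Hypothetical helpers (not this package's API), sketched for illustration.
# Keep rows whose date lies in [scrape - period[1] * period[2], scrape].
filter_window <- function(df, time_col, scrape_date_col, period) {
  stopifnot(is.numeric(period), length(period) == 2)
  span_days <- period[1] * period[2]            # e.g. c(30, 3) -> 90 days
  tx_date <- as.Date(df[[time_col]])
  scrape  <- as.Date(df[[scrape_date_col]])
  keep <- !is.na(tx_date) & !is.na(scrape) &
    tx_date >= scrape - span_days & tx_date <= scrape
  df[keep, , drop = FALSE]
}

add_direction <- function(df, amount_col) {
  # Assumption: negative amounts are outgoing, all others incoming.
  df$direction <- ifelse(df[[amount_col]] < 0, "out", "in")
  df
}

tx <- data.frame(
  application_id = c(1, 1, 1, 2),
  date = as.Date(c("2024-01-10", "2023-09-01", "2024-03-01", "2024-02-15")),
  scrape_date = as.Date("2024-03-01"),
  amount = c(-50, 100, 20, -5)
)
w <- add_direction(filter_window(tx, "date", "scrape_date", c(30, 3)), "amount")
w$direction  # "out" "in" "out" (the 2023-09-01 row is outside the 90-day window)
```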
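The difference between count aggregation (amount_col is NULL) and amount aggregation (amount_col provided), together with the min_freq filter, can be sketched in base R. keyword_features() is a hypothetical helper, not extract_keyword_features, and its tokenization is an assumption: lower-case alphanumeric runs, with every token in a description receiving that transaction's full amount. It returns a dense data.frame rather than a sparse matrix.

```r
# Hypothetical helper (not extract_keyword_features): dense keyword table.
# Assumptions: tokens are lower-case alphanumeric runs, and each token in a
# description receives that transaction's full amount.
keyword_features <- function(df, id_col, description_col,
                             amount_col = NULL, min_freq = 1) {
  desc <- tolower(df[[description_col]])
  desc[is.na(desc)] <- ""                        # NA descriptions add no tokens
  tokens <- strsplit(desc, "[^a-z0-9]+")
  n_tok <- lengths(tokens)
  # Count mode adds 1 per occurrence; amount mode adds the transaction amount.
  value <- if (is.null(amount_col)) rep(1, sum(n_tok)) else rep(df[[amount_col]], n_tok)
  long <- data.frame(
    id    = rep(df[[id_col]], n_tok),
    token = unlist(tokens, use.names = FALSE),
    value = value,
    stringsAsFactors = FALSE
  )
  long <- long[long$token != "", , drop = FALSE]  # drop empty split pieces
  freq <- table(long$token)
  long <- long[long$token %in% names(freq)[freq >= min_freq], , drop = FALSE]
  # One row per id, one column per surviving keyword (0 where absent).
  as.data.frame.matrix(xtabs(value ~ id + token, data = long))
}

tx <- data.frame(
  application_id = c(1, 1, 2, 2),
  description = c("Coffee shop", "coffee", NA, "Rent payment"),
  amount = c(-4, -3, -900, -900),
  stringsAsFactors = FALSE
)
cnt <- keyword_features(tx, "application_id", "description", min_freq = 2)
amt <- keyword_features(tx, "application_id", "description", amount_col = "amount")
```

With min_freq = 2 only "coffee" survives (two occurrences for application 1), while the amount version sums -4 and -3 into -7 for that keyword.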
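A numeric period such as c(30, 3) can be read as three 30-day cycles counted back from the scrape date, with period_agg applied inside each cycle and period_missing_inputs filling cycles that have no transactions. The base-R sketch below uses a hypothetical helper, cycle_features(), which is not aggregate_applications. How the package assigns transactions to cycles is an assumption here: cycle 1 covers days 0 to 30 before the scrape date, cycle 2 days 31 to 60, and so on.

```r
# Hypothetical helper (not aggregate_applications), for illustration only.
# Assumption: cycle k covers days ((k - 1) * period[1], k * period[1]] before
# the scrape date, with day 0 (the scrape date itself) placed in cycle 1.
cycle_features <- function(df, id_col, time_col, value_col, scrape_date,
                           period = c(30, 3), period_agg = sum,
                           period_missing_inputs = 0) {
  days_back <- as.numeric(as.Date(scrape_date) - as.Date(df[[time_col]]))
  cycle <- pmax(1, ceiling(days_back / period[1]))
  keep <- !is.na(days_back) & days_back >= 0 & cycle <= period[2]
  df <- df[keep, , drop = FALSE]
  cycle <- factor(cycle[keep], levels = seq_len(period[2]))
  # Rows: applications; columns: cycles; empty cells come back as NA.
  agg <- tapply(df[[value_col]], list(df[[id_col]], cycle), period_agg)
  agg[is.na(agg)] <- period_missing_inputs
  colnames(agg) <- paste0("cycle_", seq_len(period[2]))
  agg
}

tx <- data.frame(
  application_id = c(1, 1, 1, 2, 1),
  date = as.Date(c("2024-02-20", "2024-02-25", "2024-01-05",
                   "2023-12-15", "2023-06-01")),
  amount = c(10, 5, 7, 3, 100)
)
m <- cycle_features(tx, "application_id", "date", "amount", "2024-03-01")
```

Here the 2023-06-01 transaction falls outside the 90-day window and is dropped, and empty cycles (for example cycle_1 for application 2) are filled with 0.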