Create a summary data frame of transaction counts, amounts, and utilization rates.
trx_stats(
.data,
trx_types,
percent_of = NULL,
combine_trx = FALSE,
col_exposure = "exposure",
full_exposures_only = TRUE,
conf_int = FALSE,
conf_level = 0.95
)# S3 method for trx_df
summary(object, ...)
A tibble with class trx_df, tbl_df, tbl,
and data.frame. The results include columns for any grouping
variables and transaction types, plus the following:
trx_n: the number of unique transactions.
trx_amt: total transaction amount
trx_flag: the number of observation periods with non-zero transaction amounts.
exposure: total exposures
avg_trx: mean transaction amount (trx_amt / trx_flag)
avg_all: mean transaction amount over all records (trx_amt / exposure)
trx_freq: transaction frequency when a transaction occurs (trx_n / trx_flag)
trx_util: transaction utilization per observation period (trx_flag / exposure)
If percent_of is provided, the results will also include:
The sum of any columns passed to percent_of with non-zero transactions.
These columns include the suffix _w_trx.
The sum of any columns passed to percent_of
pct_of_{*}_w_trx: total transactions as a percentage of column
{*}_w_trx. In other words, total transactions divided by the sum of a
column including only records utilizing transactions.
pct_of_{*}_all: total transactions as a percentage of column {*}. In
other words, total transactions divided by the sum of a column regardless
of whether or not transactions were utilized.
If conf_int is set to TRUE, additional columns are added for lower and
upper confidence interval limits around the observed utilization rate and any
percent_of output columns. Confidence interval columns include the name
of the original output column suffixed by either _lower or _upper.
If values are passed to percent_of, an additional column is created
containing the the sum of squared transaction amounts (trx_amt_sq).
A data frame with exposure-level records of type
exposed_df with transaction data attached. If necessary, use
as_exposed_df() to convert a data frame to an exposed_df object, and use
add_transactions() to attach transactions to an exposed_df object.
A character vector of transaction types to include in the
output. If none is provided, all available transaction types in .data
will be used.
A optional character vector containing column names in
.data to use as denominators in the calculation of utilization rates or
actual-to-expected ratios.
If FALSE (default), the results will contain output rows
for each transaction type. If TRUE, the results will contains aggregated
experience across all transaction types.
Name of the column in .data containing exposures
If TRUE (default), partially exposed records will
be excluded from data.
If TRUE, the output will include confidence intervals
around the observed utilization rate and any percent_of output columns.
Confidence level for confidence intervals
A trx_df object
Groups to retain after summary() is called
The percent_of argument is optional. If provided, this argument must
be a character vector with values corresponding to columns in .data
containing values to use as denominators in the calculation of utilization
rates or actual-to-expected ratios. Example usage:
In a study of partial withdrawal transactions, if percent_of refers to
account values, observed withdrawal rates can be determined.
In a study of recurring claims, if percent_of refers to a column
containing a maximum benefit amount, utilization rates can be determined.
If conf_int is set to TRUE, the output will contain lower and upper
confidence interval limits for the observed utilization rate and any
percent_of output columns. The confidence level is dictated
by conf_level.
Intervals for the utilization rate (trx_util) assume a binomial
distribution.
Intervals for transactions as a percentage of another column with
non-zero transactions (pct_of_{*}_w_trx) are constructed using a normal
distribution
Intervals for transactions as a percentage of another column
regardless of transaction utilization (pct_of_{*}_all) are calculated
assuming that the aggregate distribution is normal with a mean equal to
observed transactions and a variance equal to:
Var(S) = E(N) * Var(X) + E(X)^2 * Var(N),
Where S is the aggregate transactions random variable, X is an individual
transaction amount assumed to follow a normal distribution, and N is a
binomial random variable for transaction utilization.
As a default, partial exposures are removed from .data before summarizing
results. This is done to avoid complexity associated with a lopsided skew
in the timing of transactions. For example, if transactions can occur on a
monthly basis or annually at the beginning of each policy year, partial
exposures may not be appropriate. If a policy had an exposure of 0.5 years
and was taking withdrawals annually at the beginning of the year, an
argument could be made that the exposure should instead be 1 complete year.
If the same policy was expected to take withdrawals 9 months into the year,
it's not clear if the exposure should be 0.5 years or 0.5 / 0.75 years.
To override this treatment, set full_exposures_only to FALSE.
Applying summary() to a trx_df object will re-summarize the
data while retaining any grouping variables passed to the "dots"
(...).
Unlike exp_stats(), this function requires data to be an
exposed_df object.
If .data is grouped, the resulting data frame will contain
one row per transaction type per group.
Any number of transaction types can be passed to the trx_types argument,
however each transaction type must appear in the trx_types attribute of
.data. In addition, trx_stats() expects to see columns named trx_n_{*}
(for transaction counts) and trx_amt_{*} for (transaction amounts) for each
transaction type. To ensure .data is in the appropriate format, use the
functions as_exposed_df() to convert an existing data frame with
transactions or add_transactions() to attach transactions to an existing
exposed_df object.
expo <- expose_py(census_dat, "2019-12-31", target_status = "Surrender") |>
add_transactions(withdrawals)
res <- expo |> group_by(inc_guar) |> trx_stats(percent_of = "premium")
res
summary(res)
expo |> group_by(inc_guar) |>
trx_stats(percent_of = "premium", combine_trx = TRUE, conf_int = TRUE)
Run the code above in your browser using DataLab