Efficient implementation for the conversion of an event log into a
customer-by-sufficient-statistic (CBS) data.frame, with a row for each
customer, which is the required data format for estimating model parameters.
elog2cbs(elog, units = "week", T.cal = NULL, T.tot = NULL)
elog: Event log, a data.frame with field cust for the customer ID and field date for the date/time of the event, which should be of type Date or POSIXt. If a field sales is present, it will be aggregated as well.
units: Time unit, either week, day, hour, min or sec. See difftime.
T.cal: End date of calibration period. Defaults to max(elog$date).
T.tot: End date of the observation period. Defaults to max(elog$date).
data.frame with fields:
cust: Customer id (unique key).
x: Number of recurring events in calibration period.
t.x: Time between first and last event in calibration period.
litt: Sum of logarithmic intertransaction timings during calibration period.
sales: Sum of sales in calibration period, incl. initial transaction. Only if elog$sales is provided.
sales.x: Sum of sales in calibration period, excl. initial transaction. Only if elog$sales is provided.
first: Date of first transaction in calibration period.
T.cal: Time between first event and end of calibration period.
T.star: Length of holdout period. Only if T.cal is provided.
x.star: Number of events within holdout period. Only if T.cal is provided.
sales.star: Sum of sales within holdout period. Only if T.cal and elog$sales are provided.
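To make the field definitions concrete, the following base R sketch computes x, t.x, litt and T.cal by hand for a single customer. The event dates and the variable cal_end are invented for illustration, and the code only mirrors the definitions listed above; it is not the package's actual implementation, which may handle edge cases differently.

```r
# Hypothetical event dates of one customer (illustrative data only)
dates <- as.Date(c("2006-01-02", "2006-01-16", "2006-02-13"))
# Assumed end of the calibration period
cal_end <- as.Date("2006-03-27")

# Intertransaction times, expressed in weeks
itt <- as.numeric(diff(dates), units = "weeks")

# x: number of recurring events, i.e. all events except the first
x <- length(dates) - 1
# t.x: time between first and last event
t.x <- as.numeric(difftime(max(dates), min(dates), units = "weeks"))
# litt: sum of the logarithms of the intertransaction times
litt <- sum(log(itt))
# T.cal: time between first event and end of calibration period
T.cal <- as.numeric(difftime(cal_end, min(dates), units = "weeks"))

print(c(x = x, t.x = t.x, litt = litt, T.cal = T.cal))
```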
The time unit for expressing t.x, T.cal and litt is determined by the
argument units, which is passed on to the function difftime and defaults
to weeks.
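As a small illustration of how the choice of unit changes the reported values, the same interval can be expressed in weeks or in days with difftime. The two dates are made up for this sketch.

```r
# Hypothetical first and last event of a customer
first_event <- as.Date("2006-01-02")
last_event <- as.Date("2006-02-13")

# The same elapsed time, expressed in two different units
span_weeks <- as.numeric(difftime(last_event, first_event, units = "weeks"))
span_days <- as.numeric(difftime(last_event, first_event, units = "days"))

print(c(weeks = span_weeks, days = span_days))
```

Time-based fields such as t.x and T.cal scale in the same way, so parameter estimates obtained under different units are not directly comparable.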
Argument T.tot allows one to specify the end of the observation period,
i.e. the last possible date of an event to still be included in the event
log. If T.tot is not provided, then the date of the last recorded event
will be assumed to coincide with the end of the observation period. If
T.tot is provided, then any event that occurs after that date is discarded.
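The effect of T.tot can be sketched in base R as a plain date filter. The event log below is invented, and the sketch assumes that an event falling exactly on T.tot is kept, in line with the description of T.tot as the last possible date of an included event.

```r
# Hypothetical event log (illustrative data only)
elog <- data.frame(
  cust = c(1, 1, 2, 2),
  date = as.Date(c("2007-11-01", "2008-01-15", "2007-12-30", "2007-12-31"))
)
# Assumed end of the observation period
T.tot <- as.Date("2007-12-30")

# Keep only events up to and including T.tot; later events are discarded
elog_obs <- elog[elog$date <= T.tot, ]
print(elog_obs)
```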
Argument T.cal allows one to split the summary statistics into a
calibration and a holdout period. This can be useful for evaluating
forecasting accuracy for a given dataset. If T.cal is not provided,
then the whole observation period is used for estimating model
parameters. If it is provided, then the returned data.frame contains
two additional fields, with x.star representing the number of repeat
transactions during the holdout period of length T.star. In that case,
only customers who have had at least one event during the calibration
period are included.
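The calibration/holdout split described above can be sketched in base R as follows. The event log and the intermediate names (cal, hold, custs) are hypothetical; this is a simplified illustration of the logic, not the package's implementation.

```r
# Hypothetical event log (illustrative data only)
elog <- data.frame(
  cust = c("a", "a", "a", "b", "b", "c"),
  date = as.Date(c("2006-03-01", "2006-09-01", "2007-02-01",
                   "2006-11-15", "2007-06-01", "2007-03-10")),
  stringsAsFactors = FALSE
)
T.cal <- as.Date("2006-12-31")
T.tot <- as.Date("2007-12-30")

# Split the events into calibration and holdout period
cal <- elog[elog$date <= T.cal, ]
hold <- elog[elog$date > T.cal & elog$date <= T.tot, ]

# Only customers with at least one calibration event are kept
cal_counts <- table(cal$cust)
custs <- names(cal_counts)

cbs <- data.frame(
  cust = custs,
  # repeat events in calibration: all events except the first one
  x = as.integer(cal_counts) - 1L,
  # events in the holdout period, counted for the retained customers only
  x.star = as.integer(table(factor(hold$cust, levels = custs))),
  stringsAsFactors = FALSE
)
# Length of the holdout period in weeks
cbs$T.star <- as.numeric(difftime(T.tot, T.cal, units = "weeks"))
print(cbs)
```

Customer "c" has no event before T.cal and therefore does not appear in the result.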
Transactions with identical cust and date fields are treated as
a single transaction, with their sales summed.
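This merging of same-day transactions can be illustrated with base R's aggregate function. The event log is invented, and the sketch only demonstrates the rule stated above rather than the package's internal code.

```r
# Hypothetical event log; customer 1 has two transactions on the same date
elog <- data.frame(
  cust = c(1, 1, 1, 2),
  date = as.Date(c("2006-05-01", "2006-05-01", "2006-05-08", "2006-05-01")),
  sales = c(10, 5, 7, 3)
)

# Collapse rows sharing cust and date into one row, summing their sales
elog_agg <- aggregate(sales ~ cust + date, data = elog, FUN = sum)
print(elog_agg)
```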
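# Assumes the BTYDplus package, which provides elog2cbs and groceryElog
library(BTYDplus)
data("groceryElog")
# Calibrate on events up to end of 2006, hold out the following 52 weeks
cbs <- elog2cbs(groceryElog, T.cal = "2006-12-31", T.tot = "2007-12-30")
head(cbs)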