robyn_inputs()
is the function to input all model parameters and
check input correctness for the initial model build. It includes the
engineering process results that conducts trend, season,
holiday & weekday decomposition using Facebook's time-series forecasting
library prophet
and fit a nonlinear model to spend and exposure
metrics in case exposure metrics are used in paid_media_vars
.
robyn_inputs(
dt_input = NULL,
dep_var = NULL,
dep_var_type = NULL,
date_var = "auto",
paid_media_spends = NULL,
paid_media_vars = NULL,
paid_media_signs = NULL,
organic_vars = NULL,
organic_signs = NULL,
context_vars = NULL,
context_signs = NULL,
factor_vars = NULL,
dt_holidays = Robyn::dt_prophet_holidays,
prophet_vars = NULL,
prophet_signs = NULL,
prophet_country = NULL,
adstock = NULL,
hyperparameters = NULL,
window_start = NULL,
window_end = NULL,
calibration_input = NULL,
json_file = NULL,
InputCollect = NULL,
...
)# S3 method for robyn_inputs
print(x, ...)
List. Contains all input parameters and modified results
using Robyn:::robyn_engineering()
. This list is ready to be
used on other functions like robyn_run()
and print()
.
Class: robyn_inputs
.
data.frame. Raw input data. Load simulated
dataset using data("dt_simulated_weekly")
Character. Name of dependent variable. Only one allowed
Character. Type of dependent variable as "revenue" or "conversion". Will be used to calculate ROI or CPI, respectively. Only one allowed and case sensitive.
Character. Name of date variable. Daily, weekly
and monthly data supported.
date_var
must have format "2020-01-01" (YYY-MM-DD).
Default to automatic date detection.
Character vector. Names of the paid media variables.
The values on each of these variables must be numeric. Also,
paid_media_spends
must have same order and length as
paid_media_vars
respectively.
Character vector. Names of the paid media variables'
exposure level metrics (impressions, clicks, GRP etc) other than spend.
The values on each of these variables must be numeric. These variables are not
being used to train the model but to check relationship and recommend to
split media channels into sub-channels (e.g. fb_retargeting, fb_prospecting,
etc.) to gain more variance. paid_media_vars
must have same
order and length as paid_media_spends
respectively and is not required.
Character vector. Choose any of
c("default", "positive", "negative")
. Control
the signs of coefficients for paid_media_vars
. Must have same
order and same length as paid_media_vars
. By default, all values are
set to 'positive'.
Character vector. Typically newsletter sendings,
push-notifications, social media posts etc. Compared to paid_media_vars
organic_vars
are often marketing activities without clear spends.
Character vector. Choose any of
"default", "positive", "negative". Control
the signs of coefficients for organic_vars
Must have same
order and same length as organic_vars
. By default, all values are
set to "positive".
Character vector. Typically competitors, price & promotion, temperature, unemployment rate, etc.
Character vector. Choose any of
c("default", "positive", "negative")
. Control
the signs of coefficients for context_vars. Must have same
order and same length as context_vars
. By default it's
set to 'defualt'.
Character vector. Specify which of the provided variables in organic_vars or context_vars should be forced as a factor.
data.frame. Raw input holiday data. Load standard
Prophet holidays using data("dt_prophet_holidays")
Character vector. Include any of "trend", "season", "weekday", "monthly", "holiday" or NULL. Highly recommended to use all for daily data and "trend", "season", "holiday" for weekly and above cadence. Set to NULL to skip prophet's functionality.
Character vector. Choose any of
"default", "positive", "negative". Control
the signs of coefficients for prophet_vars
. Must have same
order and same length as prophet_vars
. By default, all values are
set to "default".
Character. Only one country allowed.
Includes national holidays for all countries, whose list can
be found loading data("dt_prophet_holidays")
.
Character. Choose any of "geometric", "weibull_cdf",
"weibull_pdf". Weibull adstock is a two-parametric function and thus more
flexible, but takes longer time than the traditional geometric one-parametric
function. CDF, or cumulative density function of the Weibull function allows
changing decay rate over time in both C and S shape, while the peak value will
always stay at the first period, meaning no lagged effect. PDF, or the
probability density function, enables peak value occurring after the first
period when shape >=1, allowing lagged effect. Run plot_adstock()
to
see the difference visually. Time estimation: with geometric adstock, 2000
iterations * 5 trials on 8 cores, it takes less than 30 minutes. Both Weibull
options take up to twice as much time.
List. Contains hyperparameter lower and upper bounds.
Names of elements in list must be identical to output of hyper_names()
.
To fix hyperparameter values, provide only one value.
Character. Set start and end dates of modelling
period. Recommended to not start in the first date in dataset to gain adstock
effect from previous periods. Also, columns to rows ratio in the input data
to be >=10:1, or in other words at least 10 observations to 1 independent variable.
This window will determine the date range of the data period within your dataset
you will be using to specifically regress the effects of media, organic and
context variables on your dependent variable. We recommend using a full
dt_input
dataset with a minimum of 1 year of history, as it will be used
in full for the model calculation of trend, seasonality and holidays effects.
Whereas the window period will determine how much of the full data set will be
used for media, organic and context variables.
data.frame. Optional. Provide experimental results to calibrate. Your input should include the following values for each experiment: channel, liftStartDate, liftEndDate, liftAbs, spend, confidence, metric. You can calibrate any spend or organic variable with a well designed experiment. You can also use experimental results from multiple channels; to do so, provide concatenated channel value, i.e. "channel_A+channel_B". Check "Guide for calibration source" section.
Character. JSON file to import previously exported inputs or
recreate a model. To generate this file, use robyn_write()
.
If you didn't export your data in the json file as "raw_data",
dt_input
must be provided; dt_holidays
input is optional.
Default to NULL. robyn_inputs
's output when
hyperparameters
are not yet set.
Additional parameters passed to prophet
functions.
robyn_inputs()
output.
We strongly recommend to use experimental and causal results that are considered ground truth to calibrate MMM. Usual experiment types are people-based (e.g. Facebook conversion lift) and geo-based (e.g. Facebook GeoLift).
Currently, Robyn only accepts point-estimate as calibration input. For example, if 10k$ spend is tested against a hold-out for channel A, then input the incremental return as point-estimate as the example below.
The point-estimate has to always match the spend in the variable. For example, if channel A usually has 100k$ weekly spend and the experimental HO is 70
# Using dummy simulated data
InputCollect <- robyn_inputs(
dt_input = Robyn::dt_simulated_weekly,
dt_holidays = Robyn::dt_prophet_holidays,
date_var = "DATE",
dep_var = "revenue",
dep_var_type = "revenue",
prophet_vars = c("trend", "season", "holiday"),
prophet_country = "DE",
context_vars = c("competitor_sales_B", "events"),
paid_media_spends = c("tv_S", "ooh_S", "print_S", "facebook_S", "search_S"),
paid_media_vars = c("tv_S", "ooh_S", "print_S", "facebook_I", "search_clicks_P"),
organic_vars = "newsletter",
factor_vars = "events",
window_start = "2016-11-23",
window_end = "2018-08-22",
adstock = "geometric",
# To be defined separately
hyperparameters = NULL,
calibration_input = NULL
)
print(InputCollect)
Run the code above in your browser using DataLab