Generates a synthetic survival data set of streaming service subscriptions. The survival event is a cancellation of the subscription, modeled as a function of household income and the average number of hours watched in the prior month. The supplied parameters adjust the level of censoring and the variance in the data; calling the function with no arguments produces a default distribution of data.
create_synthetic_data(
sample_size = 250,
minimum_income = 5000,
median_income = 50000,
income_variance = 10000,
min_watchhours = 0,
max_watchhours = 6,
censor_percentage = 0,
min_censor_amount = 0,
max_censor_amount = 0,
study_time_in_months = 48,
perturbation_shift = 0
)

A survival data set suitable for modeling using spect_train.
sample_size: optional - size of the sample population to generate
minimum_income: optional - minimum household income used to generate the distribution
median_income: optional - median household income used to generate the distribution
income_variance: optional - variance to use when generating the household income distribution
min_watchhours: optional - minimum average number of hours watched used to generate the distribution
max_watchhours: optional - maximum average number of hours watched used to generate the distribution
censor_percentage: optional - percentage of population to artificially censor
min_censor_amount: optional - minimum number of months of censoring to apply to the censored population
max_censor_amount: optional - maximum number of months of censoring to apply to the censored population
study_time_in_months: optional - observation horizon in months
perturbation_shift: optional - defines a boundary for the amount to randomly perturb the formulaic result. Zero for no perturbation
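
To make the roles of the censoring, horizon, and perturbation parameters more concrete, here is a minimal base R sketch of how a generator of this kind could work. It is not the package's implementation: the function name sketch_synthetic_data, its argument names, the event-time formula, and the choice to treat the income spread as a standard deviation and the censored share as a fraction are all assumptions made for illustration.

```r
# Illustrative sketch only, NOT the package's implementation.
# The function name, arguments, and event-time formula are hypothetical.
sketch_synthetic_data <- function(n = 250, study_months = 48,
                                  censor_fraction = 0.2,
                                  min_censor = 1, max_censor = 6,
                                  perturbation = 3) {
  # Covariates: household income (floored at 5000) and average watch hours.
  # Assumption: the spread of 10000 is used as a standard deviation.
  income <- pmax(5000, rnorm(n, mean = 50000, sd = 10000))
  watch_hours <- runif(n, min = 0, max = 6)

  # Hypothetical rule: higher income and more viewing delay cancellation.
  months <- 6 + 4 * (income / 50000) + 5 * watch_hours

  # Random perturbation bounded by +/- `perturbation` months.
  months <- months + runif(n, -perturbation, perturbation)

  # Administrative censoring at the observation horizon.
  event <- as.integer(months <= study_months)
  months <- pmin(months, study_months)

  # Artificial censoring: shorten follow-up for a random subset and
  # mark those subscribers as not having cancelled while observed.
  idx <- sample.int(n, size = floor(censor_fraction * n))
  reduction <- runif(length(idx), min_censor, max_censor)
  months[idx] <- pmax(months[idx] - reduction, 0)
  event[idx] <- 0L

  data.frame(income, watch_hours, months, event)
}

set.seed(1)
sketch <- sketch_synthetic_data()
head(sketch)
```

In this sketch, a row with event equal to 0 is censored, either because the subscriber had not cancelled by the end of the study window or because the row was artificially censored.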
Stephen Abrams, stephen.abrams@louisville.edu
data <- create_synthetic_data()
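
A slightly fuller call might override a few of the defaults. This is a sketch under two assumptions: that create_synthetic_data is exported by the spect package (the package that provides spect_train), and that the generator draws from R's standard random number stream, so that set.seed makes the result repeatable. Only arguments listed in the usage above are passed, and the values are arbitrary.

```r
# Assumption: create_synthetic_data() is exported by the spect package.
# The guard lets the snippet run without error when spect is not installed.
if (requireNamespace("spect", quietly = TRUE)) {
  set.seed(42)  # assumption: the generator uses R's standard RNG

  custom_data <- spect::create_synthetic_data(
    sample_size = 500,           # larger sample than the default of 250
    median_income = 60000,       # shift the income distribution upward
    study_time_in_months = 36    # shorter observation horizon
  )

  # Inspect the structure rather than assuming particular column names.
  str(custom_data)
}
```

Inspecting the result with str is a cautious choice here, since this page does not list the columns of the returned data set.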