compare_sdg: Compare the performance of generators.

Description

compare_sdg compares the predictive performance of models trained on synthetic data with that of models trained on real data.

Usage

compare_sdg(
  learner,
  measurement,
  target_var,
  real_dataset,
  generated_data1,
  generated_data2 = NA,
  generated_data3 = NA,
  generated_data4 = NA,
  generated_data5 = NA,
  generated_data6 = NA
)

Arguments

learner

One or more learner objects created with makeLearners from the mlr package.

measurement

A list of performance measures used in the benchmark, for example list(acc, ber).

target_var

A string giving the name of the response variable.

real_dataset

A list of data frames with a training_set data frame and a testing_set data frame. You can get this list from split_data.
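
For readers who want to see the expected shape of real_dataset without calling split_data, the following is a minimal base R sketch. The helper manual_split is hypothetical and not part of the package; it only assumes, as described above, that the list holds two data frames named training_set and testing_set. How split_data itself samples rows is not documented here, so the random split below is an assumption.

```r
# Hypothetical helper (not part of the package): build a list with the
# two elements described above, training_set and testing_set.
manual_split <- function(df, train_percent = 70, seed = 1) {
  set.seed(seed)
  all_idx   <- seq_len(nrow(df))
  n_train   <- floor(nrow(df) * train_percent / 100)
  train_idx <- sample(all_idx, size = n_train)
  test_idx  <- setdiff(all_idx, train_idx)
  list(
    training_set = df[train_idx, , drop = FALSE],
    testing_set  = df[test_idx, , drop = FALSE]
  )
}

# Toy data frame used only for illustration.
toy <- data.frame(x = 1:10, y = rep(c("a", "b"), 5))
real_dataset <- manual_split(toy, 70)
str(real_dataset, max.level = 1)
```

A list built this way has the same element names that the example at the end of this page accesses with adult_data$training_set.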

generated_data1

A data frame of synthetic data 1.

generated_data2

A data frame of synthetic data 2.

generated_data3

A data frame of synthetic data 3.

generated_data4

A data frame of synthetic data 4.

generated_data5

A data frame of synthetic data 5.

generated_data6

A data frame of synthetic data 6.

Value

The output is a benchmark object. It compares the predictive performance of the selected models trained on the real data with that of the same models trained on the generated data. In both cases the models are validated on the testing data.

Details

This function returns the measured performance of predictive models trained on the synthetic data. We assume that good-quality synthetic data allows us to draw the same analytic conclusions as the real data. Hence, we compare the predictive performance of several machine learning algorithms trained on the synthetic data and tested on real data with that of the same algorithms trained and tested on real data.
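
The idea of training on synthetic data and testing on real data can be illustrated with a small base R sketch. It does not reproduce the internals of compare_sdg. The simulated data, the noisy "generator" stand-in, and the helper accuracy_on_test are all assumptions made for illustration.

```r
set.seed(42)

# Simulate a small data set with one predictor and a binary outcome.
simulate <- function(n) {
  x <- rnorm(n)
  y <- rbinom(n, 1, plogis(1.5 * x))
  data.frame(x = x, y = y)
}
real_train <- simulate(200)
real_test  <- simulate(100)

# Stand-in for a synthetic data generator (assumption for illustration):
# another draw from the same process with extra noise on the predictor.
synthetic <- simulate(200)
synthetic$x <- synthetic$x + rnorm(200, sd = 0.5)

# Hypothetical helper: fit a logistic regression on `train` and return
# its classification accuracy on `test`.
accuracy_on_test <- function(train, test) {
  fit  <- glm(y ~ x, data = train, family = binomial)
  prob <- predict(fit, newdata = test, type = "response")
  mean(as.integer(prob > 0.5) == test$y)
}

# Both models are scored on the same real testing set.
results <- c(
  real_dataset = accuracy_on_test(real_train, real_test),
  synthetic    = accuracy_on_test(synthetic, real_test)
)
results
```

If the synthetic data preserves the relationships in the real data, the two accuracies should be close. A large gap suggests the generator has lost structure that matters for prediction.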

Examples

# NOT RUN {
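library(mlr)
# Keep a subset of columns from the adult data, then split the first
# 100 rows into a training set and a testing set.
adult_data <- adult[c('age', 'race', 'sex', 'capital_gain', 'capital_loss', 'hours_per_week',
                      'income')]
adult_data <- split_data(adult_data[1:100, ], 70)
# Generate synthetic data from the real training set.
bn_learn <- gen_bn_learn(adult_data$training_set, "hc")
# mlr learners and measures (acc: accuracy, ber: balanced error rate).
lrns <- makeLearners(c("rpart", "logreg"), type = "classif", predict.type = "prob")
measurements <- list(acc, ber)
# Benchmark the learners trained on real data and on synthetic data.
bmr <- compare_sdg(lrns,
    measurement = measurements,
    target_var = "income",
    real_dataset = adult_data,
    generated_data1 = bn_learn$gen_data)
# Label the results by the data used for training.
names(bmr$results) <- c("real_dataset", "bn_learn")
bmr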

# }
