tidysynthesis (version 0.1.2)

synth_spec: Create a synth_spec object

Description

The synth_spec object holds specifications for modeling and sampling components for sequential synthetic data generation. Each component has an associated construct_* function called when creating a presynth object.

Usage

synth_spec(
  default_regression_model = NULL,
  default_classification_model = NULL,
  custom_models = NULL,
  default_regression_steps = NULL,
  default_classification_steps = NULL,
  custom_steps = NULL,
  default_regression_sampler = NULL,
  default_classification_sampler = NULL,
  custom_samplers = NULL,
  default_regression_noise = NULL,
  default_classification_noise = NULL,
  custom_noise = NULL,
  default_regression_tuner = NULL,
  default_classification_tuner = NULL,
  custom_tuners = NULL,
  default_extractor = NULL,
  custom_extractors = NULL,
  invert_transformations = TRUE,
  enforce_na = TRUE
)

Value

A synth_spec object

Arguments

default_regression_model

A model_spec object from library(parsnip) for use in regression models.

default_classification_model

A model_spec object from library(parsnip) for use in classification models.
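The Examples section on this page only builds regression models, so here is a minimal sketch of a classification model_spec that could be passed as default_classification_model. It uses only parsnip calls; the object name rpart_class_mod is illustrative and does not come from the package documentation.

```r
library(parsnip)

# Illustrative name (an assumption): a decision tree set up for classification.
rpart_class_mod <- parsnip::decision_tree() |>
  parsnip::set_engine(engine = "rpart") |>
  parsnip::set_mode(mode = "classification")

# Inspect the mode stored on the model_spec object.
rpart_class_mod$mode
```

The same pattern with mode = "regression" yields the kind of object expected by default_regression_model.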

custom_models

A list of named lists, each with two elements: vars for variable names, and model for their associated model_spec object from library(parsnip).

default_regression_steps

A list of recipes::step_*() function(s) from library(recipes) for use in regression models.

default_classification_steps

A list of recipes::step_*() function(s) from library(recipes) for use in classification models.

custom_steps

A list of named lists each with two elements: vars for variable names, and steps for their associated recipe.
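Judging from the Examples on this page, a "steps" entry is a function that takes a recipe and returns it with one or more steps added. The sketch below applies such a function to a plain recipe outside of tidysynthesis to show what it does. The function name center_predictors and the use of mtcars are illustrative assumptions.

```r
library(recipes)

# Illustrative step function (an assumption): takes a recipe, returns it
# with a centering step appended.
center_predictors <- function(x) {
  x |>
    recipes::step_center(recipes::all_predictors(), id = "center")
}

# Apply the step function to an ordinary recipe built on mtcars.
rec <- recipes::recipe(mpg ~ ., data = mtcars) |>
  center_predictors()

# List the steps attached to the recipe, including each step's id.
tidy(rec)
```

Writing steps as functions means the same definition can be reused as a default or attached to specific variables through custom_steps.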

default_regression_sampler

A sampling function for drawing new values from regression models.

default_classification_sampler

A sampling function for drawing new values from classification models.

custom_samplers

A list of named lists, each with two elements: vars for variable names, and sampler for their associated sampler.

default_regression_noise

A noise function for adding noise to numeric values.

default_classification_noise

A noise function for adding noise to classification values.

custom_noise

A list of named lists, each with two elements: vars for variable names, and noise for their associated noise.

default_regression_tuner

A tuner from library(tune) for use in regression models.

default_classification_tuner

A tuner from library(tune) for use in classification models.

custom_tuners

A list of named lists, each with two elements: vars for variable names, and tuner for their associated tuner.

default_extractor

An optional method for extracting workflows or extracts from workflows.

custom_extractors

A list of named lists, each with two elements: vars for variable names, and extractor for their associated extractor.

invert_transformations

A Boolean indicating whether outcome variable transformations applied through recipes should be inverted during synthesis. The recipe steps that transform the outcome need ids that begin with "outcome".
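As a rough illustration of the id requirement, the sketch below defines a step function that transforms the outcome and gives the step an id starting with "outcome". Only the id naming rule comes from this page. The choice of step_log, the name log_outcome, and whether tidysynthesis can invert this particular step are all assumptions.

```r
library(recipes)

# Hypothetical step function: log-transform the outcome variable.
# The id begins with "outcome", as invert_transformations requires.
log_outcome <- function(x) {
  x |>
    recipes::step_log(recipes::all_outcomes(), id = "outcome log")
}

# Attach the step to an ordinary recipe and check the id prefix.
rec <- recipes::recipe(mpg ~ ., data = mtcars) |>
  log_outcome()

startsWith(tidy(rec)$id, "outcome")
```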

enforce_na

A Boolean indicating whether NA values should be added into the synthetic data with enforce_na() during synthesis. An alternative approach is to add the NA values after synthesis.

Examples

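# Load the package so synth_spec(), sample_rpart, and sample_lm are available
library(tidysynthesis)

# Default regression model: a decision tree fit with rpart
rpart_mod <- parsnip::decision_tree() |>
  parsnip::set_engine(engine = "rpart") |>
  parsnip::set_mode(mode = "regression")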

# Alternative model for selected variables: linear regression
lm_mod <- parsnip::linear_reg() |>
  parsnip::set_engine("lm") |>
  parsnip::set_mode(mode = "regression")

step1 <- function(x) {
  x |>
    recipes::step_center(recipes::all_predictors(), id = "center")
}

step2 <- function(x) {
  x |>
    recipes::step_scale(recipes::all_predictors(), id = "scale")
}

step3 <- function(x) { x |> step1() |> step2() }


synth_spec(
  default_regression_model = rpart_mod,
  custom_models = list(
    list("vars" = c("var1", "var2"),
         "model" = lm_mod)
  ),
  default_regression_steps = step1,
  custom_steps = list(
    list("vars" = c("var2", "var3"),
         "steps" = step2),
    list("vars" = c("var4"),
         "steps" = step3)
  ),
  default_regression_sampler = sample_rpart,
  custom_samplers = list(
    list("vars" = c("var1", "var2"),
         "sampler" = sample_lm)
  )
)
