tidysynthesis (version 0.1.2)

schema: Generate a schema object.

Description

Generate a schema object.

Usage

schema(
  conf_data,
  start_data,
  col_schema = NULL,
  enforce = TRUE,
  coerce_to_factors = FALSE,
  coerce_to_doubles = FALSE,
  na_factor_to_level = TRUE,
  na_numeric_to_ind = TRUE
)

Value

A schema object.

Arguments

conf_data

A data frame to be synthesized.

start_data

A data frame with starting variables.

col_schema

An optional named list of columns in the confidential data with their properties, including data type and factor levels. If NULL or only partially specified, col_schema will be inferred from the confidential data. See example code for formatting.

enforce

Boolean. If TRUE, preprocesses both conf_data and start_data so that they align with col_schema and with the arguments below.

coerce_to_factors

Boolean. If TRUE, coerces categorical data types (chr, fct, ord) to base R factors when enforce_schema is called.

coerce_to_doubles

Boolean. If TRUE, coerces columns specified as dbl in col_schema to base R doubles when enforce_schema is called.

na_factor_to_level

Boolean. If TRUE, applies convert_na_to_level() to factor variables when enforce_schema is called, so that missing values become an explicit factor level.

na_numeric_to_ind

Boolean. If TRUE, applies expand_na() to numeric data to create logical missingness indicators when enforce_schema is called.
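
The four preprocessing flags above can be pictured with plain base R. The sketch below is not the tidysynthesis implementation. It only illustrates, under stated assumptions, the kind of transformation each argument description suggests. The toy data, the "NA" level label, and the var2_NA indicator name are hypothetical choices made for this illustration.

```r
# Toy data with missing values (hypothetical, for illustration only).
df <- data.frame(
  var1 = c("a", NA, "b"),
  var2 = c(1L, NA, 3L),
  stringsAsFactors = FALSE
)

# coerce_to_factors: a character column becomes a base R factor.
var1_fct <- factor(df$var1)

# coerce_to_doubles: a column declared as dbl but stored as integer
# becomes a base R double.
var2_dbl <- as.double(df$var2)

# na_factor_to_level: missingness becomes an explicit factor level.
# The label "NA" is an assumption, not necessarily what the package uses.
var1_chr <- as.character(var1_fct)
var1_chr[is.na(var1_chr)] <- "NA"
var1_lvl <- factor(var1_chr, levels = c(levels(var1_fct), "NA"))

# na_numeric_to_ind: a logical indicator records where a numeric value is missing.
# The column name "var2_NA" is an assumption.
var2_NA <- is.na(var2_dbl)

result <- data.frame(var1 = var1_lvl, var2 = var2_dbl, var2_NA = var2_NA)
str(result)
```

In this sketch the factor column no longer contains NA values, while the numeric column keeps its NA and gains a companion logical column.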

Examples

conf_data <- data.frame(
  var1 = c("1", "1", "2"),
  var2 = c(1L, 2L, 3L),
  var3 = c(1.1, 2.2, 3.3)
)

start_data <- dplyr::select(conf_data, var1)

# default inferred schema
schema(
  conf_data = conf_data,
  start_data = start_data
)

# overwriting factor levels
schema(
  conf_data = conf_data,
  start_data = start_data,
  col_schema = list(
    "var1" = list(
      "dtype" = "fct",
      "levels" = c("1", "2", "3")
    )
  ),
  coerce_to_factors = TRUE
)
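
The second call declares a level, "3", that never occurs in var1. The base R sketch below shows what that difference looks like on a plain factor. It does not call schema() and makes no claim about how tidysynthesis stores levels internally; the variable names inferred and declared are hypothetical.

```r
var1 <- c("1", "1", "2")

# Levels inferred from the observed values only.
inferred <- factor(var1)

# Levels declared up front, including an unobserved level "3".
declared <- factor(var1, levels = c("1", "2", "3"))

levels(inferred)
levels(declared)

# The unobserved level is present with a count of zero.
table(declared)
```

Declaring levels explicitly is one way to keep a category in the schema even when the confidential data happens not to contain it.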

