Learn R Programming

CDMConnector (version 2.4.0)

cdmFromCohortSet: Build a Synthetic CDM from a Cohort Set

Description

Constructs a synthetic OMOP Common Data Model (CDM) using a set of cohort definitions, created using CDMConnector::readCohortSet(). The function generates synthetic data and returns a cdm reference object backed by a DuckDB database, containing synthetic CDM tables and generated cohort table rows.

Usage

cdmFromCohortSet(
  cohortSet,
  n = 100,
  cohortTable = "cohort",
  duckdbPath = NULL,
  seed = 1,
  verbose = FALSE,
  ...
)

Value

A cdm reference object (as returned by CDMConnector::cdmFromCon()) backed by a DuckDB database. The returned object contains synthetic CDM tables and cohort table rows generated from the specified cohort definitions. The returned cdm has an attribute synthetic_summary (a list with cohort_summaries, cohort_index, n_cohorts, summary

(one-line text), any_low_match) for diagnostics and match rates.

Arguments

cohortSet

A data frame (usually from CDMConnector::readCohortSet()) with columns cohort_definition_id, cohort_name, and cohort (cohort definition as a list or JSON string).

n

Integer. Total number of synthetic persons to generate across all cohorts. Defaults to 100.

cohortTable

Character. Name of the cohort table (default "cohort").

duckdbPath

Character or NULL. Path for the final merged DuckDB; if NULL a temporary file is used.

seed

Integer. Base RNG seed; each cohort uses seed + cohort_index for reproducibility (default 1).

verbose

If TRUE, print progress per cohort and per attempt (default FALSE).

...

Arguments passed through to cdmFromJson for each cohort (e.g. targetMatch, successRate, visitConceptId, eventDateJitter, visitDateJitter, demographicVariety, sourceAndTypeVariety, valueVariety). seed is overridden per cohort. targetMatch is per cohort: that fraction of each cohort's generated persons are intended to qualify for that cohort only.

Reproducibility

With the same seed, cohortSet, and other arguments, cdmFromCohortSet produces the same synthetic data. Changing seed or n changes the data. The data are random but reproducible.

Examples

Run this code
if (FALSE) {
library(CDMConnector)
cohortSet <- readCohortSet(system.file("cohorts", package = "CDMConnector"))
cdm <- cdmFromCohortSet(cohortSet, n = 100)
cdm$person
}

Run the code above in your browser using DataLab