diagnostic: Investigates the prediction problem settings (use before training a model)

Description

This function runs a set of prediction diagnostics to help pick a suitable target cohort (T), outcome (O) and time-at-risk (TAR), and to determine whether the prediction problem is worth executing.

Usage

diagnostic(
  plpData = NULL,
  cdmDatabaseName,
  connectionDetails,
  cdmDatabaseSchema,
  oracleTempSchema = NULL,
  cohortId,
  cohortName = cohortId,
  outcomeIds,
  outcomeNames = outcomeIds,
  cohortDatabaseSchema,
  cohortTable = "cohort",
  outcomeDatabaseSchema = cohortDatabaseSchema,
  outcomeTable = cohortTable,
  cdmVersion = 5,
  riskWindowStart = c(1, 1, 1, 1, 1),
  startAnchor = rep("cohort start", 5),
  riskWindowEnd = c(365, 365 * 2, 365 * 3, 365 * 4, 365 * 5),
  endAnchor = rep("cohort start", 5),
  outputFolder = NULL,
  sampleSize = NULL,
  minCellCount = 5
)
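With the default arguments, the four risk-window vectors have five elements each, so five TARs are evaluated: each starts one day after cohort start and ends one to five years after cohort start. A minimal base-R sketch that lays the defaults out as a table, assuming (not confirmed by this page) that element i of each vector belongs to TAR i:

```r
# Default risk-window arguments copied from the usage above
riskWindowStart <- c(1, 1, 1, 1, 1)
startAnchor <- rep("cohort start", 5)
riskWindowEnd <- c(365, 365 * 2, 365 * 3, 365 * 4, 365 * 5)
endAnchor <- rep("cohort start", 5)

# Assumption: the vectors are paired element-wise, giving one row per TAR
tars <- data.frame(
  riskWindowStart = riskWindowStart,
  startAnchor = startAnchor,
  riskWindowEnd = riskWindowEnd,
  endAnchor = endAnchor,
  stringsAsFactors = FALSE
)

# Human-readable description of each TAR
tars$label <- sprintf("%s + %d to %s + %d days",
                      tars$startAnchor, as.integer(tars$riskWindowStart),
                      tars$endAnchor, as.integer(tars$riskWindowEnd))
print(tars)
```

To evaluate a different set of TARs, the four vectors would be passed with the same length.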

Arguments

plpData

The plpData object on which to run the diagnostic. If NULL, the connection settings below must be specified.

cdmDatabaseName

Name of the database

connectionDetails

An R object of type connectionDetails created using the function createConnectionDetails in the DatabaseConnector package.

cdmDatabaseSchema

The name of the database schema that contains the OMOP CDM instance. Requires read permissions to this database. On SQL Server, this should specify both the database and the schema, for example 'cdm_instance.dbo'.

oracleTempSchema

For Oracle only: the name of the database schema where you want all temporary tables to be managed. Requires create/insert permissions to this database.

cohortId

A unique identifier that defines the at-risk (target) cohort. cohortId is used to select the cohort_concept_id in the cohort-like table.

cohortName

A string specifying the name of the target cohort

outcomeIds

A vector of cohort_definition_ids used to define outcomes.

outcomeNames

A vector of names for each outcome.

cohortDatabaseSchema

The name of the database schema where the cohort data used to define the at-risk cohort is located. If cohortTable = DRUG_ERA, cohortDatabaseSchema is not used but is assumed to be cdmSchema. Requires read permissions to this database.

cohortTable

The name of the table that contains the at-risk cohort. If cohortTable <> DRUG_ERA, cohortTable is expected to have the format of the COHORT table: cohort_concept_id, SUBJECT_ID, COHORT_START_DATE, COHORT_END_DATE.

outcomeDatabaseSchema

The name of the database schema where the data used to define the outcome cohorts is located. If outcomeTable = CONDITION_ERA, outcomeDatabaseSchema is not used but is assumed to be cdmSchema. Requires read permissions to this database.

outcomeTable

The name of the table that contains the outcome cohorts. If outcomeTable <> CONDITION_OCCURRENCE, outcomeTable is expected to have the format of the COHORT table: COHORT_DEFINITION_ID, SUBJECT_ID, COHORT_START_DATE, COHORT_END_DATE.
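For illustration, a minimal base-R sketch of a small table in the COHORT format described above, and of selecting target and outcome rows by id. The column names follow the format above in lower case; the ids, subjects and dates are invented, and in practice the rows would live in the database schemas named by the arguments rather than be built by hand:

```r
# Toy table in COHORT format; every value here is made up
cohort <- data.frame(
  cohort_definition_id = c(1, 1, 1, 2, 2),
  subject_id = c(101, 102, 103, 101, 103),
  cohort_start_date = as.Date(c("2020-01-01", "2020-03-15", "2020-06-01",
                                "2020-05-10", "2023-01-01")),
  cohort_end_date = as.Date(c("2020-01-31", "2020-04-15", "2020-06-30",
                              "2020-05-10", "2023-01-01"))
)

# Hypothetical ids: 1 plays the role of cohortId, 2 of outcomeIds
cohortId <- 1
outcomeIds <- 2

# Rows belonging to the target cohort and to the outcome cohort(s)
target <- cohort[cohort$cohort_definition_id == cohortId, ]
outcomes <- cohort[cohort$cohort_definition_id %in% outcomeIds, ]
print(target)
print(outcomes)
```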

cdmVersion

The OMOP CDM version used; currently "4" and "5" are supported.

riskWindowStart

The start of the risk window (in days) relative to the startAnchor.

startAnchor

The anchor point for the start of the risk window. Can be "cohort start" or "cohort end".

riskWindowEnd

The end of the risk window (in days) relative to the endAnchor.

endAnchor

The anchor point for the end of the risk window. Can be "cohort start" or "cohort end".
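A minimal sketch of how one start/end pair could translate into calendar dates for each person: the chosen anchor date plus the offset in days. The helper computeTar is hypothetical and is not part of the package; it only accepts the two anchors documented above:

```r
# Hypothetical helper: TAR start/end dates for each person in T
computeTar <- function(cohortStartDate, cohortEndDate,
                       riskWindowStart, startAnchor,
                       riskWindowEnd, endAnchor) {
  anchorDate <- function(anchor) {
    # Only the two documented anchors are accepted
    switch(anchor,
           "cohort start" = cohortStartDate,
           "cohort end" = cohortEndDate,
           stop("anchor must be 'cohort start' or 'cohort end'"))
  }
  data.frame(
    tarStart = anchorDate(startAnchor) + riskWindowStart,
    tarEnd = anchorDate(endAnchor) + riskWindowEnd
  )
}

# First default TAR: 1 to 365 days after cohort start
tar <- computeTar(
  cohortStartDate = as.Date(c("2020-01-01", "2021-06-15")),
  cohortEndDate = as.Date(c("2020-02-01", "2021-07-15")),
  riskWindowStart = 1, startAnchor = "cohort start",
  riskWindowEnd = 365, endAnchor = "cohort start"
)
print(tar)
```

With endAnchor = "cohort end", the same offsets would instead be counted from each person's cohort end date.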

outputFolder

The folder where results for the Shiny app are saved.

sampleSize

The number of people to sample from the target population. If NULL, no sampling is done.
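A minimal sketch of the sampling step, assuming sampleSize is the number of persons kept and that sampling is uniform without replacement. The helper sampleTarget is hypothetical; the package may sample differently:

```r
# Hypothetical helper: keeps sampleSize randomly chosen subjects, or
# everyone when sampleSize is NULL or not smaller than the population
sampleTarget <- function(subjectIds, sampleSize = NULL) {
  if (is.null(sampleSize) || sampleSize >= length(subjectIds)) {
    return(subjectIds)
  }
  # sample.int avoids the sample(x) pitfall when x has length one
  subjectIds[sample.int(length(subjectIds), sampleSize)]
}

set.seed(42)
ids <- 1001:1100
sampled <- sampleTarget(ids, sampleSize = 10)
print(sampled)
```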

minCellCount

The minimum count that will be displayed
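A minimal sketch of small-cell masking. The exact rule used by the package is not stated here; this sketch assumes that non-zero counts below minCellCount are replaced by NA and that zero counts are shown as-is. The helper censorCounts is hypothetical:

```r
# Hypothetical helper: masks non-zero counts below minCellCount
censorCounts <- function(counts, minCellCount = 5) {
  counts[counts > 0 & counts < minCellCount] <- NA
  counts
}

counts <- c(120, 3, 0, 5, 47)
print(censorCounts(counts, minCellCount = 5))
```

Here the count of 3 is masked, while 5 (equal to the threshold) is still displayed.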

Value

A list containing the following elements:

distribution

A list with, for each O, a data.frame containing: i) the time to observation end distribution, ii) the time from observation start distribution, iii) the time to event distribution and iv) the time from last prior event to index distribution (only for patients in T who have O before index).
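A minimal base-R sketch of the first three quantities for a toy target population, measured in days relative to index (cohort start). The column names and dates are invented; the package computes these from its own data structures:

```r
# Toy target population; all dates are invented
target <- data.frame(
  subjectId = 1:4,
  cohortStartDate = as.Date(c("2020-01-01", "2020-03-01", "2020-06-01", "2020-09-01")),
  observationStartDate = as.Date(c("2018-01-01", "2019-03-01", "2020-01-01", "2015-09-01")),
  observationEndDate = as.Date(c("2021-01-01", "2020-09-01", "2022-06-01", "2020-12-01")),
  outcomeDate = as.Date(c("2020-02-01", NA, "2021-06-01", NA))
)

daysBetween <- function(from, to) as.numeric(to - from)

# i) Time from index to the end of observation
timeToObsEnd <- daysBetween(target$cohortStartDate, target$observationEndDate)
# ii) Time from the start of observation to index
timeFromObsStart <- daysBetween(target$observationStartDate, target$cohortStartDate)
# iii) Time from index to the outcome, only for outcomes after index
timeToEvent <- daysBetween(target$cohortStartDate, target$outcomeDate)
timeToEvent <- timeToEvent[!is.na(timeToEvent) & timeToEvent > 0]

print(quantile(timeToObsEnd))
print(timeToEvent)
```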

incident

A list with, for each O, the incidence of O in T during each TAR.
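A minimal sketch of one way to measure incidence: the proportion of people in T whose outcome falls inside their own TAR (bounds inclusive). This page does not say which incidence measure the function reports, so the incidence proportion used here is an assumption, and the data are invented:

```r
# Toy data: one row per person in T with their TAR and outcome date
population <- data.frame(
  subjectId = 1:5,
  tarStart = as.Date(c("2020-01-02", "2020-02-02", "2020-03-02", "2020-04-02", "2020-05-02")),
  tarEnd = as.Date(c("2020-12-31", "2021-01-31", "2021-03-01", "2021-04-01", "2021-05-01")),
  outcomeDate = as.Date(c("2020-06-01", NA, "2021-06-01", "2020-04-02", NA))
)

# An outcome counts only if it lies inside the person's TAR
hasOutcomeInTar <- !is.na(population$outcomeDate) &
  population$outcomeDate >= population$tarStart &
  population$outcomeDate <= population$tarEnd

# Persons with O during TAR divided by persons in T
incidenceProportion <- sum(hasOutcomeInTar) / nrow(population)
print(incidenceProportion)
```

Person 3 has an outcome, but after their TAR ends, so only persons 1 and 4 are counted.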

characterization

A list with, for each O, the characterization of T, TnO (people in T who have O) and Tn~O (people in T who do not have O).
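A minimal sketch of the idea behind this comparison: the mean of each binary covariate in all of T, in TnO and in Tn~O. The covariates and the outcome flag are invented; the package's own characterization uses its covariate settings and may report additional statistics:

```r
# Toy binary covariates for people in T, plus an outcome flag
covariates <- data.frame(
  subjectId = 1:6,
  male = c(1, 0, 1, 1, 0, 0),
  diabetes = c(1, 1, 0, 1, 0, 0),
  outcome = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
)

covariateNames <- c("male", "diabetes")

# Covariate prevalence in T, TnO and Tn~O
characterization <- data.frame(
  covariate = covariateNames,
  target = colMeans(covariates[, covariateNames]),
  targetWithOutcome = colMeans(covariates[covariates$outcome, covariateNames]),
  targetWithoutOutcome = colMeans(covariates[!covariates$outcome, covariateNames]),
  row.names = NULL
)
print(characterization)
```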

Details

Users can define a set of Ts, Os, databases and population settings. The function returns a list of data.frames containing details such as the follow-up time distribution, time-to-event information, characterization details, the time from last prior event and the observation time distribution.

Examples


# NOT RUN {
#******** EXAMPLE 1 ********* 
# }
