PatientLevelPrediction (version 4.3.10)

runEnsembleModel: ensemble - Create an ensembling model using different models

Description

Trains a list of base models and combines their predictions into a single ensemble model.

Usage

runEnsembleModel(
  population,
  dataList,
  modelList,
  testSplit = "time",
  testFraction = 0.2,
  stackerUseCV = TRUE,
  splitSeed = NULL,
  nfold = 3,
  saveDirectory = NULL,
  saveEnsemble = F,
  savePlpData = F,
  savePlpResult = F,
  savePlpPlots = F,
  saveEvaluation = F,
  analysisId = NULL,
  verbosity = "INFO",
  ensembleStrategy = "mean",
  cores = NULL
)

Arguments

population

The population, created using createStudyPopulation(), that will be used to develop the model

dataList

A list of objects of type plpData: the patient-level prediction data extracted from the CDM.

modelList

A list of base model settings, each created with one of the model-setting functions in this package. The base models combined in the final ensemble can be any model implemented in this package.

testSplit

Either 'person' or 'time', specifying the type of evaluation used. 'time' finds the date after which a testFraction of patients have their index date; patients with an index date before this date are assigned to the training set and those after it to the test set. 'person' randomly splits the data into a test set (testFraction of the data) and a training set (1 - testFraction of the data); this split is stratified by the class label.
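The two split types can be illustrated with a minimal base R sketch. It is not the package's internal implementation: the helper names timeSplit() and personSplit() are hypothetical, and the sketch assumes one index date and one binary outcome per person.

```r
# Minimal sketch (hypothetical helpers, not the package's code).

# 'time' split: the latest testFraction of index dates go to the test set.
# People tied on the cutoff date all go to the training set.
timeSplit <- function(indexDate, testFraction) {
  n <- length(indexDate)
  # Keep at least one person in the training set
  nTest <- min(round(n * testFraction), n - 1)
  # Latest index date that still belongs to the training set
  cutoff <- sort(indexDate)[n - nTest]
  indexDate > cutoff  # TRUE = test set
}

# 'person' split: sample testFraction of the people within each outcome class,
# so the test set keeps the outcome proportion (stratified by class label).
# The seed argument plays the role of splitSeed.
personSplit <- function(outcome, testFraction, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  isTest <- logical(length(outcome))
  for (cls in unique(outcome)) {
    idx <- which(outcome == cls)
    nTest <- round(length(idx) * testFraction)
    isTest[idx[sample.int(length(idx), nTest)]] <- TRUE
  }
  isTest  # TRUE = test set
}

# Usage on toy data: 100 people with distinct index dates, 20 with the outcome
indexDate <- as.Date("2015-01-01") + 0:99
outcome <- rep(c(0, 1), c(80, 20))
table(time = timeSplit(indexDate, 0.2))
table(person = personSplit(outcome, 0.2, seed = 42), outcome)
```

With testFraction = 0.2 both sketches put 20 of the 100 toy patients in the test set; the person split additionally keeps 4 of the 20 outcome cases in the test set.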

testFraction

The fraction of the data assigned to the test set. It is used by both the 'person' and the 'time' split types.

stackerUseCV

When ensembleStrategy = 'stacked', whether to train the stacker on the cross-validation predictions from the training set (TRUE) or to hold out 20 percent of the data for training the stacker (FALSE).
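The difference between the two stacker training options can be sketched in base R. Everything here is an assumption for illustration: the toy data, the glm base models and the helpers fitAndPredict(), outOfFoldPredictions() and holdoutPredictions() are hypothetical and do not reflect the package's internals.

```r
# Hypothetical toy training data: two features and a binary outcome
set.seed(123)
n <- 500
train <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
train$y <- rbinom(n, 1, plogis(-1 + train$x1 + 0.5 * train$x2))

# Two hypothetical base models: logistic regressions on different features
baseFormulas <- list(modelA = y ~ x1, modelB = y ~ x2)

fitAndPredict <- function(formula, fitData, newData) {
  fit <- glm(formula, data = fitData, family = binomial())
  predict(fit, newdata = newData, type = "response")
}

# stackerUseCV = TRUE: every training row gets a prediction from base models
# fit on the other folds (out-of-fold predictions)
outOfFoldPredictions <- function(data, formulas, nfold = 3) {
  folds <- sample(rep_len(seq_len(nfold), nrow(data)))
  preds <- matrix(NA_real_, nrow(data), length(formulas),
                  dimnames = list(NULL, names(formulas)))
  for (k in seq_len(nfold)) {
    inFold <- folds == k
    for (m in names(formulas)) {
      preds[inFold, m] <- fitAndPredict(formulas[[m]], data[!inFold, ], data[inFold, ])
    }
  }
  as.data.frame(preds)
}

# stackerUseCV = FALSE: base models fit on 80 percent of the data,
# predictions made on the held-out 20 percent
holdoutPredictions <- function(data, formulas, holdoutFraction = 0.2) {
  isHoldout <- seq_len(nrow(data)) %in%
    sample.int(nrow(data), round(nrow(data) * holdoutFraction))
  preds <- sapply(formulas, fitAndPredict,
                  fitData = data[!isHoldout, ], newData = data[isHoldout, ])
  list(preds = as.data.frame(preds), y = data$y[isHoldout])
}

# Stacker trained on out-of-fold predictions for all training rows
oof <- outOfFoldPredictions(train, baseFormulas, nfold = 3)
oof$y <- train$y
stackerCV <- glm(y ~ modelA + modelB, data = oof, family = binomial())

# Stacker trained only on the 20 percent holdout
ho <- holdoutPredictions(train, baseFormulas)
ho$preds$y <- ho$y
stackerHoldout <- glm(y ~ modelA + modelB, data = ho$preds, family = binomial())
```

The trade-off in this sketch: out-of-fold predictions let the stacker learn from every training row, while the holdout option leaves less data for fitting the base models.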

splitSeed

The seed used to split the test/train set when using a person type testSplit

nfold

The number of folds used in the cross validation (default 3)

saveDirectory

The path to the directory where the results will be saved (if NULL, the working directory is used)

saveEnsemble

Binary indicating whether to save the ensemble (default is F)

savePlpData

Binary indicating whether to save the plpData object (default is F)

savePlpResult

Binary indicating whether to save the object returned by runPlp (default is F)

savePlpPlots

Binary indicating whether to save the performance plots as pdf files (default is F)

saveEvaluation

Binary indicating whether to save the performance as csv files (default is F)

analysisId

The analysis ID

verbosity

Sets the level of the verbosity. If the log level is at or higher in priority than the logger threshold, a message will print. The levels are:

  • DEBUG: Highest verbosity showing all debug statements

  • TRACE: Showing information about start and end of steps

  • INFO: Show informative information (Default)

  • WARN: Show warning messages

  • ERROR: Show error messages

  • FATAL: Be silent except for fatal errors

ensembleStrategy

The strategy used to combine the outputs of the different models. It can be one of:

  • mean: the average of the predicted probabilities from the different models

  • product: the product rule

  • weighted: the weighted average of the predicted probabilities from the different models, using each model's training AUC as its weight

  • stacked: a stacked ensemble that trains a logistic regression on the predictions of the different models
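A minimal base R sketch of the four strategies on toy predicted probabilities follows. The helpers auc() and ensemblePredict() and all data are hypothetical. The normalized form of the product rule used here, prod(p) / (prod(p) + prod(1 - p)), is an assumption: the text above only says "the product rule", and the package may implement it differently.

```r
# Hypothetical toy data: training-set predictions of three base models
set.seed(1)
yTrain <- rbinom(200, 1, 0.3)
trainPreds <- cbind(
  modelA = plogis(-1 + 1.5 * yTrain + rnorm(200)),
  modelB = plogis(-1 + 1.0 * yTrain + rnorm(200)),
  modelC = plogis(-1 + 0.5 * yTrain + rnorm(200))
)
# Hypothetical predictions of the same three models for four new patients
newPreds <- cbind(
  modelA = c(0.10, 0.40, 0.80, 0.30),
  modelB = c(0.20, 0.50, 0.70, 0.10),
  modelC = c(0.15, 0.60, 0.90, 0.20)
)

# AUC via the Mann-Whitney rank statistic (ties get average ranks)
auc <- function(p, y) {
  r <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

ensemblePredict <- function(preds, strategy = c("mean", "product", "weighted"),
                            weights = NULL) {
  strategy <- match.arg(strategy)
  switch(strategy,
    # 'mean': average probability across models
    mean = rowMeans(preds),
    # 'product': assumed normalized product rule over positive and negative class
    product = {
      pPos <- apply(preds, 1, prod)
      pNeg <- apply(1 - preds, 1, prod)
      pPos / (pPos + pNeg)
    },
    # 'weighted': weighted average; weights must follow the column order of preds
    weighted = drop(preds %*% weights) / sum(weights)
  )
}

trainAuc <- apply(trainPreds, 2, auc, y = yTrain)  # one training AUC per model
meanPreds <- ensemblePredict(newPreds, "mean")
productPreds <- ensemblePredict(newPreds, "product")
weightedPreds <- ensemblePredict(newPreds, "weighted", weights = trainAuc)

# 'stacked': logistic regression trained on the base models' predictions
stacker <- glm(y ~ ., data = data.frame(trainPreds, y = yTrain), family = binomial())
stackedPreds <- predict(stacker, newdata = as.data.frame(newPreds), type = "response")
```

For the third toy patient, the mean of 0.80, 0.70 and 0.90 is 0.80, while the normalized product rule gives 0.504 / (0.504 + 0.006), about 0.988, because agreement across models pushes the product further from 0.5.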

cores

The number of cores to use when training the ensemble

Details

This function trains a list of models and combines them into an ensemble model.