PatientLevelPrediction

An R package for building and validating patient-level predictive models using data in the OMOP Common Data Model format.

Reps JM, Schuemie MJ, Suchard MA, Ryan PB, Rijnbeek PR. Design and implementation of a standardized framework to generate and evaluate patient-level prediction models using observational healthcare data. J Am Med Inform Assoc. 2018;25(8):969-975.

Introduction

The figure below illustrates the prediction problem we address. Among a population at risk, we aim to predict which patients at a defined moment in time (t = 0) will experience some outcome during a time-at-risk. Prediction is done using only information about the patients in an observation window prior to that moment in time.

To define a prediction problem we have to define t=0 by a Target Cohort (T), the outcome we would like to predict by an Outcome Cohort (O), and the time-at-risk (TAR). Furthermore, we have to make design choices for the model we would like to develop, and determine the observational datasets used for internal and external validation. This conceptual framework works for all types of prediction problems, for example those presented below (T=green, O=red).
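To make the roles of T, O and TAR concrete, here is a minimal base R sketch that labels each patient in a toy target cohort according to whether an outcome falls inside the time-at-risk. It does not call PatientLevelPrediction; the data frames, the column names and the 1 to 365 day window are all assumptions chosen for illustration.

```r
# Toy target cohort (T): one row per patient, with the index date (t = 0).
target <- data.frame(
  subjectId = c(1, 2, 3, 4),
  indexDate = as.Date(c("2020-01-01", "2020-02-15", "2020-03-10", "2020-04-05"))
)

# Toy outcome cohort (O): dates on which the outcome was recorded.
outcome <- data.frame(
  subjectId   = c(1, 2, 4),
  outcomeDate = as.Date(c("2020-06-01", "2020-02-10", "2021-08-01"))
)

# Time-at-risk (TAR): days 1 to 365 after the index date (an assumed choice).
tarStart <- 1
tarEnd   <- 365

# Left-join outcomes onto the target cohort and count days from t = 0.
merged <- merge(target, outcome, by = "subjectId", all.x = TRUE)
daysToOutcome <- as.numeric(merged$outcomeDate - merged$indexDate)

# The label is 1 when the outcome falls inside the time-at-risk, otherwise 0.
# Patients without any outcome record (NA) are labelled 0.
merged$outcomeCount <- as.integer(
  !is.na(daysToOutcome) & daysToOutcome >= tarStart & daysToOutcome <= tarEnd
)

print(merged[, c("subjectId", "indexDate", "outcomeCount")])
```

In this toy data only the first patient is labelled as a case: the second patient's outcome precedes t = 0, the third has no outcome, and the fourth has an outcome after the window closes.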

Features

  • Takes one or more target cohorts (Ts) and one or more outcome cohorts (Os) and develops and validates models for all T and O combinations.
  • Allows for multiple prediction design options.
  • Extracts the necessary data from a database in OMOP Common Data Model format for multiple covariate settings.
  • Uses a large set of covariates including for example all drugs, diagnoses, procedures, as well as age, comorbidity indexes, and custom covariates.
  • Includes a large number of state-of-the-art machine learning algorithms that can be used to develop predictive models, including Regularized logistic regression, Random forest, Gradient boosting machines, Decision tree, Naive Bayes, K-nearest neighbours, Neural network and Deep learning (Convolutional neural networks, Recurrent neural network and Deep nets).
  • Allows you to add custom algorithms.
  • Contains functionality to externally validate models.
  • Includes functions to plot and explore model performance (ROC + Calibration).
  • Includes a shiny app to interactively view and explore results.
  • Implements existing models.
  • Builds ensemble models.
  • Builds Deep Learning models.
  • Generates learning curves.
  • Automatically creates a word document containing all the study results.
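As a rough illustration of what "all T and O combinations" means in the first feature above, the following base R sketch enumerates every pairing of some hypothetical cohort ids. The ids and column names are assumptions, and the sketch does not use the package's own settings objects.

```r
# Hypothetical cohort ids; in a real study these would come from the cohort table.
targetIds  <- c(101, 102)
outcomeIds <- c(201, 202, 203)

# One row per (target, outcome) pair: each row is one prediction problem.
analyses <- expand.grid(targetId = targetIds, outcomeId = outcomeIds)
analyses$analysisId <- seq_len(nrow(analyses))

print(analyses)
```

Two target cohorts and three outcome cohorts give six prediction problems, each of which would get its own model.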

Screenshots

A demo of the Shiny App can be found here:

Prediction Viewer Shiny App

Technology

PatientLevelPrediction is an R package, with some functions implemented in C++ and Python.

System Requirements

Requires R (version 3.3.0 or higher). Installation on Windows requires RTools. Libraries used in PatientLevelPrediction require Java and Python.

The Python installation is required for some of the machine learning algorithms. We advise installing Python 3.6 using Anaconda (https://www.continuum.io/downloads) on Windows, or Python 3.6 (https://www.python.org/downloads/release/python-360/) on Linux or a Mac.
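Before installing, a quick base R check along the following lines can confirm that the requirements above are likely to be met. This is only a sketch: it looks for executables named python3, python and java on the PATH (assumed names, which may differ on your system) and says nothing about whether the versions found are suitable. The package's own checkPlpInstallation function, listed in the function index, is the intended way to verify a completed installation.

```r
# Compare the running R version with the stated minimum of 3.3.0.
rOk <- getRversion() >= "3.3.0"

# Sys.which() returns "" for executables that are not on the PATH.
pythonPaths <- Sys.which(c("python3", "python"))
pythonFound <- any(nzchar(pythonPaths))

javaFound <- nzchar(Sys.which("java"))[[1]]

cat("R >= 3.3.0:      ", rOk, "\n")
cat("Python on PATH:  ", pythonFound, "\n")
cat("Java on PATH:    ", javaFound, "\n")
```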

Dependencies

  • Cyclops
  • DatabaseConnector
  • SqlRender
  • FeatureExtraction
  • BigKnn

Getting Started

  • To install the package please read the Package Installation guide

  • Have a look at the video below for an extensive demo of the package.

Please read the main vignette for the package:

In addition we have created vignettes that describe advanced functionality in more detail:

Package manual: PatientLevelPrediction.pdf

Getting Involved

We welcome contributions to the development of this package through pull requests to our development branch.

  • Developer questions/comments/feedback: OHDSI Forum
  • We use the GitHub issue tracker for all bugs/issues/enhancements

License

PatientLevelPrediction is licensed under Apache License 2.0

Development

PatientLevelPrediction is being developed in RStudio.

Development status

Beta

Acknowledgements

  • The package is maintained by Jenna Reps and Peter Rijnbeek and has been developed with major contributions from Martijn Schuemie, Patrick Ryan, and Marc Suchard.
  • We would like to thank the following people for their contributions to the package: Seng Chan You, Ross Williams, Henrik John, Xiaoyong Pan, James Wiggins.
  • This project is supported in part through the National Science Foundation grant IIS 1251151.

Install: install.packages('PatientLevelPrediction')

Monthly Downloads: 518

Version: 3.0.0

License: Apache License 2.0

Maintainer: Jenna Reps

Last Published: April 20th, 2025

Functions in PatientLevelPrediction (3.0.0)

diagnosticOddsRatio

Calculate the diagnostic odds ratio
drawAttritionDiagramPlp

Draw the attrition diagram
createStudyPopulationSettings

create the study population settings
createLearningCurvePar

createLearningCurvePar
externalValidatePlp

externalValidatePlp - Validate a model on new databases
exportPlpResult

exportPlpResult exports an object returned by runPlp into a network study package while removing sensitive information from the object
createStudyPopulation

Create a study population
f1Score

Calculate the f1Score
falseDiscoveryRate

Calculate the falseDiscoveryRate
createLearningCurve

createLearningCurve
clearffTempDir

clearffTempDir
getPlpTable

Create a dataframe with the summary details of the population cohort for publications
createCohort

createCohort - Loads all the cohort sql in a network study and creates the cohorts
calibrationLine

calibrationLine
interpretInstallCode

Tells you the package issue
insertDbPopulation

Insert a population into a database
createLrSql

Convert logistic regression model to sql code...
personSplitter

Split data into random subsets stratified by class
packageResults

Package the results for sharing with OHDSI researchers
getPredictionDistribution

Calculates the prediction distribution
negativePredictiveValue

Calculate the negativePredictiveValue
createPlpJournalDocument

createPlpJournalDocument
createExistingModelSql

Apply an existing logistic regression prediction model
fitGLMModel

Fit a predictive model
falsePositiveRate

Calculate the falsePositiveRate
loadPlpData

Load the cohort data from a folder
combinePlpModelSettings

combine two objects specifying multiple Plp model settings
createPlpModelSettings

Create an object specifying the multiple Plp model settings
createPlpReport

createPlpReport
negativeLikelihoodRatio

Calculate the negativeLikelihoodRatio
loadPredictionAnalysisList

Load the multiple prediction json settings from a file
getAttritionTable

Get the attrition table for a population
evaluateExistingModel

evaluateExistingModel
evaluateMultiplePlp

externally validate the multiple plp models across new datasets
fitPlp

fitPlp
plotDemographicSummary

Plot the Observed vs. expected incidence, by age and gender
getModelDetails

Get the predictive model details
computeAucFromDataFrames

Compute the area under the ROC curve
computeAuc

Compute the area under the ROC curve
exportPlpDataToCsv

Export all data in a plpData object to CSV files
evaluatePlp

evaluatePlp
falseNegativeRate

Calculate the falseNegativeRate
getCovariateData

Get the covariate data for a cohort table
getCalibration

Get a sparse summary of the calibration
falseOmissionRate

Calculate the falseOmissionRate
getPlpData

Get the patient level prediction data from the server
plotRoc

Plot the ROC curve
getThresholdSummary

Calculate all measures for sparse ROC
plotSparseCalibration2

Plot the conventional calibration
plotSparseCalibration

Plot the calibration
loadPlpResult

Loads the evaluation dataframe
loadPrediction

Loads the prediction dataframe from csv
plotSmoothCalibration

Plot the smooth calibration as detailed in Calster et al. "A calibration hierarchy for risk models was defined: from utopia to empirical data" (2016)
predictProbabilities

Create predictive probabilities
predictPlp

predictPlp
plotLearningCurve

plotLearningCurve
plotPredictionDistribution

Plot the side-by-side boxplots of prediction distribution, by class
loadPlpModel

loads the plp model
grepCovariateNames

Extract covariate names
loadEnsemblePlpModel

Loads the ensemble plp model and returns a model list
plotPreferencePDF

Plot the preference score probability density function, showing prediction overlap between true and false cases
saveEnsemblePlpResult

saves the Ensemble plp results
positivePredictiveValue

Calculate the positivePredictiveValue
plotPlp

Plot all the PatientLevelPrediction plots
positiveLikelihoodRatio

Calculate the positiveLikelihoodRatio
setCIReNN

Create setting for CIReNN model
savePlpData

Save the cohort data to folder
plotPrecisionRecall

Plot the precision-recall curve using the sparse thresholdSummary data frame
plotPredictedPDF

Plot the Predicted probability density function, showing prediction overlap between true and false cases
registerParallelBackend

registerParallelBackend
plpDataSimulationProfile

A simulation profile
registerSequentialBackend

registerSequentialBackend
plotF1Measure

Plot the F1 measure efficiency frontier using the sparse thresholdSummary data frame
loadEnsemblePlpResult

loads the Ensemble plp results
plotSparseRoc

Plot the ROC curve using the sparse thresholdSummary data frame
setCNNTorch

Create setting for CNN model with python
savePlpResult

Saves the result from runPlp into the location directory
setMLP

Create setting for neural network model with python
setMLPTorch

Create setting for neural network model with python
savePlpModel

Saves the plp model
toPlpData

Convert matrix into plpData
setNaiveBayes

Create setting for naive bayes model with python
timeSplitter

Split test/train data by time and then partitions training set into random folds stratified by class
plotVariableScatterplot

Plot the variable importance scatterplot
setRNNTorch

Create setting for RNN model with python
runPlpAnalyses

Run a list of predictions
plotGeneralizability

Plot the train/test generalizability diagnostic
predictFfdf

Generate predictions from a regression model
saveEnsemblePlpModel

Saves the ensemble plp model
toSparseTorchPython

Convert the plpData in COO format into a sparse python matrix using torch.sparse
sensitivity

Calculate the sensitivity
setCovNN

Create setting for multi-resolution CovNN model (structure based on https://arxiv.org/pdf/1608.00647.pdf CNN1)
transportModel

Transports a plpModel to a new location and removes sensitive data
setCovNN2

Create setting for CovNN2 model - convolution across input and time - https://arxiv.org/pdf/1608.00647.pdf
savePrediction

Saves the prediction dataframe to csv
setAdaBoost

Create setting for AdaBoost with python
setLRTorch

Create setting for logistic regression model with python
setGradientBoostingMachine

Create setting for gradient boosting machine model using gbm_xgboost implementation
savePredictionAnalysisList

Saves a json prediction settings given R settings
simulatePlpData

Generate simulated data
setLassoLogisticRegression

Create setting for lasso logistic regression
setKNN

Create setting for knn model
submitResults

submitResults - sends a zipped folder to the OHDSI network study repository
viewPlp

viewPlp - Interactively view the performance and model settings
standardOutput

standardOutput - takes the output of runPlp or evaluatePlp and converts it into the standardised output for a network study - three directories (plots, results, summary)
transportPlp

Transports a plpResult to a new location and removes sensitive data
specificity

Calculate the specificity
runPlp

runPlp - Train and evaluate the model
runEnsembleModel

ensemble - Create an ensembling model using different models
setDecisionTree

Create setting for DecisionTree with python
setDeepNN

Create setting for DeepNN model
setRandomForest

Create setting for random forest model with python (very fast)
similarPlpData

Extract new plpData using plpModel settings: uses the metadata in plpModel to extract similar data and population for new databases
toSparseM

Convert the plpData in COO format into a sparse R matrix
toSparsePython

Convert the plpData in COO format into a sparse python matrix
bySumFf

Compute sum of values binned by a second variable
applyEnsembleModel

Apply a trained ensemble model on new data. Applies a Patient Level Prediction model to Patient Level Prediction data and returns the predicted risk in [0,1] for each person in the population. If the user supplies a population with an outcomeCount column, the function also returns the evaluation of the prediction (AUC, Brier score, calibration).
brierScore

brierScore
PatientLevelPrediction

PatientLevelPrediction
checkPlpInstallation

Check PatientLevelPrediction and its dependencies are correctly installed
checkffFolder

Check if the fftempdir is writable
applyModel

Apply a trained model on new data. Applies a Patient Level Prediction model to Patient Level Prediction data and returns the predicted risk in [0,1] for each person in the population. If the user supplies a population with an outcomeCount column, the function also returns the evaluation of the prediction (AUC, Brier score, calibration).
averagePrecision

Calculate the average precision
accuracy

Calculate the accuracy
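
Many of the functions listed above compute standard performance measures (for example sensitivity, specificity, positivePredictiveValue, f1Score, diagnosticOddsRatio, computeAuc and brierScore). The base R sketch below shows the textbook definitions of these measures on a small toy example. It does not call the package, and the variable names, the toy values and the 0.5 threshold are assumptions; the package's own functions may take different arguments and handle edge cases (such as zero counts or tied predictions) differently.

```r
# Toy labels (1 = outcome observed) and predicted risks; illustrative values only.
observed  <- c(1, 1, 1, 0, 0, 0, 0, 1)
predicted <- c(0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7)
threshold <- 0.5  # assumed classification threshold

# Confusion matrix counts at the chosen threshold.
predictedClass <- as.integer(predicted >= threshold)
TP <- sum(predictedClass == 1 & observed == 1)
FP <- sum(predictedClass == 1 & observed == 0)
TN <- sum(predictedClass == 0 & observed == 0)
FN <- sum(predictedClass == 0 & observed == 1)

# Threshold-based measures.
sens <- TP / (TP + FN)                    # sensitivity (recall)
spec <- TN / (TN + FP)                    # specificity
ppv  <- TP / (TP + FP)                    # positive predictive value (precision)
npv  <- TN / (TN + FN)                    # negative predictive value
acc  <- (TP + TN) / length(observed)      # accuracy
f1   <- 2 * ppv * sens / (ppv + sens)     # F1 score
dor  <- (TP * TN) / (FP * FN)             # diagnostic odds ratio

# AUC via the rank (Mann-Whitney) formulation: the probability that a
# randomly chosen case is ranked above a randomly chosen non-case.
nPos <- sum(observed == 1)
nNeg <- sum(observed == 0)
ranks <- rank(predicted)
auc <- (sum(ranks[observed == 1]) - nPos * (nPos + 1) / 2) / (nPos * nNeg)

# Brier score: mean squared difference between predicted risk and outcome.
brier <- mean((predicted - observed)^2)

print(round(c(sensitivity = sens, specificity = spec, ppv = ppv, npv = npv,
              accuracy = acc, f1 = f1, dor = dor, auc = auc, brier = brier), 4))
```

The threshold-based measures depend on the chosen cut-off, whereas the AUC and the Brier score are computed from the predicted risks directly.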