PatientLevelPrediction

PatientLevelPrediction is part of HADES.

Introduction

PatientLevelPrediction is an R package for building and validating patient-level predictive models using data in the OMOP Common Data Model format.

Reps JM, Schuemie MJ, Suchard MA, Ryan PB, Rijnbeek PR. Design and implementation of a standardized framework to generate and evaluate patient-level prediction models using observational healthcare data. J Am Med Inform Assoc. 2018;25(8):969-975.

The figure below illustrates the prediction problem we address. Among a population at risk, we aim to predict which patients at a defined moment in time (t = 0) will experience some outcome during a time-at-risk. Prediction is done using only information about the patients in an observation window prior to that moment in time.

To define a prediction problem we have to define t = 0 by a target cohort (T), the outcome we would like to predict by an outcome cohort (O), and the time-at-risk (TAR). Furthermore, we have to make design choices for the model we would like to develop and determine the observational datasets to use for internal and external validation. This conceptual framework works for all types of prediction problems, for example those presented below (T = green, O = red).
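
For orientation, the time-at-risk and related population choices are expressed in code through the package's study population settings. The following is a minimal sketch only; the argument names shown (riskWindowStart, riskWindowEnd, washoutPeriod, and so on) are indicative and should be checked against ?createStudyPopulationSettings in your installed version.

# Minimal sketch of defining the time-at-risk (TAR) and population rules;
# argument names are indicative and may differ between package versions.
library(PatientLevelPrediction)

populationSettings <- createStudyPopulationSettings(
  washoutPeriod = 365,                  # observation time required before t = 0
  firstExposureOnly = TRUE,             # only the first entry into the target cohort (T)
  removeSubjectsWithPriorOutcome = TRUE,
  requireTimeAtRisk = TRUE,
  riskWindowStart = 1,                  # TAR starts 1 day after t = 0
  riskWindowEnd = 365                   # TAR ends 365 days after t = 0
)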

Features

  • Takes one or more target cohorts (Ts) and one or more outcome cohorts (Os) and develops and validates models for all T and O combinations (a minimal workflow sketch follows this list).
  • Allows for multiple prediction design options.
  • Extracts the necessary data from a database in OMOP Common Data Model format for multiple covariate settings.
  • Uses a large set of covariates, including, for example, all drugs, diagnoses, and procedures, as well as age, comorbidity indexes, and custom covariates.
  • Includes a large number of state-of-the-art machine learning algorithms that can be used to develop predictive models, including regularized logistic regression, random forest, gradient boosting machines, decision tree, naive Bayes, k-nearest neighbours, neural network, and deep learning (convolutional neural networks, recurrent neural networks, and deep nets).
  • Allows you to add custom algorithms.
  • Contains functionality to externally validate models.
  • Includes functions to plot and explore model performance (ROC + Calibration).
  • Includes a shiny app to interactively view and explore results.
  • Implements existing models.
  • Builds ensemble models.
  • Builds Deep Learning models.
  • Generates learning curves.
  • Automatically creates a word document containing all the study results.
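
To make the list above concrete, the sketch below shows the general shape of developing a single model: extract data, define the study population, pick an algorithm, and run the analysis. It is a hedged illustration, not a complete study definition; the schema names, cohort IDs, and several argument names are placeholders, and the main vignette documents the exact interface.

# Hedged, minimal sketch of a single model-development run.
# Database details, cohort IDs, and some argument names are placeholders.
library(PatientLevelPrediction)

connectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = "postgresql",
  server = "myserver/mydb",
  user = "user",
  password = "secret"
)

plpData <- getPlpData(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = "my_cdm_schema",          # OMOP CDM data
  cohortId = 1,                                 # target cohort (T)
  outcomeIds = 2,                               # outcome cohort (O)
  covariateSettings = FeatureExtraction::createDefaultCovariateSettings()
)

population <- createStudyPopulation(
  plpData = plpData,
  outcomeId = 2,
  riskWindowStart = 1,                          # time-at-risk (TAR)
  riskWindowEnd = 365
)

modelSettings <- setLassoLogisticRegression()   # any of the set*() algorithms

results <- runPlp(
  population = population,
  plpData = plpData,
  modelSettings = modelSettings,
  testFraction = 0.25                           # hold out 25% for internal validation
)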

Screenshots

A demo of the Shiny Apps can be found here.

Technology

PatientLevelPrediction is an R package, with some functions implemented in C++ and Python.

System Requirements

Requires R (version 3.3.0 or higher). Installation on Windows requires RTools. Libraries used in PatientLevelPrediction require Java and Python.

Python is required for some of the machine learning algorithms. We advise installing Python 3.7 using Anaconda (https://www.continuum.io/downloads).
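
After Anaconda is set up, the package's own helpers can create and select a Python environment for the Python-based algorithms. A minimal sketch, assuming the configurePython() and setPythonEnvironment() helpers listed in this package's function index; the environment name "plpEnv" is an arbitrary example and the exact arguments should be checked against the help pages.

# Minimal sketch: create a conda environment for the Python-based learners
# and point the package at it. "plpEnv" is an arbitrary example name.
library(PatientLevelPrediction)
configurePython(envname = "plpEnv", envtype = "conda")
setPythonEnvironment(envname = "plpEnv", envtype = "conda")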

Getting Started

  • To install the package, please read the Package Installation guide.

  • Have a look at the video below for an extensive demo of the package.

Please read the main vignette for the package.

In addition, we have created vignettes that describe advanced functionality in more detail.

Package manual: PatientLevelPrediction.pdf

User Documentation

Documentation can be found on the package website.

PDF versions of the documentation are also available, as mentioned above.

Support

  • Developer questions/comments/feedback: OHDSI Forum
  • We use the GitHub issue tracker for all bugs/issues/enhancements

Contributing

Read here how you can contribute to this package.

License

PatientLevelPrediction is licensed under the Apache License 2.0.

Development

PatientLevelPrediction is being developed in RStudio.

Status: Beta

Acknowledgements

  • The package is maintained by Jenna Reps and Peter Rijnbeek and has been developed with major contributions from Martijn Schuemie, Patrick Ryan, and Marc Suchard.
  • We would like to thank the following people for their contributions to the package: Seng Chan You, Ross Williams, Henrik John, Xiaoyong Pan, and James Wiggins.
  • This project is supported in part through the National Science Foundation grant IIS 1251151.

Install

install.packages('PatientLevelPrediction')
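
The command above installs the released version shown on this page. As a hedged alternative (assuming the OHDSI/PatientLevelPrediction GitHub repository and the remotes package; the Package Installation guide is the authoritative reference), the development version can typically be installed from GitHub:

# install.packages("remotes")   # if remotes is not yet installed
remotes::install_github("OHDSI/PatientLevelPrediction")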

Monthly Downloads: 518
Version: 4.3.10
License: Apache License 2.0

Maintainer: Jenna Reps
Last Published: April 20th, 2025

Functions in PatientLevelPrediction (4.3.10)

  • combinePlpModelSettings: Combine two objects specifying multiple PLP model settings
  • averagePrecision: Calculate the average precision
  • calibrationLine
  • accuracy: Calculate the accuracy
  • checkPlpInstallation: Check that PatientLevelPrediction and its dependencies are correctly installed
  • PatientLevelPrediction
  • brierScore
  • applyModel: Apply a trained model to new data and get the predicted risk in [0, 1] for each person in the population; if the population includes an outcomeCount column, the evaluation of the prediction (AUC, Brier score, calibration) is also returned
  • addRecalibration
  • applyEnsembleModel: Apply a trained ensemble model to new data and get the predicted risk in [0, 1] for each person in the population; if the population includes an outcomeCount column, the evaluation of the prediction (AUC, Brier score, calibration) is also returned
  • computeAuc: Compute the area under the ROC curve
  • createPlpReport
  • createPlpModelSettings: Create an object specifying the multiple PLP model settings
  • computeAucFromDataFrames: Compute the area under the ROC curve
  • createStudyPopulationSettings: Create the study population settings
  • createLearningCurvePar
  • evaluateMultiplePlp: Externally validate the multiple PLP models across new datasets
  • createStudyPopulation: Create a study population
  • createLearningCurve
  • drawAttritionDiagramPlp: Draw the attrition diagram
  • createLrSql: Convert a logistic regression model to SQL code...
  • f1Score: Calculate the f1Score
  • createPlpJournalDocument
  • falseDiscoveryRate: Calculate the falseDiscoveryRate
  • evaluatePlp
  • interpretInstallCode: Tells you the package issue
  • externalValidatePlp: Validate a model on new databases
  • launchDiagnosticsExplorer: Launch the Diagnostics Explorer Shiny app
  • getPredictionDistribution: Calculates the prediction distribution
  • getThresholdSummary: Calculate all measures for sparse ROC
  • getCalibration: Get a sparse summary of the calibration
  • fitPlp
  • outcomeSurvivalPlot: Plot the outcome incidence over time
  • personSplitter: Split data into random subsets stratified by class
  • loadEnsemblePlpResult: Loads the ensemble PLP results
  • negativeLikelihoodRatio: Calculate the negativeLikelihoodRatio
  • configurePython: Sets up a virtual environment to use for PLP (can be conda or python)
  • plotF1Measure: Plot the F1 measure efficiency frontier using the sparse thresholdSummary data frame
  • diagnostic: Investigates the prediction problem settings; use before training a model
  • createEnsemble: Combine models into an ensemble
  • falsePositiveRate: Calculate the falsePositiveRate
  • predictAndromeda: Generate predictions from a regression model
  • listAppend: Join two lists
  • fitGLMModel: Fit a predictive model
  • negativePredictiveValue: Calculate the negativePredictiveValue
  • plotLearningCurve
  • plotPlp: Plot all the PatientLevelPrediction plots
  • randomSplitter: Split data into random subsets stratified by class
  • loadEnsemblePlpModel: Loads the ensemble PLP model and returns a model list
  • getPlpData: Get the patient-level prediction data from the server
  • diagnosticOddsRatio: Calculate the diagnostic odds ratio
  • plotSparseCalibration: Plot the calibration
  • loadPlpData: Load the cohort data from a folder
  • pfi
  • falseNegativeRate: Calculate the falseNegativeRate
  • falseOmissionRate: Calculate the falseOmissionRate
  • loadPlpFromCsv: Loads parts of the PLP result saved as CSV files for transparent sharing
  • plpDataSimulationProfile: A simulation profile
  • plotGeneralizability: Plot the train/test generalizability diagnostic
  • getPlpTable: Create a data frame with the summary details of the population cohort for publications
  • plotPredictedPDF: Plot the predicted probability density function, showing prediction overlap between true and false cases
  • positiveLikelihoodRatio: Calculate the positiveLikelihoodRatio
  • plotRoc: Plot the ROC curve
  • plotSparseCalibration2: Plot the conventional calibration
  • runPlpAnalyses: Run a list of predictions
  • setCovNN2: Create setting for the CovNN2 model (convolution across input and time; https://arxiv.org/pdf/1608.00647.pdf)
  • setLassoLogisticRegression: Create setting for lasso logistic regression
  • runPlp: Train and evaluate the model
  • recalibratePlp
  • setAdaBoost: Create setting for AdaBoost with python
  • loadPlpResult: Loads the evaluation data frame
  • loadPrediction: Loads the prediction data frame
  • savePlpToCsv: Save parts of the PLP result as CSV files for transparent sharing
  • loadPlpModel: Loads the PLP model
  • setCoxModel: Create setting for lasso Cox model
  • setMLPTorch: Create setting for neural network model with python
  • setNaiveBayes: Create setting for naive Bayes model with python
  • setRandomForestQuantileRegressor: Create setting for RandomForestQuantileRegressor with python scikit-garden (skgarden.quantile.RandomForestQuantileRegressor); requires the scikit-garden python package (conda install -c conda-forge scikit-garden)
  • setRandomForest: Create setting for random forest model with python (very fast)
  • savePredictionAnalysisList: Saves a JSON prediction settings file given R settings
  • simulatePlpData: Generate simulated data
  • savePrediction: Saves the prediction data frame to RDS
  • plotSmoothCalibration: Plot the smooth calibration as detailed in Van Calster et al., "A calibration hierarchy for risk models was defined: from utopia to empirical data" (2016)
  • timeSplitter: Split test/train data by time and then partition the training set into random folds stratified by class
  • similarPlpData: Extract new plpData using plpModel settings; uses the metadata in plpModel to extract similar data and population for new databases
  • setMLP: Create setting for neural network model with python
  • positivePredictiveValue: Calculate the positivePredictiveValue
  • toSparseM: Convert the plpData in COO format into a sparse R matrix
  • setCompetingRiskModel: Create setting for competing risk model (uses the Fine-Gray model in Cyclops)
  • recalibratePlpRefit
  • setGBMSurvival: Create setting for GBM survival model with python; requires the scikit-survival python package (conda install -c sebp scikit-survival)
  • runEnsembleModel: Create an ensemble model using different models
  • setGradientBoostingMachine: Create setting for gradient boosting machine model using the gbm_xgboost implementation
  • setSagemakerBinary: Create setting for sagemaker model
  • transferLearning: [Under development] Transfer learning
  • setSVM: Create setting for SVM with python
  • toSparseTorchPython: Convert the plpData in COO format into a sparse python matrix using torch.sparse
  • saveEnsemblePlpResult: Saves the ensemble PLP results
  • predictPlp
  • predictProbabilities: Create predictive probabilities
  • loadPredictionAnalysisList: Load the multiple prediction JSON settings from a file
  • setCIReNN: Create setting for CIReNN model
  • specificity: Calculate the specificity
  • subjectSplitter: Split data when patients appear in the data multiple times, such that the same patient is always either in the train set or the test set (never both)
  • saveEnsemblePlpModel: Saves the ensemble PLP model
  • setCNNTorch: Create setting for CNN model with python
  • modelBasedConcordance: Calculate the model-based concordance, the expected discrimination performance of a model under the assumption the model predicts the "true" outcome, as detailed in van Klaveren et al. (https://pubmed.ncbi.nlm.nih.gov/27251001/)
  • plotPrecisionRecall: Plot the precision-recall curve using the sparse thresholdSummary data frame
  • plotSparseRoc: Plot the ROC curve using the sparse thresholdSummary data frame
  • plotVariableScatterplot: Plot the variable importance scatterplot
  • plotDemographicSummary: Plot the observed vs. expected incidence, by age and gender
  • setLRTorch: Create setting for logistic regression model with python
  • plotPredictionDistribution: Plot side-by-side boxplots of the prediction distribution, by class
  • setDecisionTree: Create setting for DecisionTree with python
  • savePlpResult: Saves the result from runPlp into the specified directory
  • setDeepNN: Create setting for DeepNN model
  • setCovNN: Create setting for multi-resolution CovNN model (structure based on https://arxiv.org/pdf/1608.00647.pdf CNN1)
  • viewMultiplePlp: Open a local Shiny app for viewing the results of multiple PLP analyses
  • plotPreferencePDF: Plot the preference score probability density function, showing prediction overlap between true and false cases
  • viewPlp: Interactively view the performance and model settings
  • savePlpModel: Saves the PLP model
  • savePlpData: Save the cohort data to a folder
  • setPythonEnvironment: Use the virtual environment created using configurePython()
  • sensitivity: Calculate the sensitivity
  • transportModel: Transports a plpModel to a new location and removes sensitive data
  • setKNN: Create setting for KNN model
  • setRNNTorch: Create setting for RNN model with python
  • transportPlp: Transports a plpResult to a new location and removes sensitive data