healthcareai v2.3.0





Tools for Healthcare Machine Learning

A machine learning toolbox tailored to healthcare data.





The aim of healthcareai is to make machine learning in healthcare as easy as possible. It does that by providing functions to:

  • Develop customized, reliable, high-performance machine learning models with minimal code
  • Easily make and evaluate predictions and push them to a database
  • Understand how a model makes its predictions
  • Make data cleaning, manipulation, imputation, and visualization as simple as possible
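For instance, a first look at data quality might resemble the sketch below. It uses the pima_diabetes dataset that ships with the package and assumes missingness() is called with its default arguments; the variable name missing_summary is only illustrative.

```r
library(healthcareai)

# Percent of missing values in each column of the bundled example data
missing_summary <- missingness(pima_diabetes)
print(missing_summary)

# Visual overview of the same information
plot(missing_summary)
```

Columns with a high share of missing values are candidates for imputation or removal before training.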


healthcareai can take you from messy data to an optimized model in one line of code:

models <- machine_learn(pima_diabetes, patient_id, outcome = diabetes)
# > Algorithms Trained: Random Forest, eXtreme Gradient Boosting, and glmnet
# > Model Name: diabetes
# > Target: diabetes
# > Class: Classification
# > Performance Metric: AUROC
# > Number of Observations: 768
# > Number of Features: 12
# > Models Trained: 2018-09-01 18:19:44 
# > 
# > Models tuned via 5-fold cross validation over 10 combinations of hyperparameter values.
# > Best model: Random Forest
# > AUPR = 0.71, AUROC = 0.84
# > Optimal hyperparameter values:
# >   mtry = 2
# >   splitrule = extratrees
# >   min.node.size = 12

Make predictions and examine predictive performance:

predictions <- predict(models, outcome_groups = TRUE)
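To check performance on patients the model has not seen, one option is to hold out a test set before training. The sketch below assumes split_train_test(), evaluate(), and the plot method for predictions behave as their help files describe. The 80/20 split, the seed, and tune = FALSE (used here only to keep the run short) are illustrative choices, not recommendations.

```r
library(healthcareai)

# Hold out 20% of rows; the seed is arbitrary and only makes the split repeatable
d <- split_train_test(pima_diabetes, outcome = diabetes,
                      percent_train = 0.8, seed = 2018)

# tune = FALSE skips hyperparameter tuning so the example runs quickly
models <- machine_learn(d$train, patient_id, outcome = diabetes, tune = FALSE)

# Predict on the held-out rows
test_predictions <- predict(models, newdata = d$test)

# Performance metrics and a plot of predictions against observed outcomes
evaluate(test_predictions)
plot(test_predictions)
```

Because the held-out data still contains the diabetes column, the predictions can be scored directly against the observed outcomes.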

Learn More

For details on what’s happening under the hood and for options to customize data preparation and model training, see Getting Started with healthcareai as well as the help files for individual functions such as ?machine_learn, ?predict.model_list, and ?explore.
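As a rough illustration of the model-interpretation functions, the following sketch lists variable importances and computes counterfactual predictions with explore(). It assumes default arguments throughout; tune = FALSE is used only to keep run time short, and the object names are illustrative.

```r
library(healthcareai)

# Train quickly without tuning (an illustrative shortcut, not a recommendation)
models <- machine_learn(pima_diabetes, patient_id, outcome = diabetes, tune = FALSE)

# Variable importances from the trained models
importance <- get_variable_importance(models)
print(importance)

# Counterfactual predictions: vary some features while holding the others fixed
counterfactuals <- explore(models)
plot(counterfactuals)
```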

Documentation for all functions, along with vignettes on various uses of the package, is available at the package website.

Also, be sure to read our blog and watch our broadcasts to learn more about what’s new in healthcare machine learning and how we are using this toolkit to put machine learning to work in real healthcare systems.

Get Involved

We have a Slack community that is a great place to introduce yourself, share what you’re doing with the package, ask questions, and troubleshoot your code.


If you are interested in contributing to the package (great!), please read the contributing guide and look for issues with the “help wanted” tag. Feel free to tackle any issue that interests you; the tagged ones are simply a few that we feel would make a good place to start.


Your feedback is hugely appreciated. It makes the package work well and helps us make it more useful to the community. Both feature requests and bug reports should be submitted as GitHub issues.

Bug reports should be filed with a minimal reproducible example. The reprex package is extraordinarily helpful for this. Please also include the output of sessionInfo() or, better yet, devtools::session_info().


Version 1 of healthcareai has been retired. You can continue to use it, but its compatibility with changes in the R ecosystem is not guaranteed. You should always be able to install it from GitHub with: install.packages("remotes"); remotes::install_github("HealthCatalyst/healthcareai-r@v1.2.4").

For an example of how to adapt v1 models to the v2 API, check out the Transitioning vignettes.

Functions in healthcareai

Name                          Description
flash_models                  Train models without tuning for performance
get_supported_models          Supported models and their hyperparameters
get_variable_importance       Get variable importances
pip                           Patient Impact Predictor
pivot                         Pivot multiple rows per observation to one row with multiple columns
get_thresholds                Get class-separating thresholds for classification predictions
get_hyperparameter_defaults   Get hyperparameter values
plot.predicted_df             Plot model predictions vs observed outcomes
hcai_impute                   Specify imputation methods for an existing recipe
get_cutoffs                   Get cutoff values for group predictions
plot.thresholds_df            Plot threshold performance metrics
healthcareai                  Machine Learning Made Easy
missingness                   Find missingness in each column and search for strings that might represent missing values
step_dummy_hcai               Dummy Variables Creation
permute_process_variables     Take a dataframe and build a larger dataframe by permuting the values in certain columns
step_locfimpute               Last Observation Carried Forward Imputation
impute                        Impute data and return a reusable recipe
selectData                    Defunct. See db_read
build_connection_string       Build a connection string for use with MSSQL and dbConnect
separate_drgs                 Convert MSDRGs into a "base DRG" and complication level
interpret                     Interpret a model via regularized coefficient estimates
split_train_test              Split data into training and test data frames
start_prod_logs               Defunct
plot.missingness              Plot missingness
plot.model_list               Plot performance of models
build_one_level_df            Replace all values in a column of a dataframe with a given value
prep_data                     Prepare data for machine learning
reexports                     Objects exported from other packages
step_add_levels               Add levels to nominal variables
db_read                       Read from a SQL Server database table
evaluate                      Get model performance metrics
step_date_hcai                Date and Time Feature Generator
is.model_list                 Type checks
is.predicted_df               Class check
pima_diabetes                 Patient diabetes dataset
pima_meds                     Patient medications dataset
plot.variable_importance      Plot variable importance
predict.model_list            Get predictions
summary.missingness           Summarizes data given by missingness
tune_models                   Tune multiple machine learning models using cross validation to optimize performance
convert_date_cols             Convert character date columns to dates and times
countMissingData              Function to find proportion of NAs in each column of a dataframe or matrix
evaluate_classification       Get performance metrics for classification predictions
machine_learn                 Machine learning made easy
evaluate_multiclass           Get performance metrics for multiclass predictions
make_na                       Replace missingness values with NA and correct column types
plot.explore_df               Plot Counterfactual Predictions
plot.interpret                Plot regularized model coefficients
rename_with_counts            Adds the category count to each category name in a given variable column
save_models                   Save models to disk and load models from disk
step_missing                  Clean NA values from categorical/nominal variables
stop_prod_logs                Defunct
writeData                     Defunct. See this vignette for help writing to databases.
add_best_levels               Build efficient features from high-cardinality, multiple-membership factors
Mode                          Mode
catalyst_test_deploy_in_prod  Defunct
control_chart                 Create a control chart
add_SAM_utility_cols          Add SAM utility columns to table
evaluate_regression           Get performance metrics for regression predictions
as.model_list                 Make models into model_list object
explore                       Explore a model's "reasoning" via counterfactual predictions



Type: Package
Date: 2018-12-11
License: MIT + file LICENSE
LazyData: TRUE
RoxygenNote: 6.1.1
Encoding: UTF-8
NeedsCompilation: no
Packaged: 2018-12-12 23:01:15 UTC; michael.mastanduno
Repository: CRAN
Date/Publication: 2018-12-12 23:50:03 UTC
