

healthcareai

Overview

The aim of healthcareai is to make machine learning in healthcare as easy as possible. It does that by providing functions to:

  • Develop customized, reliable, high-performance machine learning models with minimal code
  • Easily make and evaluate predictions and push them to a database
  • Understand how a model makes its predictions
  • Make data cleaning, manipulation, imputation, and visualization as simple as possible

Usage

healthcareai can take you from messy data to an optimized model in one line of code:

library(healthcareai)
models <- machine_learn(pima_diabetes, patient_id, outcome = diabetes)
models
# > Algorithms Trained: Random Forest, eXtreme Gradient Boosting, and glmnet
# > Model Name: diabetes
# > Target: diabetes
# > Class: Classification
# > Performance Metric: AUROC
# > Number of Observations: 768
# > Number of Features: 12
# > Models Trained: 2018-09-01 18:19:44 
# > 
# > Models tuned via 5-fold cross validation over 10 combinations of hyperparameter values.
# > Best model: Random Forest
# > AUPR = 0.71, AUROC = 0.84
# > Optimal hyperparameter values:
# >   mtry = 2
# >   splitrule = extratrees
# >   min.node.size = 12
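The summary above reports that the models were tuned via 5-fold cross validation and scored by AUROC. To illustrate just those two ideas, here is a minimal base-R sketch that estimates out-of-fold AUROC for a logistic regression on simulated data. It is not healthcareai's tuning code, which also searches over hyperparameter values and several algorithms. The simulated data and the `auroc()` and `cv_auroc()` helpers are hypothetical.

```r
# Hypothetical sketch: k-fold cross-validated AUROC in base R (not healthcareai internals)

# AUROC via the rank (Mann-Whitney U) formulation: the probability that a
# randomly chosen positive case is scored higher than a randomly chosen negative.
auroc <- function(scores, labels) {
  r <- rank(scores)                       # average ranks handle tied scores
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

cv_auroc <- function(d, k = 5, seed = 1) {
  set.seed(seed)
  # Randomly assign each row to exactly one of k folds
  folds <- sample(rep(seq_len(k), length.out = nrow(d)))
  sapply(seq_len(k), function(i) {
    train <- d[folds != i, ]              # fit on the other k - 1 folds
    test  <- d[folds == i, ]              # score on the held-out fold
    fit <- glm(y ~ x1 + x2, data = train, family = binomial)
    p <- predict(fit, newdata = test, type = "response")
    auroc(p, test$y)
  })
}

# Simulated data: the binary outcome depends on x1 and x2
set.seed(42)
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- rbinom(n, 1, plogis(1.5 * d$x1 - d$x2))

fold_scores <- cv_auroc(d, k = 5)
mean(fold_scores)   # cross-validated estimate of AUROC
```

Averaging over held-out folds gives a less optimistic performance estimate than scoring the model on the same rows it was trained on.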

Make predictions and examine predictive performance:

predictions <- predict(models, outcome_groups = TRUE)
plot(predictions)
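Setting `outcome_groups = TRUE` asks predict() to add a predicted class next to each predicted probability, which requires a probability cutoff. The rule healthcareai actually uses is documented in ?predict.model_list, and the package also provides get_thresholds() and get_cutoffs() for this. As a general illustration only, not the package's implementation, the sketch below picks the cutoff that minimizes total misclassification cost, weighting false alarms and missed detections equally by default. The `best_threshold()` helper, its `fp_cost` and `fn_cost` arguments, and the toy probabilities are hypothetical.

```r
# Hypothetical sketch: choose a class-separating threshold by minimizing
# misclassification cost (an illustration, not healthcareai's implementation)
best_threshold <- function(probs, actual, fp_cost = 1, fn_cost = 1) {
  candidates <- sort(unique(probs))
  costs <- sapply(candidates, function(cut) {
    predicted <- as.integer(probs >= cut)
    fp <- sum(predicted == 1 & actual == 0)   # false alarms
    fn <- sum(predicted == 0 & actual == 1)   # missed detections
    fp_cost * fp + fn_cost * fn
  })
  candidates[which.min(costs)]                # first cutoff with the lowest cost
}

probs  <- c(0.05, 0.20, 0.35, 0.40, 0.60, 0.70, 0.85, 0.90)
actual <- c(0,    0,    1,    0,    1,    0,    1,    1)

cutoff <- best_threshold(probs, actual)
groups <- ifelse(probs >= cutoff, "Y", "N")
```

Raising `fp_cost` makes false alarms more expensive and pushes the cutoff upward; with `fp_cost = 3` this toy example selects 0.85 instead of 0.35.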

Learn More

For details on what’s happening under the hood and for options to customize data preparation and model training, see Getting Started with healthcareai as well as the helpfiles for individual functions such as ?machine_learn, ?predict.model_list, and ?explore.

Documentation of all functions as well as vignettes on various uses of the package are available at the package website: https://docs.healthcare.ai/.

Also, be sure to read our blog and watch our broadcasts to learn more about what’s new in healthcare machine learning and how we are using this toolkit to put machine learning to work in real healthcare systems.

Get Involved

We have a Slack community that is a great place to introduce yourself, share what you’re doing with the package, ask questions, and troubleshoot your code.

Contributing

If you are interested in contributing to the package (great!), please read the contributing guide and look for issues with the “help wanted” tag. Feel free to tackle any issue that interests you; those tagged “help wanted” are the ones we think make a good place to start.

Feedback

Your feedback is hugely appreciated. It makes the package work well and helps us make it more useful to the community. Both feature requests and bug reports should be submitted as GitHub issues.

Bug reports should be filed with a minimal reproducible example. The reprex package is extraordinarily helpful for this. Please also include the output of sessionInfo() or, better yet, devtools::session_info().
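A minimal reproducible example should run on its own in a fresh R session: load the packages it needs, build a small dataset inline, run the call that misbehaves, and report the session details. The skeleton below sketches that shape; the toy data frame is made up, and the library(healthcareai) line is commented out only so the skeleton runs anywhere. To render it for an issue, you can wrap the code in `reprex::reprex({ ... })`. sessionInfo() is base R, while devtools::session_info() additionally requires the devtools package.

```r
# Skeleton of a minimal reproducible example (toy data; swap in your failing call)
# library(healthcareai)   # load every package the example needs

# Build a tiny dataset inline instead of reading a private file
toy <- data.frame(
  patient_id = 1:6,
  age        = c(34, 51, NA, 47, 29, 62),
  diabetes   = c("N", "Y", "N", "Y", "N", "Y"),
  stringsAsFactors = FALSE
)

# Replace this with the call that errors or returns something unexpected
summary(toy)

# Report the environment the problem occurred in
info <- sessionInfo()
info
```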

Legacy

Version 1 of healthcareai has been retired. You can continue to use it, but its compatibility with changes in the R ecosystem is not guaranteed. You should always be able to install it from GitHub with: install.packages("remotes"); remotes::install_github("HealthCatalyst/healthcareai-r@v1.2.4").

For an example of how to adapt v1 models to the v2 API, check out the Transitioning vignettes.


Install

install.packages('healthcareai')

Monthly Downloads

99

Version

2.5.0

License

MIT + file LICENSE

Maintainer

Mike Mastanduno

Last Published

August 5th, 2020

Functions in healthcareai (2.5.0)

as.model_list

Make models into model_list object
add_best_levels

Build efficient features from high-cardinality, multiple-membership factors
control_chart

Create a control chart
convert_date_cols

Convert character date columns to dates and times
countMissingData

Function to find proportion of NAs in each column of a dataframe or matrix
catalyst_test_deploy_in_prod

Defunct
build_one_level_df

Replace all values in a column of a dataframe with a given value.
build_connection_string

Build a connection string for use with MSSQL and dbConnect
evaluate_regression

Get performance metrics for regression predictions
explore

Explore a model's "reasoning" via counterfactual predictions
evaluate_multiclass

Get performance metrics for multiclass predictions
evaluate_classification

Get performance metrics for classification predictions
get_thresholds

Get class-separating thresholds for classification predictions
Mode

Mode
db_read

Read from a SQL Server database table
add_SAM_utility_cols

Add SAM utility columns to table
get_variable_importance

Get variable importances
evaluate

Get model performance metrics
flash_models

Train models without tuning for performance
get_cutoffs

Get cutoff values for group predictions
get_hyperparameter_defaults

Get hyperparameter values
get_supported_models

Supported models and their hyperparameters
plot.explore_df

Plot Counterfactual Predictions
interpret

Interpret a model via regularized coefficient estimates
impute

Impute data and return a reusable recipe
healthcareai

Machine Learning Made Easy
hcai_impute

Specify imputation methods for an existing recipe
machine_learn

Machine learning made easy
missingness

Find missingness in each column and search for strings that might represent missing values
permute_process_variables

Take a dataframe and build a larger dataframe by permuting the values in certain columns.
make_na

Replace missingness values with NA and correct columns types
is.predicted_df

Class check
pima_meds

Patient medications dataset
is.model_list

Type checks
pivot

Pivot multiple rows per observation to one row with multiple columns
pip

Patient Impact Predictor
rename_with_counts

Adds the category count to each category name in a given variable column
save_models

Save models to disk and load models from disk
pima_diabetes

Patient diabetes dataset
plot.model_list

Plot performance of models
plot.missingness

Plot missingness
plot.predicted_df

Plot model predictions vs observed outcomes
predict.model_list

Get predictions
plot.variable_importance

Plot variable importance
prep_data

Prepare data for machine learning
step_dummy_hcai

Dummy Variables Creation
step_locfimpute

Last Observation Carried Forward Imputation
reexports

Objects exported from other packages
separate_drgs

Convert MSDRGs into a "base DRG" and complication level
selectData

stop_prod_logs

Defunct
plot.thresholds_df

Plot threshold performance metrics
step_missing

Clean NA values from categorical/nominal variables
summary.missingness

tune_models

Tune multiple machine learning models using cross validation to optimize performance
split_train_test

Split data into training and test data frames
plot.interpret

Plot regularized model coefficients
step_date_hcai

Date and Time Feature Generator
step_add_levels

Add levels to nominal variables
start_prod_logs

Defunct
writeData

Defunct. See this vignette for help writing to databases.
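Several of the helpers listed above (countMissingData, missingness, make_na) deal with missing values. As a base-R illustration of the idea, not the package's code, the sketch below computes the percentage of NA values in each column and flags character columns containing strings that often stand in for missing data. The `missingness_sketch()` function and its `missing_strings` default are assumptions made for this example; healthcareai's missingness() uses its own list of strings, documented in ?missingness.

```r
# Hypothetical sketch of a missingness summary (not healthcareai's implementation)
missingness_sketch <- function(d, missing_strings = c("", "NA", "NULL", "N/A", "?")) {
  # Percent of NA values in each column
  pct_na <- vapply(d, function(col) 100 * mean(is.na(col)), numeric(1))
  # TRUE for character columns holding placeholder strings that probably mean "missing"
  suspicious <- vapply(d, function(col) {
    is.character(col) && any(trimws(col) %in% missing_strings)
  }, logical(1))
  data.frame(variable = names(d), percent_missing = unname(pct_na),
             has_placeholder = unname(suspicious),
             row.names = NULL, stringsAsFactors = FALSE)
}

d <- data.frame(
  id      = 1:4,
  glucose = c(148, NA, 183, NA),
  smoker  = c("Y", "NULL", "N", ""),
  stringsAsFactors = FALSE
)
result <- missingness_sketch(d)
result
```

Flagged placeholder strings are candidates for conversion to NA before imputation, which is the kind of cleanup make_na is listed for above.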