
Note: a newer version (2.5.1) of this package is available.

healthcareai

The aim of healthcareai is to make machine learning easy on healthcare data. The package has two main goals:

  • Allow one to easily develop and compare models based on tabular data, and deploy a best model that pushes predictions to either databases or flat files.

  • Provide tools related to data cleaning, manipulation, and imputation.

For those starting out

  • If you haven't, install R version >= 3.2.3 and RStudio

Note: if you're setting up R on an ETL server, don't download RStudio--simply open up RGui
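
To confirm that the installed R meets the minimum version stated above, a quick console check can help. This is a minimal sketch using base R only; the variable names are illustrative.

```r
# Minimum R version stated in the installation instructions above
required_version <- "3.2.3"

# getRversion() returns the running R version as a comparable object
current_version <- getRversion()
version_ok <- current_version >= required_version

if (version_ok) {
  message("R ", as.character(current_version),
          " meets the minimum requirement (", required_version, ").")
} else {
  message("R ", as.character(current_version),
          " is older than ", required_version, "; please upgrade R first.")
}
```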

Install the latest release on Windows

Open RStudio and work in the console

install.packages('healthcareai')

If install.packages('healthcareai') or library(healthcareai) fails

If you don't have admin rights on the machine you are working on, you may need to set a custom location for your R libraries. Here's how to do that:

  1. Create a folder to hold your R packages. You'll generally have write access to your Documents folder, so you might create a new directory: C:\Users\your.name\Documents\R\R_library. Shift+right-click that folder and copy its path.
  2. Define an environment variable with that folder location. Open the Control Panel, click through User Accounts -> User Accounts -> Change my environment variables, add a variable called R_LIBS_USER, and paste the folder path (C:\Users\your.name\Documents\R\R_library) into the value field. Make sure the path is not surrounded by quotation marks.
  3. Tell R to use that location. Restart RStudio and run install.packages('healthcareai'). If asked whether you want to use a custom library location, choose yes; that may be sufficient. If not, click into the Console in RStudio, type .libPaths(), paste the path to your new library folder inside the parentheses, wrap it in quotation marks, and change each \ to /. You should end up with a line that looks like: .libPaths("C:/Users/your.name/Documents/R/R_library"). Press Enter to run it.
  4. Try again. Run install.packages('healthcareai') and library(healthcareai) again and all should be well!
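
The custom library steps above can also be sketched entirely from the R console. This is an illustration, not the package's official procedure: the folder used here is a temporary stand-in (lib_dir is a hypothetical name), and the install call is commented out because it needs network access.

```r
# Hypothetical library folder for illustration; on Windows you would use
# something like "C:/Users/your.name/Documents/R/R_library" instead.
lib_dir <- file.path(tempdir(), "R_library")

# Create the folder if it does not exist yet
dir.create(lib_dir, recursive = TRUE, showWarnings = FALSE)

# Put the new folder first in the library search path, keeping the existing ones
.libPaths(c(lib_dir, .libPaths()))

# The first entry is where install.packages() installs by default
print(.libPaths()[1])

# Then install into that location (requires network access):
# install.packages('healthcareai', lib = lib_dir)
```

Note that a .libPaths() call only lasts for the current session, whereas the R_LIBS_USER variable from step 2 applies every time R starts.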

How to install the latest version on macOS

Open RStudio and work in the console

install.packages('healthcareai')

How to install the latest version on Ubuntu (Linux)

  • An Ubuntu 14.04 Droplet with at least 1 GB of RAM is required for the installation.
  • Follow steps 1 and 2 here to install R
  • Run sudo apt-get install libiodbc2-dev
  • Run sudo apt-get install unixodbc unixodbc-dev
  • Start R by typing R in the terminal, then run install.packages('healthcareai')

Install the bleeding edge version (for folks providing contributions)

Open RStudio and work in the console

library(devtools)
devtools::install_github(repo='HealthCatalyst/healthcareai-r')
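
If devtools is not yet installed, the library(devtools) call above will fail. A minimal sketch of a guard follows; install_dev_version is a hypothetical helper name introduced only for illustration, not part of healthcareai.

```r
# Sketch: install the development version only when called explicitly.
# install_dev_version is a hypothetical helper, not part of healthcareai.
install_dev_version <- function(repo = "HealthCatalyst/healthcareai-r") {
  # Install devtools from CRAN if it is not already available
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  # Build and install the package from the GitHub repository
  devtools::install_github(repo = repo)
}

# Usage (requires network access):
# install_dev_version()
```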

Tips on getting started

Built-in examples

Load the package you just installed and read the built-in docs

library(healthcareai)
?healthcareai

Website examples

See our docs website

Join the community

Read the blog and join the Slack channel at healthcare.ai

What's new?

The CRAN 1.0.0 release features:

  • Added:
    • Kmeans clustering
    • XGBoost multiclass support
    • findingVariation family of functions
  • Changed:
    • Develop step trains and saves models
    • Deploy no longer trains. Loads and predicts on all rows.
    • SQL uses a DBI back end
  • Removed:
    • testWindowCol is no longer a param.
    • SQL reading/writing is outside model deployment.

For issues

  • Double-check that your code follows the examples in the built-in docs:

library(healthcareai)
?healthcareai

  • Make sure you've thoroughly read the descriptions found here

  • If you're still seeing an error, file an issue on Stack Overflow using the healthcare-ai tag. Please provide:

    • Details on your environment (OS, database type, R vs. Python)
    • Goals (i.e., what you are trying to accomplish)
    • Crystal clear steps for reproducing the error
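
One way to gather the environment details requested above is to collect them in the R console and paste the output into your report. This is a sketch using base R and utils only; env_details is an illustrative name, and database details still need to be added by hand.

```r
# Collect basic environment details to paste into a bug report
env_details <- list(
  r_version = R.version.string,
  os        = paste(Sys.info()[["sysname"]], Sys.info()[["release"]]),
  # packageVersion() errors if the package is missing, so guard the call
  healthcareai_version = if (requireNamespace("healthcareai", quietly = TRUE)) {
    as.character(utils::packageVersion("healthcareai"))
  } else {
    "not installed"
  }
)
str(env_details)

# sessionInfo() gives a fuller picture, including attached packages
print(sessionInfo())
```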

Contributing

You want to help? Woohoo! We welcome that and are willing to help newbies get started.

First, see here for instructions on setting up your development environment and how to contribute.

  • Install: install.packages('healthcareai')
  • Monthly downloads: 99
  • Version: 1.2.4
  • License: MIT + file LICENSE
  • Maintainer: Michael Levy
  • Last published: September 5th, 2022

Functions in healthcareai (1.2.4)

  • calculatePerformance: Generate performance metrics after model has been trained
  • build_process_variable_df_list: Build a list of dataframes with new predictions for each modifiable variable.
  • catalyst_test_deploy_in_prod: Test function to check that the production environment is active.
  • generateAUC: Generate ROC or PR curve for a dataset.
  • findVariation: Find high variation
  • calulcateAlternatePredictions: Recalculate predicted value based on alternate scenarios
  • imputeDF: Perform imputation on a dataframe
  • distancePointSegment: Compute the distance of a point from a line segment
  • imputeColumn: Deprecated in favor of imputeDF
  • calculateConfusion: Generate confusion matrix of percentages
  • drop_repeated: Simultaneously remove duplicate row values in a list of dataframes.
  • isNumeric: Check if a data frame only has numeric columns.
  • getPipedWordCount: Count number of words in pipe-delimited string
  • build_process_variables_df: Build the output dataframe for modifiable process variables from a list of dataframes.
  • dataScale: Center and scale columns in a numeric data frame
  • calculateAllCorrelations: Correlation analysis on an input table over all numeric columns
  • distancePointLine: Compute the distance of a point from a line
  • groupedLOCF: Last observation carried forward
  • percentDataAvailableInDateRange: Find the percent of a column that's filled
  • isTargetYN: Tests whether predictedCol is Y/N. Allows for NAs to be present.
  • plotROCs: Plot ROCs from SupervisedModel classes
  • calculateTargetedCorrelations: Correlation analysis on an input table, focusing on one target variable
  • createVarianceTallTable: Transform a dataframe to be three columns and tall instead of wide
  • findBestAlternateScenarios: Find the biggest drop in predictive probability across alternate features
  • findElbow: Find the elbow in a curve
  • countMissingData: Function to find proportion of NAs in each column of a dataframe or matrix
  • assignClusterLabels: Assign labels to the kmeans confusion matrix
  • removeColsWithDTSSuffix: Remove columns with DTS suffix
  • convertDateTimeColToDummies: Converts datetime columns into dummy columns
  • healthcareai: healthcareai: a streamlined way to develop and deploy models
  • addSAMUtilityCols: Add SAM utility columns to table
  • findTrends: Find any columns that have a trend above a particular threshold
  • createAllCombinations: Find all possible unique combinations
  • featureAvailabilityProfiler: Calculate and plot data availability over time
  • calculateSDChanges: Calculate std deviation up/down for each numeric field in row
  • ignoreSpecWarn: Function to suppress specific warnings in unit tests
  • countPercentEmpty: DEPRECATED. Calculates percentage of each column in df that is NULL (NA)
  • writeData: Write data to database
  • removeColsWithOnlyNA: Remove columns from a data frame that are only NA
  • removeColsWithAllSameValue: Remove columns from a data frame when those columns have the same values in each row
  • countDaysSinceFirstDate: Creates column based on days since first date
  • getCutOffList: Function to return ideal cutoff and TPR/FPR or precision/recall.
  • removeRowsWithNAInSpecCol: Remove rows where specified col is NA
  • isBinary: Check if a vector has only two unique values.
  • initializeParamsForTesting: Function to initialize and populate the SupervisedModelDevelopmentParams each time a unit test is run.
  • lineMagnitude: Compute the distance between two points
  • getPipedValue: Grab number after single pipe in pipe-delimited string
  • plotPRCurve: Plot PR Curves from SupervisedModel classes
  • nelsonRule1: Analyze points in time to determine whether or not Nelson Rule 1 was violated
  • plotProfiler: Display availability feature profile over time
  • orderByDate: Order the rows in a data frame by date
  • pcaAnalysis: Perform principal component analysis
  • stop_prod_logs: Stops all console logging.
  • variationAcrossGroups: Find variation across groups
  • splitOutDateTimeCols: Splits datetime column into multiple date features
  • start_prod_logs: Sets console logging to a file in the working directory.
  • selectData: Pull data into R via an ODBC connection
  • permute_process_variables: Take a dataframe and build a larger dataframe by permuting the values in certain columns.
  • returnColsWithMoreThanFiftyCategories: Return vector of columns in a data frame with greater than 50 categories
  • skip_on_not_appveyor: Function to skip specific tests if they are not being run on Appveyor.
  • LassoDevelopment: Compare predictive models, created on your data
  • RandomForestDevelopment: Compare predictive models, created on your data
  • RiskAdjustedComparisons: Make risk adjusted comparisons between groups/units or years/months
  • SupervisedModelDeployment: Deploy predictive models, created on your data
  • LinearMixedModelDevelopment: Compare predictive models, created on your data
  • SupervisedModelDevelopment: Compare predictive models, created on your data
  • KmeansClustering: Build clusters using kmeans()
  • LinearMixedModelDeployment: Deploy a production-ready predictive Linear Mixed Model
  • SupervisedModelDevelopmentParams: SupervisedModelDevelopmentParams class to set up parameters required to build SupervisedModel classes
  • SupervisedModelDeploymentParams: SupervisedModelDeploymentParams class to set up parameters required to build SupervisedModelDeployment class
  • UnsupervisedModelParams: UnsupervisedModelParams class to set up parameters required to build UnsupervisedModel classes
  • UnsupervisedModel: Build clusters based on your data.
  • RandomForestDeployment: Deploy a production-ready predictive RandomForest model
  • LassoDeployment: Deploy a production-ready predictive Lasso model
  • build_one_level_df: Replace all values in the column of a dataframe with a given value.
  • XGBoostDeployment: Deploy a production-ready predictive XGBoost model
  • calculateHourBins: Calculate a vector of reasonable time bins
  • XGBoostDevelopment: Compare predictive models, created on your data
  • calculateCOV: Calculate coefficient of variation