
⚠️ There's a newer version (0.11.0) of this package.

How to Install the Package for R:

1. First, run the following R script to download the dependencies:

# devtools is needed for the GitHub installs below
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
library(devtools)
to_install <- c("arules", "catboost", "caTools", "data.table", "doParallel", 
                "foreach", "forecast", "ggplot2", "h2o", "itertools", 
                "lubridate", "magick", "Matrix", "monreg", "nortest","pROC", "RColorBrewer", "recommenderlab", 
                "ROCR", "scatterplot3d", "stringr", "sde", "tm", "tsoutliers", "wordcloud", "xgboost", "zoo")
for (i in to_install) {
  message(paste("looking for", i))
  if (i == "catboost" && !requireNamespace(i, quietly = TRUE)) {
    # catboost is installed from its GitHub repository
    devtools::install_github('catboost/catboost', subdir = 'catboost/R-package')
  } else if (i == "h2o" && !requireNamespace(i, quietly = TRUE)) {
    # Remove any existing h2o installation before installing the latest stable release
    if ("package:h2o" %in% search()) { detach("package:h2o", unload=TRUE) }
    if ("h2o" %in% rownames(installed.packages())) { remove.packages("h2o") }
    # h2o needs RCurl and jsonlite
    pkgs <- c("RCurl","jsonlite")
    for (pkg in pkgs) {
      if (! (pkg %in% rownames(installed.packages()))) { install.packages(pkg) }
    }
    install.packages("h2o", type="source", repos=(c("http://h2o-release.s3.amazonaws.com/h2o/latest_stable_R")))
  } else if (!requireNamespace(i, quietly = TRUE)) {
    # Everything else is installed with install.packages()
    message(paste("     installing", i))
    install.packages(i)
  }
}

2. Next, install the RemixAutoML package from GitHub:

# Install via:
devtools::install_github('AdrianAntico/RemixAutoML', upgrade = FALSE, dependencies = FALSE, force = TRUE)

3. If you're having trouble installing, see if this issue helps you out.

RemixAutoML

This is a collection of functions that I have made to speed up machine learning and to ensure high quality modeling results and output are generated. They are great at establishing solid baselines that are extremely challenging to beat using alternative methods (if at all). To see them in action, check out the free tutorials at RemyxCourses.com.

Also, be sure to visit our blog at RemixInstitute.ai for data science, machine learning, and AI content.

You can contact me via LinkedIn for any questions about the package. You can also go into the vignettes folder to see the package reference manual and a vignette with some background and examples. If you want to be a contributor, contact me via LinkedIn email.

Hex sticker rendered via the hexSticker package in R: https://github.com/GuangchuangYu/hexSticker

Supervised Learning Training Functions:

Regression:


AutoCatBoostRegression() GPU + CPU

AutoCatBoostRegression() is an automated modeling function that runs a variety of steps. First, the function will run a random grid tune over N number of models and find which model is the best (a default model is always included in that set). Once the model is identified and built, several other outputs are generated: validation data with predictions, evaluation plot, evaluation boxplot, evaluation metrics, variable importance, partial dependence calibration plots, partial dependence calibration box plots, and column names used in model fitting.

AutoXGBoostRegression() GPU + CPU

AutoXGBoostRegression() is an automated XGBoost modeling framework with grid-tuning and model evaluation that runs a variety of steps. First, the function will run a random grid tune over N number of models and find which model is the best (a default model is always included in that set). Once the model is identified and built, several other outputs are generated: validation data with predictions, evaluation plot, evaluation boxplot, evaluation metrics, variable importance, partial dependence calibration plots, partial dependence calibration box plots, and column names used in model fitting.

AutoH2oGBMRegression()

AutoH2oGBMRegression() is an automated H2O modeling framework with grid-tuning and model evaluation that runs a variety of steps. First, the function will run a random grid tune over N number of models and find which model is the best (a default model is always included in that set). Once the model is identified and built, several other outputs are generated: validation data with predictions, evaluation plot, evaluation boxplot, evaluation metrics, variable importance, partial dependence calibration plots, partial dependence calibration box plots, and column names used in model fitting.

AutoH2oDRFRegression()

AutoH2oDRFRegression() is an automated H2O modeling framework with grid-tuning and model evaluation that runs a variety of steps. First, the function will run a random grid tune over N number of models and find which model is the best (a default model is always included in that set). Once the model is identified and built, several other outputs are generated: validation data with predictions, evaluation plot, evaluation boxplot, evaluation metrics, variable importance, partial dependence calibration plots, partial dependence calibration box plots, and column names used in model fitting.

Binary Classification:


AutoCatBoostClassifier() GPU + CPU

AutoCatBoostClassifier() is an automated modeling function that runs a variety of steps. First, a stratified sampling (by the target variable) is done to create train and validation sets. Then, the function will run a random grid tune over N number of models and find which model is the best (a default model is always included in that set). Once the model is identified and built, several other outputs are generated: validation data with predictions, ROC plot, evaluation plot, evaluation metrics, variable importance, partial dependence calibration plots, partial dependence calibration box plots, and column names used in model fitting.

AutoXGBoostClassifier() GPU + CPU

AutoXGBoostClassifier() is an automated XGBoost modeling framework with grid-tuning and model evaluation that runs a variety of steps. First, a stratified sampling (by the target variable) is done to create train and validation sets. Then, the function will run a random grid tune over N number of models and find which model is the best (a default model is always included in that set). Once the model is identified and built, several other outputs are generated: validation data with predictions, evaluation plot, evaluation boxplot, evaluation metrics, variable importance, partial dependence calibration plots, partial dependence calibration box plots, and column names used in model fitting.

AutoH2oGBMClassifier()

AutoH2oGBMClassifier() is an automated H2O modeling framework with grid-tuning and model evaluation that runs a variety of steps. First, a stratified sampling (by the target variable) is done to create train and validation sets. Then, the function will run a random grid tune over N number of models and find which model is the best (a default model is always included in that set). Once the model is identified and built, several other outputs are generated: validation data with predictions, evaluation plot, evaluation metrics, variable importance, partial dependence calibration plots, and column names used in model fitting.

AutoH2oDRFClassifier()

AutoH2oDRFClassifier() is an automated H2O modeling framework with grid-tuning and model evaluation that runs a variety of steps. First, a stratified sampling (by the target variable) is done to create train and validation sets. Then, the function will run a random grid tune over N number of models and find which model is the best (a default model is always included in that set). Once the model is identified and built, several other outputs are generated: validation data with predictions, evaluation plot, evaluation metrics, variable importance, partial dependence calibration plots, and column names used in model fitting.

Multinomial Classification:


AutoCatBoostMultiClass() GPU + CPU

AutoCatBoostMultiClass() is an automated modeling function that runs a variety of steps. First, a stratified sampling (by the target variable) is done to create train and validation sets. Then, the function will run a random grid tune over N number of models and find which model is the best (a default model is always included in that set). Once the model is identified and built, several other outputs are generated: validation data with predictions, evaluation metrics, variable importance, and column names used in model fitting.

AutoXGBoostMultiClass() GPU + CPU

AutoXGBoostMultiClass() is an automated XGBoost modeling framework with grid-tuning and model evaluation that runs a variety of steps. First, a stratified sampling (by the target variable) is done to create train and validation sets. Then, the function will run a random grid tune over N number of models and find which model is the best (a default model is always included in that set). Once the model is identified and built, several other outputs are generated: validation data with predictions, evaluation metrics, variable importance, and column names used in model fitting.

AutoH2oGBMMultiClass()

AutoH2oGBMMultiClass() is an automated H2O modeling framework with grid-tuning and model evaluation that runs a variety of steps. First, a stratified sampling (by the target variable) is done to create train and validation sets. Then, the function will run a random grid tune over N number of models and find which model is the best (a default model is always included in that set). Once the model is identified and built, several other outputs are generated: validation data with predictions, evaluation metrics, confusion matrix, and variable importance.

AutoH2oDRFMultiClass()

AutoH2oDRFMultiClass() is an automated H2O modeling framework with grid-tuning and model evaluation that runs a variety of steps. First, a stratified sampling (by the target variable) is done to create train and validation sets. Then, the function will run a random grid tune over N number of models and find which model is the best (a default model is always included in that set). Once the model is identified and built, several other outputs are generated: validation data with predictions, evaluation metrics, confusion matrix, and variable importance.

Generalized Hurdle Models:


The first step is to build either a binary classification model (in the case of a single bucket value, such as zero) or a multiclass model (in the case of multiple bucket values, such as zero and 10). The next step is to subset the data into the following cases: less than the first split value, between the first and second split values, between the second and third split values, ..., between the second-to-last and last split values, and greater than or equal to the last split value. For each data subset, a regression model is built to predict values within that split value range. The final step is to multiply the probability of being in each group by the value predicted by that group's regression model and sum the results.

Single Partition
  • Pr(X = 0) * 0 + Pr(X > 0) * E(X | X > 0)
  • Pr(X < x1) * E(X | X < x1) + Pr(X >= x1) * E(X | X >= x1)
Multiple Partitions
  • Pr(X = 0) * 0 + Pr(0 < X < x2) * E(X | 0 < X < x2) + ... + Pr(xn-1 <= X < xn) * E(X | xn-1 <= X < xn) + Pr(X >= xn) * E(X | X >= xn)
  • Pr(X < x1) * E(X | X < x1) + Pr(x1 <= X < x2) * E(X | x1 <= X < x2) + ... + Pr(xn-1 <= X < xn) * E(X | xn-1 <= X < xn) + Pr(X >= xn) * E(X | X >= xn)
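
The probability-weighted sum in a hurdle model can be sketched in base R as follows. The bucket probabilities and conditional means are made-up illustrative numbers, not output from AutoCatBoostHurdleModel() or AutoXGBoostHurdleModel(); in practice they would come from the classifier and the per-bucket regression models.

```r
# Hypothetical values for one record, with buckets: X = 0, 0 < X < 10, X >= 10
bucket_probs <- c(zero = 0.50, low = 0.30, high = 0.20)  # classifier output, sums to 1
bucket_means <- c(zero = 0,    low = 4,    high = 25)    # E(X | bucket) from the regressions

# Hurdle prediction: probability-weighted sum of the conditional expectations
hurdle_prediction <- sum(bucket_probs * bucket_means)
print(hurdle_prediction)
```
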
AutoCatBoostHurdleModel()

AutoCatBoostHurdleModel() utilizes the CatBoost algorithm on the backend.

AutoXGBoostHurdleModel()

AutoXGBoostHurdleModel() utilizes the XGBoost algorithm on the backend.

General Purpose H2O Automated Modeling:


AutoH2OModeler()

AutoH2OModeler() automatically builds any number of models along with generating partial dependence calibration plots, model evaluation calibration plots, grid tuning, and file storage for easy production implementation. It handles regression, quantile regression, time until event, and classification models (binary and multinomial) using numeric and factor variables without the need for monotonic transformations or one-hot encoding.

  • Models include:
    • RandomForest (DRF)
    • GBM
    • Deeplearning
    • XGBoost (for Linux)
    • LightGBM (for Linux)
    • AutoML - medium depth grid tuning for Deeplearning, XGBoost (if available), DRF, GBM, GLM, and StackedEnsembles

Nonlinear Regression Modeling:


AutoNLS()

AutoNLS() is an automated nonlinear regression modeling function. It automatically finds the best model fit from the suite of models below and merges the predictions onto the source data. Great for forecasting growth over time or estimating single variable nonlinear functions.

  • Models included:
    • Asymptotic
    • Asymptotic through origin
    • Asymptotic with offset
    • Bi-exponential
    • Four parameter logistic
    • Three parameter logistic
    • Gompertz
    • Michaelis-Menten
    • Weibull
    • Polynomial regression or monotonic regression

Model Scoring Functions:

AutoCatBoostScoring()

AutoCatBoostScoring() is an automated scoring function that complements the AutoCatBoost() model training functions. This function requires you to supply features for scoring. It will run ModelDataPrep() to prepare your features for catboost data conversion and scoring. It will also handle any transformations and back-transformations if you utilized that feature in the regression training case.

AutoXGBoostScoring()

AutoXGBoostScoring() is an automated scoring function that complements the AutoXGBoost() model training functions. This function requires you to supply features for scoring. It will run the ModelDataPrep() and DummifyDT() functions to prepare your features for xgboost data conversion and scoring. It will also handle any transformations and back-transformations if you utilized that feature in the regression training case.

AutoH2OMLScoring()

AutoH2OMLScoring() is an automated scoring function that complements the AutoH2oGBM__() and AutoH2oDRF__() model training functions. This function requires you to supply features for scoring. It will run ModelDataPrep() to prepare your features for H2O data conversion and scoring. It will also handle any transformations and back-transformations if you utilized that feature in the regression training case.

AutoH2OScoring()

AutoH2OScoring() is for scoring models that were built with the AutoH2OModeler, AutoKMeans, and AutoWord2VecModeler functions. Scores mojo models or binary files by loading models into the H2O environment and scoring them. You can choose which output you wish to keep as well for classification and multinomial models.

Time Series Modeling Functions:

AutoTS()

AutoTS() is an automated time series modeling function. The function automatically finds the most accurate time series model from the list of models below by utilizing optimal BoxCox transformations along with a stepwise procedure to test out possible values for lags and moving averages (the user specifies upper bounds for lags and moving averages). All model parameters are optimally set to get the best possible performance out of each distinct model. There are also four different versions of each model that can be tested and internally compared by setting ModelFreq = TRUE and TSClean = TRUE, resulting in four tested combinations:

  • user-specified time frequency + no historical series smoothing and imputation
  • model-based identified time frequency + no historical smoothing and imputation
  • user-specified time frequency + historical series smoothing and imputation
  • model-based identified time frequency + historical smoothing and imputation

The best model is the one with the lowest out-of-sample error (the user sets the number of periods for testing along with the evaluation metric). The winning model is then rebuilt on all available data and used to generate the forecasts. The output from AutoTS() includes the forecast values, model evaluation metrics and metadata for all models tested, along with the model object.

  • Automated Time Series Models include:
    • DSHW: Double Seasonal Holt-Winters
    • ARFIMA: Auto Regressive Fractional Integrated Moving Average
    • ARIMA: Auto Regressive Integrated Moving Average with specified max lags, seasonal lags, moving averages, and seasonal moving averages
    • ETS: Additive and Multiplicative Exponential Smoothing and Holt-Winters
    • NNetar: Auto Regressive Neural Network models; automatically compares models with 1 lag or 1 seasonal lag against models with up to N lags and N seasonal lags
    • TBATS: Exponential smoothing state space model with Box-Cox transformation, ARMA errors, Trend and Seasonal components
    • TSLM: Time Series Linear Model - builds a linear model with trend and season components extracted from the data
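
The holdout-based selection described above can be illustrated with base R alone. This is a minimal sketch, not the AutoTS() implementation: arima() and HoltWinters() from the stats package stand in for the model list, the fixed ARIMA orders are arbitrary, and mean absolute error is an assumed evaluation metric.

```r
# Holdout-based model selection on the built-in AirPassengers series
y <- AirPassengers
holdout <- 12                                   # number of periods held out for testing
n <- length(y)
train <- ts(y[1:(n - holdout)], start = start(y), frequency = frequency(y))
test  <- as.numeric(y[(n - holdout + 1):n])

# Each candidate fits on a series and returns an h-step-ahead forecast
candidates <- list(
  arima = function(s, h) {
    fit <- arima(s, order = c(1, 1, 1), seasonal = c(0, 1, 1))
    as.numeric(predict(fit, n.ahead = h)$pred)
  },
  holt_winters = function(s, h) {
    as.numeric(predict(HoltWinters(s), n.ahead = h))
  }
)

# Out-of-sample mean absolute error for each candidate
mae <- sapply(candidates, function(f) mean(abs(f(train, holdout) - test)))
best <- names(which.min(mae))

# Rebuild the winner on all available data and generate the forecast
final_forecast <- candidates[[best]](y, holdout)
```
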
AutoCatBoostCARMA()

AutoCatBoostCARMA() is an automated machine learning time series forecasting function. The CARMA part of the name refers to Calendar and Auto-Regressive Moving-Average. Create hundreds of thousands of time series forecasts using this function. Internally, it utilizes the catboost algorithm and replicates an ARMA process. The features automatically created internally include calendar variables, lags, moving averages, and a time trend variable. The forecasts are generated by predicting one step ahead at a time and between forecasting steps the model features are updated before generating the next forecast. This process is done for every time step you wish to have forecasted. On top of that, you can automatically have an optimal transformation made on your target variable, with competing transformations being: YeoJohnson, BoxCox, arcsinh, along with arcsin(sqrt(x)) and logit for proportion data. Grid tuning is available along with several other arguments to customize your model builds. You can also utilize GPU if you have one. Running with GPU typically allows for a 10x speedup over CPU with the catboost algorithm. Note, this is based on utilizing a 1080ti.
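
The recursive forecasting loop described above (predict one step, update the features, predict again) can be sketched as follows. This is an illustration under stated assumptions: lm() is a stand-in for CatBoost, the series is simulated, and only two lags and a time trend are used as features.

```r
# Simulated series for illustration
set.seed(1)
y <- as.numeric(arima.sim(list(ar = 0.7), n = 200)) + 10

# Build a training frame with lag-1, lag-2, and a time trend
make_frame <- function(series) {
  n <- length(series)
  data.frame(
    y     = series[3:n],
    lag1  = series[2:(n - 1)],
    lag2  = series[1:(n - 2)],
    trend = 3:n
  )
}
fit <- lm(y ~ lag1 + lag2 + trend, data = make_frame(y))

# Forecast h steps: predict one step, append it, rebuild the features, repeat
h <- 5
history <- y
for (step in seq_len(h)) {
  n <- length(history)
  new_row <- data.frame(lag1 = history[n], lag2 = history[n - 1], trend = n + 1)
  history <- c(history, as.numeric(predict(fit, newdata = new_row)))
}
forecasts <- tail(history, h)
```

Because each forecast feeds the lag features of the next step, errors can compound over long horizons; that is a property of any recursive scheme, not of this sketch in particular.
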

AutoXGBoostCARMA()

AutoXGBoostCARMA() operates identically to the AutoCatBoostCARMA() function except that it utilizes XGBoost instead of CatBoost.

AutoH2oDRFCARMA()

AutoH2oDRFCARMA() operates identically to the AutoCatBoostCARMA() function except that it utilizes H2O Distributed Random Forest instead of CatBoost.

AutoH2oGBMCARMA()

AutoH2oGBMCARMA() operates identically to the AutoCatBoostCARMA() function except that it utilizes H2O GBM instead of CatBoost.

Recommender System Functions:

AutoRecomDataCreate()

AutoRecomDataCreate() automatically creates your binary ratings matrix from transaction data.
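
To show what a binary ratings matrix looks like, here is a small base R sketch. The column names and data are hypothetical, and the result is a plain integer matrix rather than whatever object AutoRecomDataCreate() actually returns.

```r
# Hypothetical transaction data: one row per customer-product purchase
transactions <- data.frame(
  customer = c("a", "a", "b", "c", "c", "c"),
  product  = c("x", "y", "x", "y", "z", "z")
)

# Cross-tabulate purchases, then flag any purchase as 1 and no purchase as 0
counts  <- unclass(table(transactions$customer, transactions$product))
ratings <- (counts > 0) * 1L
print(ratings)
```
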

AutoRecommender()

AutoRecommender() is an automated collaborative filtering modeling function in which each model below competes against the others for top performance:

  • RandomItems
  • PopularItems
  • UserBasedCF
  • ItemBasedCF
  • AssociationRules
AutoRecommenderScoring()

AutoRecommenderScoring() automatically scores a recommender model from AutoRecommender().

AutoMarketBasketModel()

AutoMarketBasketModel() is a function that runs a market basket analysis automatically. It will convert your data, run the algorithm, and add on additional significance values not provided by the source package.

Unsupervised Learning Functions:

GenTSAnomVars()

GenTSAnomVars() generates time series anomaly variables. (Cross with Feature Engineering) Create indicator variables (high, low) along with cumulative anomaly rates (high, low) based on control limits methodology over a max of two grouping variables and a date variable (effectively a rolling GLM).

ResidualOutliers()

ResidualOutliers() generates residual outliers from time series modeling. (Cross with Feature Engineering) It utilizes tsoutliers to indicate outliers within a time series data set.

AutoKMeans()

AutoKMeans() builds a generalized low rank model followed by KMeans. (Possible cross with Feature Engineering) It generates a column with a cluster identifier based on a grid tuned (optional) generalized low rank model and a grid tuned (optimal) K-Optimal searching K-Means algorithm.

Feature Engineering Functions:

DT_GDL_Feature_Engineering()

DT_GDL_Feature_Engineering() builds autoregressive and moving average features from target columns and distributed lags and distributed moving average from independent features distributed across time. On top of that, you can also create time between instances along with their associated lags and moving averages. This function works for data with groups and without groups. 100% data.table built. It runs super fast and can handle big data.
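
The kind of features described above (lags and moving averages computed within groups) can be sketched in base R. This is only an illustration: the package itself is built on data.table, the helper names lag_n and roll_mean are hypothetical, and the window definitions used here may differ from the package's.

```r
# Hypothetical grouped data, already sorted by time within each group
df <- data.frame(
  group = rep(c("a", "b"), each = 5),
  value = c(1, 2, 3, 4, 5, 10, 20, 30, 40, 50)
)

# Lag by n periods, padding the start with NA
lag_n <- function(x, n) c(rep(NA, n), head(x, -n))
# Trailing moving average over the current and previous k - 1 values
roll_mean <- function(x, k) as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))

# ave() applies each helper within a group, so values never leak across groups
df$lag1 <- ave(df$value, df$group, FUN = function(x) lag_n(x, 1))
df$ma3  <- ave(df$value, df$group, FUN = function(x) roll_mean(x, 3))
print(df)
```
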

Partial_DT_GDL_Feature_Engineering()

Partial_DT_GDL_Feature_Engineering() is for generating the equivalent features built from DT_GDL_Feature_Engineering() for a set of new records as rapidly as possible. I used this to create the feature vectors for scoring models in production. This function is for generating lags and moving averages (along with lags and moving averages off of time between records) for a partial set of records in your data set, typically new records that become available for model scoring. Column names and ordering will be identical to the output from the corresponding DT_GDL_Feature_Engineering() function, which most likely was used to create features for model training.

Partial_DT_GDL_Feature_Engineering2()

Partial_DT_GDL_Feature_Engineering2() is another way to compute the same features for a partial set of records as the Partial_DT_GDL_Feature_Engineering() function. This version can run quicker for data sets where moving average features have long windows and the lag list is short. You can benchmark both the original and this version to see which one runs faster for your data.

Scoring_GDL_Feature_Engineering()

Scoring_GDL_Feature_Engineering() is a function that runs internally inside the CARMA functions but might have use outside of them. It is for scoring a single record when there are no grouping variables, or one record per group level when a single group is utilized. It generates column names identical to those of the DT_GDL_Feature_Engineering() function and the Partial_DT_GDL_Feature_Engineering() function.

AutoWord2VecModeler()

AutoWord2VecModeler() generates a specified number of vectors for each column of text data in your data set and saves the models for re-creating them later in the scoring process. You can choose to build individual models for each column or one model for all your columns.

ModelDataPrep()

ModelDataPrep() rapidly converts "inf" values to NA, converts character columns to factor columns, and imputes with specified values for factor and numeric columns.
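
The three preparation steps can be sketched in base R as below. The fill values and column names are hypothetical, and this loop only mirrors the described behavior; it is not the ModelDataPrep() code or its argument list.

```r
# Hypothetical raw data with an infinite value, missing values, and a character column
df <- data.frame(
  x = c(1, Inf, NA, 4),
  g = c("a", NA, "b", "a"),
  stringsAsFactors = FALSE
)
numeric_fill <- -1     # assumed imputation value for numeric columns
factor_fill  <- "0"    # assumed imputation level for factor columns

for (col in names(df)) {
  if (is.numeric(df[[col]])) {
    df[[col]][is.infinite(df[[col]])] <- NA        # Inf and -Inf become NA
    df[[col]][is.na(df[[col]])] <- numeric_fill    # then impute numeric NAs
  } else if (is.character(df[[col]])) {
    df[[col]][is.na(df[[col]])] <- factor_fill     # impute before converting
    df[[col]] <- factor(df[[col]])                 # character column to factor
  }
}
str(df)
```
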

DummifyDT()

DummifyDT() rapidly dichotomizes a list of columns in a data table (N+1 columns for N levels using one hot encoding or N columns for N levels otherwise). Several other arguments exist for outputting and saving factor levels for model scoring processes, which are used internally in the AutoXGBoost__() suite of modeling functions.
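
For readers unfamiliar with dummy coding, the idea of saving factor levels for scoring can be sketched with base R's model.matrix(). This is a stand-in for illustration only; it produces one indicator column per level and does not reproduce DummifyDT()'s exact column counts or arguments.

```r
# Training data with a three-level factor
df <- data.frame(color = factor(c("red", "green", "blue", "green")))

# One indicator column per level (dropping the intercept keeps every level)
dummies <- model.matrix(~ color - 1, data = df)

# Save the levels so a scoring process can rebuild identical columns later
saved_levels <- levels(df$color)
new_data <- data.frame(color = factor(c("blue", "red"), levels = saved_levels))
scoring_dummies <- model.matrix(~ color - 1, data = new_data)

print(colnames(dummies))
print(colnames(scoring_dummies))
```

Reusing the saved levels is what keeps the scoring columns aligned with training even when a level (here "green") is absent from the new records.
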

AutoDataPartition()

AutoDataPartition() is designed to achieve a few things that standard data partitioning processes or functions don't handle. First, you can choose to build any number of partitioned data sets beyond the standard train, validate, and test data sets. Second, you can choose between random sampling to split your data or a time-based partitioning. Third, for the random partitioning, you can specify stratification columns in your data to stratify by in order to ensure a proper split amongst your categorical features (e.g., think MultiClass targets). Lastly, it's 100% data.table so it will run fast and with low memory overhead.
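
A stratified random partition of the kind described can be sketched in base R. The ratios, column names, and the assign_partition helper are hypothetical; the real function is data.table based and its arguments are not shown here.

```r
set.seed(42)
# Hypothetical data with an imbalanced multiclass target
df <- data.frame(id = 1:100,
                 target = rep(c("a", "b", "c", "d"), times = c(40, 30, 20, 10)))
ratios <- c(train = 0.7, validate = 0.2, test = 0.1)

# Produce n partition labels in the requested proportions, in random order
assign_partition <- function(n) {
  cuts <- cumsum(ratios)
  labels <- names(ratios)[findInterval((seq_len(n) - 0.5) / n, cuts) + 1]
  sample(labels)
}

# Assign labels separately within each target level so every level is split 70/20/10
df$partition <- NA_character_
for (level in unique(df$target)) {
  rows <- which(df$target == level)
  df$partition[rows] <- assign_partition(length(rows))
}
print(table(df$target, df$partition))
```
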

AutoTransformationCreate()

AutoTransformationCreate() is a function for automatically identifying the optimal transformations for numeric features and transforming them once identified. This function will loop through your selected transformation options (YeoJohnson, BoxCox, Asinh, Asin, and Logit) and find the one that produces data that is the closest to normally distributed data. It then makes the transformation and collects the metadata information for use in the AutoTransformationScore() function, either by returning the objects (always) or saving them to file (optional).
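
The idea of looping over candidate transformations and keeping the most normal-looking result can be sketched as follows. Several assumptions apply: the data are simulated, log and sqrt stand in for the parameterized YeoJohnson and BoxCox options, and the Shapiro-Wilk W statistic is an assumed normality score, not necessarily the criterion the package uses.

```r
set.seed(7)
x <- rexp(500, rate = 0.5) + 0.1     # simulated right-skewed, strictly positive data

# Candidate transformations (stand-ins for the package's option list)
candidates <- list(
  identity = function(v) v,
  log      = function(v) log(v),
  sqrt     = function(v) sqrt(v),
  asinh    = function(v) asinh(v)
)

# Shapiro-Wilk W statistic: values closer to 1 indicate data closer to normal
w_stat <- sapply(candidates, function(f) unname(shapiro.test(f(x))$statistic))
best <- names(which.max(w_stat))

# Apply the winner; in a real workflow the choice would be stored for later scoring
x_transformed <- candidates[[best]](x)
print(round(w_stat, 4))
```
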

AutoTransformationScore()

AutoTransformationScore() is the complement function to AutoTransformationCreate(). It automatically applies or inverts the transformations you identified in AutoTransformationCreate() on other data sets. This is useful for applying transformations to your validation and test data sets for modeling. It's also useful for back-transforming your target and prediction columns after you have built and scored your models so you can obtain statistics on the original features.

GDL_Feature_Engineering()

GDL_Feature_Engineering() builds autoregressive and rolling stats from target columns and distributed lags and distributed rolling stats for independent features distributed across time. On top of that, you can also create time between instances along with their associated lags and rolling stats. This function works for data with groups and without groups. The rolling stats can be of any variety, such as rolling standard deviations, rolling quantiles, etc. but the function runs much slower than the DT_GDL_Feature_Engineering() counterpart so it might not be a good choice for scoring environments that require low latency.

Model Evaluation, Interpretation, and Cost-Sensitive Functions:

ParDepCalPlots()

ParDepCalPlots() is for visualizing the relationships of features and the reliability of the model in predicting those effects. Build a partial dependence calibration line plot, box plot or bar plot for the case of categorical variables.

EvalPlot()

EvalPlot() has two plot versions: a calibration line plot of predicted values and actual values across the range of predicted values, and a calibration boxplot for seeing the accuracy and variability of predictions against actuals.

threshOptim()

threshOptim() is great for situations with asymmetric costs across the confusion matrix. Generate a cost-sensitive optimized threshold for classification models. Just supply the costs for false positives and false negatives (can supply costs for all four outcomes too) and the function will return the optimal threshold for maximizing "utility".
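
A cost-sensitive threshold search can be sketched in base R by scanning candidate thresholds and keeping the one with the highest average utility. The data are simulated and the payoff values are hypothetical; this shows the general technique, not threshOptim()'s arguments or internals.

```r
set.seed(123)
actual <- rbinom(1000, 1, 0.3)
# Simulated model scores: positives tend to score higher than negatives
predicted <- plogis(rnorm(1000, mean = ifelse(actual == 1, 1, -1)))

# Hypothetical payoffs for the four confusion-matrix outcomes
utility <- c(tp = 0, tn = 0, fp = -1, fn = -5)

# Average utility per record when classifying at a given threshold
expected_utility <- function(threshold) {
  pred_class <- as.integer(predicted >= threshold)
  tp <- sum(pred_class == 1 & actual == 1)
  tn <- sum(pred_class == 0 & actual == 0)
  fp <- sum(pred_class == 1 & actual == 0)
  fn <- sum(pred_class == 0 & actual == 1)
  sum(c(tp, tn, fp, fn) * utility) / length(actual)
}

thresholds <- seq(0.01, 0.99, by = 0.01)
utilities <- sapply(thresholds, expected_utility)
best_threshold <- thresholds[which.max(utilities)]
print(best_threshold)
```
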

RedYellowGreen()

RedYellowGreen() computes optimal thresholds for binary classification models where "don't classify" is an option. Consider a health care binary classification model that predicts whether or not a disease is present. This is certainly a case for threshOptim since the costs of false positives and false negatives can vary by a large margin. However, there is always the potential to run further analysis. The RedYellowGreen() function can compute two thresholds if you can supply a cost of "further analysis". Predicted values < the lower threshold are confidently classified as a negative case and predicted values > the upper threshold are confidently classified as a positive case. Predicted values in between the lower and upper thresholds are cases that should require further analysis.
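
The two-threshold idea can be sketched as a grid search over (lower, upper) pairs. Everything here is an assumption for illustration: the data are simulated, the three cost values are hypothetical, and records sent to further analysis are assumed to incur only the review cost.

```r
set.seed(123)
actual <- rbinom(1000, 1, 0.3)
predicted <- plogis(rnorm(1000, mean = ifelse(actual == 1, 1, -1)))

# Hypothetical costs
cost_fp <- 1; cost_fn <- 5; cost_review <- 0.5

# Total cost of classifying below `lower` as negative, above `upper` as positive,
# and sending everything in between to further analysis
total_cost <- function(lower, upper) {
  fp     <- sum(predicted > upper & actual == 0)
  fn     <- sum(predicted < lower & actual == 1)
  review <- sum(predicted >= lower & predicted <= upper)
  fp * cost_fp + fn * cost_fn + review * cost_review
}

# Evaluate every pair with lower <= upper and keep the cheapest
steps <- seq(0.05, 0.95, by = 0.05)
grid <- expand.grid(lower = steps, upper = steps)
grid <- grid[grid$lower <= grid$upper, ]
grid$cost <- mapply(total_cost, grid$lower, grid$upper)
best <- grid[which.min(grid$cost), ]
print(best)
```
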

Utilities and Misc. Functions:

AutoWordFreq()

AutoWordFreq() creates a word frequency data.table and a word cloud

AutoH2OTextPrepScoring()

AutoH2OTextPrepScoring() prepares your data for scoring based on models built with AutoWord2VecModeler() and runs internally inside the AutoH2OScoring() function. It cleans and tokenizes your text data.

ProblematicFeatures()

ProblematicFeatures() identifies columns that have either little to no variance, categorical variables with extremely high cardinality, too many NA's, too many zeros, or too high of a skew.

ProblematicRecords()

ProblematicRecords() automatically identifies anomalous data records via Isolation Forests from H2O.

RemixTheme()

RemixTheme() is a specific font, set of colors, and style for plots.

ChartTheme()

ChartTheme() is a specific font, set of colors, and style for plots.

multiplot()

multiplot() is useful for displaying multiple plots in a single pane. I've never had luck using grid so I just use this instead.

tokenizeH2O()

tokenizeH2O() tokenizes an H2O string column.

percRank()

percRank() is an inner function for calibration plots and partial dependence plots. It computes PercentRank for all numeric records in a column.

SimpleCap()

SimpleCap() applies proper case to text.

PrintObjectsSize()

PrintObjectsSize() prints out environment objects and their respective sizes. Useful for debugging programs.

tempDatesFun()

tempDatesFun() is a special case for character conversion to date when importing from Excel.

Version

0.5.0

License

MPL-2.0

Maintainer

Adrian Antico

Last Published

September 2nd, 2021

Functions in RemixAutoML (0.5.0)

AutoCatBoostdHurdleModel

AutoCatBoostdHurdleModel for generalized hurdle modeling
AutoCatBoostClassifier

AutoCatBoostClassifier is an automated catboost model grid-tuning classifier and evaluation system
AutoCatBoostRegression

AutoCatBoostRegression is an automated catboost model grid-tuning regression and evaluation system
AutoH2OMLScoring

AutoH2OMLScoring is an automated scoring function that complements the AutoH2o model training functions.
AutoCatBoostCARMA

AutoCatBoostCARMA Automated CatBoost Calendar, ARMA, and Trend Variables Forecasting
AutoCatBoostScoring

AutoCatBoostScoring is an automated scoring function that complements the AutoCatBoost model training functions.
AutoDataPartition

The AutoDataPartition function
AutoCatBoostMultiClass

AutoCatBoostMultiClass is an automated catboost model grid-tuning multinomial classifier and evaluation system
AutoH2OModeler

An Automated Machine Learning Framework using H2O
AutoH2OScoring

AutoH2OScoring is the complement of AutoH2OModeler.
AutoH2OTextPrepScoring

AutoH2OTextPrepScoring is for NLP scoring
AutoH2oDRFRegression

AutoH2oDRFRegression is an automated H2O modeling framework with grid-tuning and model evaluation
AutoH2oGBMCARMA

AutoH2oGBMCARMA Automated H2O GBM Calendar, ARMA, and Trend Variables Forecasting
AutoKMeans

AutoKMeans Automated row clustering for mixed column types
AutoMarketBasketModel

AutoMarketBasketModel function runs a market basket analysis automatically
AutoH2oGBMRegression

AutoH2oGBMRegression is an automated H2O modeling framework with grid-tuning and model evaluation
AutoMLTS

AutoMLTS Is an Automated Machine Learning Time Series Forecasting Function
AutoTransformationScore

AutoTransformationScore() is the complementary function to AutoTransformationCreate()
AutoH2oGBMClassifier

AutoH2oGBMClassifier is an automated H2O modeling framework with grid-tuning and model evaluation
AutoWord2VecModeler

Automated word2vec data generation via H2O
AutoNLS

AutoNLS is a function for automatically building nls models
AutoH2oGBMMultiClass

AutoH2oGBMMultiClass is an automated H2O modeling framework with grid-tuning and model evaluation
AutoXGBoostCARMA

AutoXGBoostCARMA Automated XGBoost Calendar, ARMA, and Trend Variables Forecasting
PrintObjectsSize

PrintObjectsSize prints out the top N objects and their associated sizes, sorted by size
AutoWordFreq

Automated Word Frequency and Word Cloud Creation
AutoH2oDRFClassifier

AutoH2oDRFClassifier is an automated H2O modeling framework with grid-tuning and model evaluation
AutoRecomDataCreate

Convert transactional data.table to a binary ratings matrix
ProblematicFeatures

ProblematicFeatures identifies problematic features for machine learning
AutoXGBoostClassifier

AutoXGBoostClassifier is an automated XGBoost modeling framework with grid-tuning and model evaluation
AutoXGBoostMultiClass

AutoXGBoostMultiClass is an automated XGBoost modeling framework with grid-tuning and model evaluation
ParDepCalPlots

ParDepCalPlots automatically builds partial dependence calibration plots for model evaluation
ModelDataPrep

Final Data Preparation Function
AutoRecommender

Automatically build the best recommender model among models available.
AutoRecommenderScoring

The AutoRecomScoring function scores recommender models from AutoRecommender()
AutoH2oDRFCARMA

AutoH2oDRFCARMA Automated H2O DRF Calendar, ARMA, and Trend Variables Forecasting
AutoH2oDRFMultiClass

AutoH2oDRFMultiClass is an automated H2O modeling framework with grid-tuning and model evaluation
ResidualOutliers

ResidualOutliers is an automated time series outlier detection function
AutoXGBoostdHurdleModel

AutoXGBoostdHurdleModel is a generalized hurdle modeling framework
AutoTS

AutoTS is an automated time series modeling function
Scoring_GDL_Feature_Engineering

An Automated Scoring Feature Engineering Function
ChartTheme

ChartTheme is a theme generator for ggplot2 plots
AutoTransformationCreate

AutoTransformationCreate is a function for automatically identifying the optimal transformations for numeric features and transforming them once identified.
AutoXGBoostRegression

AutoXGBoostRegression is an automated XGBoost modeling framework with grid-tuning and model evaluation
AutoXGBoostScoring

AutoXGBoostScoring is an automated scoring function that complements the AutoXGBoost model training functions.
SimpleCap

SimpleCap function is for capitalizing the first letter of words
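A minimal base R sketch of word capitalization, in the spirit of the SimpleCap entry. The function name `cap_words` is hypothetical; the approach follows the well-known split, uppercase, and paste idiom and may differ from what the package does.

```r
# Hypothetical helper, not the package's SimpleCap() implementation.
# Capitalizes the first letter of every space-separated word in each string.
cap_words <- function(x) {
  vapply(
    strsplit(x, " ", fixed = TRUE),
    function(words) {
      paste0(toupper(substring(words, 1, 1)), substring(words, 2), collapse = " ")
    },
    character(1)
  )
}

cap_words("automated machine learning")
# [1] "Automated Machine Learning"
```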
Partial_DT_GDL_Feature_Engineering

A version of the DT_GDL function for creating the GDL features for a new set of records
Partial_DT_GDL_Feature_Engineering2

A version of the DT_GDL function for creating the GDL features for a new set of records
DummifyDT

DummifyDT creates dummy variables for the selected columns.
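To make the notion of dummy variables concrete, here is a small base R sketch that one-hot encodes selected columns of a data frame. The name `add_dummies` and the `column_level` naming scheme are assumptions for illustration only; DummifyDT itself operates on data.table objects and has its own arguments.

```r
# Hypothetical helper, not the package's DummifyDT() implementation.
# Replaces each selected column with one 0/1 indicator column per distinct level.
add_dummies <- function(df, cols) {
  for (col in cols) {
    values <- as.character(df[[col]])
    for (lvl in sort(unique(values))) {
      df[[paste(col, lvl, sep = "_")]] <- as.integer(values == lvl)
    }
    df[[col]] <- NULL  # drop the original categorical column
  }
  df
}

df <- data.frame(id = 1:3, color = c("red", "blue", "red"), stringsAsFactors = FALSE)
out <- add_dummies(df, "color")
print(out)
```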
multiplot

Multiplot is a function for combining multiple plots
tokenizeH2O

For NLP work
threshOptim

Utility maximizing thresholds for binary classification
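One way to picture a utility-maximizing threshold is a grid search over cutoffs, scoring each by the average payoff of the resulting confusion-matrix cells. The sketch below is an assumption-laden illustration: the function name `best_threshold`, the default payoffs, and the grid are all hypothetical and are not taken from threshOptim.

```r
# Hypothetical helper, not the package's threshOptim() implementation.
# Scans candidate cutoffs and returns the one with the highest mean utility.
best_threshold <- function(actual, prob,
                           tp = 1, fp = -1, tn = 0, fn = -1,
                           grid = seq(0.01, 0.99, by = 0.01)) {
  utility <- vapply(grid, function(cutoff) {
    pred <- as.integer(prob >= cutoff)
    mean(
      tp * (pred == 1 & actual == 1) +
      fp * (pred == 1 & actual == 0) +
      tn * (pred == 0 & actual == 0) +
      fn * (pred == 0 & actual == 1)
    )
  }, numeric(1))
  best <- which.max(utility)  # first cutoff reaching the maximum
  list(threshold = grid[best], utility = utility[best])
}

actual <- c(0, 0, 1, 1)
prob   <- c(0.1, 0.4, 0.6, 0.9)
res <- best_threshold(actual, prob)
print(res)
```

Changing the payoffs shifts the chosen cutoff: a larger penalty on false negatives, for example, pushes the threshold lower.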
EvalPlot

EvalPlot automatically builds calibration plots for model evaluation
CreateCalendarVariables

CreateCalendarVariables creates calendar variables
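Calendar variables are typically simple functions of a date column. The base R sketch below derives a few of them; the helper name `add_calendar_vars` and the output column names are hypothetical, and the actual function may produce a different set of features.

```r
# Hypothetical helper, not the package's CreateCalendarVariables() implementation.
# Adds year, month, day-of-month, and weekday columns derived from a date column.
add_calendar_vars <- function(df, date_col) {
  d <- as.POSIXlt(as.Date(df[[date_col]]))
  df$year         <- d$year + 1900L  # POSIXlt stores years since 1900
  df$month        <- d$mon + 1L      # POSIXlt months are 0-based
  df$day_of_month <- d$mday
  df$weekday      <- d$wday          # 0 = Sunday, ..., 6 = Saturday
  df
}

sales <- data.frame(date = c("2020-01-01", "2020-02-29"), units = c(5, 7))
print(add_calendar_vars(sales, "date"))
```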
RemixTheme

RemixTheme is a theme generator for ggplot2 plots
RedYellowGreen

RedYellowGreen is for determining the optimal thresholds for binary classification when do-nothing is an option
ProblematicRecords

ProblematicRecords identifies problematic records for further investigation
DT_GDL_Feature_Engineering

An Automated Feature Engineering Function Using data.table frollmean
GenTSAnomVars

GenTSAnomVars performs automated z-score anomaly detection via a GLM-like procedure
percRank

Percentile rank function
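A percentile rank can be sketched in one line of base R as each value's rank divided by the number of non-missing values. The name `perc_rank` and the tie and NA handling shown here are assumptions, not necessarily the choices made by percRank.

```r
# Hypothetical helper, not the package's percRank() implementation.
# Ties share their average rank; NA inputs stay NA in the output.
perc_rank <- function(x) {
  rank(x, ties.method = "average", na.last = "keep") / sum(!is.na(x))
}

perc_rank(c(10, 20, 30, 40))
# [1] 0.25 0.50 0.75 1.00
```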
GDL_Feature_Engineering

An Automated Feature Engineering Function
RecomDataCreate

Convert transactional data.table to a binary ratings matrix
tempDatesFun

tempDatesFun converts Excel datetime character columns to Date columns
AutoCatBoostHurdleModel

AutoCatBoostHurdleModel
AutoBanditNNet

AutoBanditNNet
AutoArfima

AutoArfima
AutoCatBoostFreqSizeScoring

AutoCatBoostFreqSizeScoring is for scoring the models built with AutoCatBoostSizeFreqDist()
AutoBanditSarima

AutoBanditSarima
AutoCatBoostHurdleCARMA

AutoCatBoostHurdleCARMA
AutoDataDictionaries

AutoDataDictionaries
AutoCorrAnalysis

AutoCorrAnalysis
AutoClusteringScoring

AutoClusteringScoring
AutoCatBoostSizeFreqDist

AutoCatBoostSizeFreqDist
AutoClustering

AutoClustering
AutoCatBoostVectorCARMA

AutoCatBoostVectorCARMA
AutoDiffLagN

AutoDiffLagN
AutoETS

AutoETS
AutoH2oGAMMultiClass

AutoH2oGAMMultiClass
AutoH2oGAMClassifier

AutoH2oGAMClassifier
AutoFourierFeatures

AutoFourierFeatures
AutoH2OCARMA

AutoH2OCARMA
AutoH2oDRFHurdleModel

AutoH2oDRFHurdleModel
AutoH2oGBMHurdleModel

AutoH2oGBMHurdleModel
AutoH2oGBMFreqSizeScoring

AutoH2oGBMFreqSizeScoring is for scoring the models built with AutoH2oGBMSizeFreqDist()
AutoH2oGBMSizeFreqDist

AutoH2oGBMSizeFreqDist
AutoH2oGLMClassifier

AutoH2oGLMClassifier
AutoH2oGAMRegression

AutoH2oGAMRegression
AutoLagRollStatsScoring

AutoLagRollStatsScoring
AutoLagRollStats

AutoLagRollStats
AutoLimeAid

AutoLimeAid automated lime
AutoHurdleScoring

AutoHurdleScoring
AutoInteraction

AutoInteraction
AutoHierarchicalFourier

AutoHierarchicalFourier
AutoH2oMLRegression

AutoH2oMLRegression
AutoWord2VecScoring

AutoWord2VecScoring
AutoTBATS

AutoTBATS
AutoH2oGLMRegression

AutoH2oGLMRegression
AutoH2oMLMultiClass

AutoH2oMLMultiClass
AutoH2oMLClassifier

AutoH2oMLClassifier
AutoH2oGLMMultiClass

AutoH2oGLMMultiClass
BNLearnArcStrength

BNLearnArcStrength
CatBoostArgsCheck

CatBoostArgsCheck
CARMA_Define_Args

CARMA_Define_Args
BinaryMetrics

BinaryMetrics
CARMA_Get_IndepentVariablesPass

CARMA_Get_IndepentVariablesPass helps manage CARMA code
CarmaHoldoutMetrics

CarmaHoldoutMetrics
AutoXGBoostHurdleModel

AutoXGBoostHurdleModel
CARMA_GroupHierarchyCheck

CARMA_GroupHierarchyCheck
CatBoostGridParams

CatBoostClassifierParams
CLForecast

CLForecast
CatBoostDataPrep

CatBoostDataPrep
CatBoostGridTuner

CatBoostGridTuner
DT_BinaryConfusionMatrix

DT_BinaryConfusionMatrix
CatBoostRemoveFiles

CatBoostRemoveFiles
CLTrainer

CLTrainer
CatBoostParameterGrids

CatBoostParameterGrids
DifferenceData

DifferenceData
DifferenceDataReverse

DifferenceDataReverse
ClassificationMetrics

ClassificationMetrics
ColumnSubsetDataTable

ColumnSubsetDataTable
CatBoostDataConversion

CatBoostDataConversion
FinalBuildArfima

FinalBuildArfima
FakeDataGenerator

FakeDataGenerator
FinalBuildNNET

FinalBuildNNET
FinalBuildTBATS

FinalBuildTBATS
FinalBuildArima

FinalBuildArima
CreateHolidayVariables

CreateHolidayVariables
CatBoostFinalParams

CatBoostFinalParams
CreateProjectFolders

CreateProjectFolders Converts path files to proper path files
FinalBuildETS

FinalBuildETS
GenerateParameterGrids

GenerateParameterGrids
CatBoostImportances

CatBoostImportances
CarmaXGBoostKeepVarsGDL

CarmaXGBoostKeepVarsGDL
CatBoostPDF

CatBoostPDF
CarmaCatBoostKeepVarsGDL

CarmaCatBoostKeepVarsGDL
CatBoostValidationData

CatBoostValidationData
ContinuousTimeDataGenerator

ContinuousTimeDataGenerator
CarmaH2OKeepVarsGDL

CarmaH2OKeepVarsGDL
ID_TrainingDataGenerator

ID_TrainingDataGenerator
ID_TrainingDataGenerator2

ID_TrainingDataGenerator2
PrintToPDF

PrintToPDF
PredictArima

PredictArima
ID_BuildTrainDataSets

ID_BuildTrainDataSets
FullFactorialCatFeatures

FullFactorialCatFeatures
FinalBuildTSLM

FinalBuildTSLM
DownloadCSVFromStorageExplorer

DownloadCSVFromStorageExplorer
ID_MetadataGenerator

ID_MetadataGenerator
Logger

Logger
OptimizeETS

OptimizeETS
OptimizeArima

OptimizeArima
LimeModel

LimeModel to build a lime model
DataDisplayMeta

DataDisplayMeta
H2OAutoencoderScoring

H2OAutoencoderScoring
ExecuteSSIS

ExecuteSSIS
DeleteFile

DeleteFile
OptimizeNNET

OptimizeNNET
OptimizeTSLM

OptimizeTSLM
RPM_Binomial_Bandit

RPM_Binomial_Bandit
ParallelAutoTSLM

ParallelAutoTSLM
ParallelAutoTBATS

ParallelAutoTBATS
ParallelAutoARIMA

ParallelAutoARIMA
H2OAutoencoder

H2OAutoencoder
IntermittentDemandScoringDataGenerator

IntermittentDemandScoringDataGenerator
H2OIsolationForest

H2OIsolationForest
LB

LB
RL_Update

RL_Update updates the bandit probabilities for selecting different grids
ParallelAutoArfima

ParallelAutoArfima
H2OIsolationForestScoring

H2OIsolationForestScoring
ML_EvalPlots

ML_EvalPlots
OptimizeTBATS

OptimizeTBATS
XGBoostParameterGrids

XGBoostParameterGrids
XGBoostGridTuner

XGBoostGridTuner
PlotGUI

PlotGUI
RL_ML_Update

RL_ML_Update
RL_Performance

RL_Performance
SQL_ClearTable

SQL_ClearTable
RemixAutoML-package

Automated Machine Learning Remixed
RL_Initialize

RL_Initialize sets up the components necessary for RL
RemixClassificationMetrics

RemixClassificationMetrics
ROCPlot

ROCPlot
SQL_SaveTable

SQL_SaveTable
SQL_DropTable

SQL_DropTable
TimeSeriesMelt

TimeSeriesMelt
SQL_Server_BulkPull

SQL_Server_BulkPull
TimeSeriesPlotter

TimeSeriesPlotter
SQL_Server_BulkPush

SQL_Server_BulkPush
SQL_Server_DBConnection

SQL_Server_DBConnection
XGBoostArgsCheck

XGBoostArgsCheck
XGBoostDataPrep

XGBoostDataPrep
XGBoostFinalParams

XGBoostFinalParams
XGBoostGridParams

XGBoostGridParams
MultiClassMetrics

MultiClassMetrics
VI_Plot

VI_Plot
SQL_UpdateTable

SQL_UpdateTable
WideTimeSeriesEnsembleForecast

WideTimeSeriesEnsembleForecast
ParallelAutoETS

ParallelAutoETS
RegressionMetrics

RegressionMetrics
ParallelAutoNNET

ParallelAutoNNET
Regular_Performance

Regular_Performance
OptimizeArfima

OptimizeArfima
SQL_Query_Push

SQL_Query_Push
TimeSeriesFill

TimeSeriesFill
SQL_Query

SQL_Query
TimeSeriesDataPrepare

TimeSeriesDataPrepare
StackedTimeSeriesEnsembleForecast

TimeSeriesEnsembleForecast
XGBoostRegressionMetrics

XGBoostRegressionMetrics