ClassificationEnsembles
The goal of ClassificationEnsembles is to automatically conduct a thorough analysis of classification data. The user only needs to provide the data and answer a few questions (such as which column to analyze). ClassificationEnsembles fits 12 models (6 individual models and 6 ensembles of models). The package also returns 26 plots, five tables and a summary report sorted by accuracy (highest to lowest).
Installation
You can install the development version of ClassificationEnsembles like so:
devtools::install_github("InfiniteCuriosity/ClassificationEnsembles")
Example
In this example, ClassificationEnsembles models the quality of the shelving location of car seats (ShelveLoc: Good, Medium or Bad) based on the other features in the Carseats data set:
library(ClassificationEnsembles)
Classification(data = ISLR::Carseats,
               colnum = 7,  # column 7 of Carseats is ShelveLoc, the target
               numresamples = 25,
               predict_on_new_data = "N",
               set_seed = "N",
               remove_VIF_above = 5.00,
               scale_all_numeric_predictors_in_data = "N",
               how_to_handle_strings = 1,
               save_all_trained_models = "N",
               save_all_plots = "N",
               use_parallel = "Y",
               train_amount = 0.60,
               test_amount = 0.20,
               validation_amount = 0.20)
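The train_amount, test_amount and validation_amount arguments in the call above sum to 1. As a rough illustration of what a 60/20/20 split means, the base R sketch below assigns row indices to the three sets at random. It is not the package's internal code: the helper name split_indices and the fixed seed are assumptions made only for this example.

```r
# Hypothetical helper (not part of ClassificationEnsembles): randomly assign
# row indices to train, test and validation sets in the given proportions.
split_indices <- function(n, train = 0.60, test = 0.20, validation = 0.20) {
  stopifnot(abs(train + test + validation - 1) < 1e-8)
  shuffled <- sample.int(n)            # random permutation of 1..n
  n_train <- round(train * n)
  n_test <- round(test * n)
  n_validation <- n - n_train - n_test # validation takes the remainder
  list(
    train = shuffled[seq_len(n_train)],
    test = shuffled[n_train + seq_len(n_test)],
    validation = shuffled[n_train + n_test + seq_len(n_validation)]
  )
}

set.seed(1)                  # only to make this sketch repeatable
idx <- split_indices(400)    # Carseats has 400 rows
lengths(idx)                 # number of rows in each set
```

Every row lands in exactly one set, so the three index vectors together cover all rows without overlap.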
The 12 classification models which are built automatically are:
- C50
- Ensemble Bagged Cart
- Ensemble Bagged Random Forest
- Ensemble C50
- Ensemble Naive Bayes
- Ensemble Support Vector Machines
- Ensemble Trees
- Linear
- Partial Least Squares
- Penalized Discriminant Analysis
- RPart
- Trees
The 26 plots it returns automatically are:
- Holdout accuracy / train accuracy by model, fixed scales
- Residuals by model, free scales
- Residuals by model, fixed scales
- Classification error, free scales
- Classification error, fixed scales
- Accuracy data, free scales
- Accuracy data, fixed scales
- Accuracy by model, free scales
- Accuracy by model, fixed scales
- Histograms of numeric columns
- Boxplots of numeric columns
- Duration barchart
- False negative rate, free scales
- False negative rate, fixed scales
- False positive rate, free scales
- False positive rate, fixed scales
- True negative rate, free scales
- True negative rate, fixed scales
- True positive rate, free scales
- True positive rate, fixed scales
- Over or underfitting barchart
- Model accuracy barchart
- Barchart of each feature vs target by percentage
- Barchart of each feature vs target by value
- Correlation of numeric data as circles and colors
- Correlation of numeric data as numbers and colors
The 5 tables the package returns automatically are:
- Head of the ensemble
- Head of the data frame
- Variance Inflation Factor of the numeric columns
- Correlation of the data
- Summary report, including accuracy, duration, overfitting, sum of diagonals
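The remove_VIF_above argument and the Variance Inflation Factor table both rest on the standard definition VIF = 1 / (1 - R^2), where R^2 comes from regressing one numeric predictor on the other numeric predictors. The base R sketch below illustrates that definition on the built-in iris data. The helper vif_by_column is hypothetical and is not a function exported by ClassificationEnsembles; how the package itself applies the threshold is not shown here.

```r
# Hypothetical helper: VIF of each numeric column, computed as 1 / (1 - R^2)
# from a linear regression of that column on the other numeric columns.
vif_by_column <- function(df) {
  num <- df[vapply(df, is.numeric, logical(1))]   # keep numeric columns only
  vapply(names(num), function(col) {
    others <- setdiff(names(num), col)
    fit <- lm(reformulate(others, response = col), data = num)
    1 / (1 - summary(fit)$r.squared)
  }, numeric(1))
}

vifs <- vif_by_column(iris)   # iris ships with base R
vifs
names(vifs)[vifs > 5]         # columns that a threshold of 5 would flag
```

A VIF of 1 means a column is uncorrelated with the others; larger values indicate stronger collinearity.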
ensemble_bag_rf_test_pred BARBUNYA BOMBAY CALI DERMASON HOROZ SEKER SIRA
BARBUNYA                        21      0    0        0     0     0    0
BOMBAY                           0     16    0        0     0     0    0
CALI                             0      0   35        0     0     0    0
DERMASON                         0      0    0       76     0     0    0
HOROZ                            0      0    0        0    36     0    0
SEKER                            0      0    0        0     0    48    0
SIRA                             0      0    0        0     0     0   51
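The summary report mentions accuracy and the sum of diagonals, and several plots show true/false positive and negative rates. The sketch below shows how such quantities can be derived from the confusion matrix above using the standard one-vs-rest definitions. It is an illustration only: the variable names are made up for this example, rows are assumed to be predicted classes and columns actual classes, and the package may compute its figures differently.

```r
classes <- c("BARBUNYA", "BOMBAY", "CALI", "DERMASON", "HOROZ", "SEKER", "SIRA")

# Counts copied from the confusion matrix above (all off-diagonal cells are 0).
# Assumption: rows are predicted classes, columns are actual classes.
cm <- diag(c(21, 16, 35, 76, 36, 48, 51))
dimnames(cm) <- list(predicted = classes, actual = classes)

sum_of_diagonals <- sum(diag(cm))        # correctly classified cases
accuracy <- sum_of_diagonals / sum(cm)   # share of all cases classified correctly

# One-vs-rest counts for each class
tp <- diag(cm)                 # predicted as the class and actually the class
fp <- rowSums(cm) - tp         # predicted as the class, actually another
fn <- colSums(cm) - tp         # actually the class, predicted as another
tn <- sum(cm) - tp - fp - fn   # everything else

rates <- data.frame(
  true_positive_rate  = tp / (tp + fn),
  false_negative_rate = fn / (tp + fn),
  true_negative_rate  = tn / (tn + fp),
  false_positive_rate = fp / (tn + fp)
)
rates
```

Because every off-diagonal count in this matrix is zero, the sum of diagonals equals the total number of cases.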
A data summary is also printed to the console. Using dry_beans_small as an example:
$Data_summary
Eccentricity    ConvexArea      Extent          Solidity        roundness       ShapeFactor4
Min.   :0.2190  Min.   : 20825  Min.   :0.5802  Min.   :0.9551  Min.   :0.5718  Min.   :0.9550
1st Qu.:0.7175  1st Qu.: 37052  1st Qu.:0.7240  1st Qu.:0.9859  1st Qu.:0.8320  1st Qu.:0.9941
Median :0.7642  Median : 45261  Median :0.7606  Median :0.9886  Median :0.8833  Median :0.9966
Mean   :0.7517  Mean   : 53997  Mean   :0.7519  Mean   :0.9874  Mean   :0.8750  Mean   :0.9952
3rd Qu.:0.8117  3rd Qu.: 62159  3rd Qu.:0.7887  3rd Qu.:0.9903  3rd Qu.:0.9191  3rd Qu.:0.9980
Max.   :0.9082  Max.   :229994  Max.   :0.8325  Max.   :0.9937  Max.   :0.9879  Max.   :0.9996
y
BARBUNYA: 79
BOMBAY  : 31
CALI    : 97
DERMASON:212
HOROZ   :115
SEKER   :121
SIRA    :158