Learn R Programming

⚠️There's a newer version (0.1.5) of this package.Take me there.


Background

DriveML is a series of functions sucha as AutoDataPrep, AutoMAR, autoMLmodel. DriveML automates some of the most difficult machine learning functions such as data exploratory analysis, data cleaning, data tranformations, feature engineering, model training, model validation, model tuning and model selection.

Functionalities of DriveML

DriveML R package has four unique functionalities

  1. Data Exploration

    • SmartEDA has a complete exploratory data analysis function
  2. Data preparations

    • AutoDataPrep function to generate a novel features based on the functional understanding of the dataset
  3. Machine learning models

    • autoMLmodel function to develope baseline machine learning models using regression and tree based classfication techniques
  4. Model report

    • autoMLReport function to print the machine learning model outcome in HTML format

Installation

The package can be installed directly from CRAN.

install.packages("DriveML")

Example

Data

In this vignette, we will be using Heart Disease - Classifications data set

Data Source kaggle.

	install.packages("DriveML")
	install.packages("SmartEDA")
	library("DriveML")
	library("SmartEDA")
	## Load heart desease dataset 
	heart <- DriveML::heart
	

Overview of the data

Understanding the dimensions of the dataset, variable names, overall missing summary and data types of each variables

## overview of the data; 
	ExpData(data = heart, type = 1)
## structure of the data	
	ExpData(data = heart, type = 2)

Summary of numerical variables

To summarise the numeric variables, you can use following r codes from this pacakge

## Summary statistics by – overall
	ExpNumStat(heart, by = "GA", gp = "target_var", Qnt = seq(0, 1, 0.1), MesofShape = 2, Outlier = TRUE, round = 2)

Graphical representation of all numeric features

## Generate Boxplot by category
ExpNumViz(heart, gp = "target_var", type = 2, nlim = 25, Page = c(2, 2))

## Generate Density plot
ExpNumViz(heart, gp = NULL, type = 3, nlim = 10, Page = c(2, 2))

## Generate Scatter plot
ExpNumViz(heart, target="target_var", nlim = 4, scatter = TRUE, Page=c(2, 1))

Machine learning model using 3 simple steps

Data preparation using autoDataprep function

Single function to prepare end to end data preparation for machine learning models

# Data Preparation
small_data <- autoDataprep(heart, target = "target_var", missimpute = "default",
                       auto_mar = TRUE, mar_object = NULL, dummyvar = TRUE,
                       char_var_limit = 12, aucv = 0.02, corr = 0.99,
                       outlier_flag = TRUE, interaction_var = TRUE,
                       frequent_var = TRUE, uid = NULL, onlykeep = NULL, drop = NULL)

# Print output on R console
printautoDataprep(small_data)

# Final prepared master data

small_data_t <- small_data$master_data

Machine learning classification model using autoMLmodel function

Single function to develope six different types of machine learning binary classification models with the help of hyperparameter tuining using random search

# DriveML Model development
small_ml_random <- autoMLmodel(small_data_t,  target = "target_var",  testSplit = 0.2,
                      tuneIters = 100,  tuneType = "random",
                      models = "all", varImp = 20,  liftGroup = 10,  maxObs = 10000,  uid = NULL,
                      pdp = T, positive = 1, htmlreport = FALSE, seed = 1991)

# Model summary results
small_ml_random$results

Present model report using autoMLReport function

Generate a report in html format for the output of autoDataprep and autoMLmodel DriveML fucntions. Alos autoMLReport is an inbuilt function under autoMLmodel.

autoMLReport(mlobject = small_ml_random, mldata = small_data, op_file = "driveML_ouput_heart_data.html")

Articles

Copy Link

Version

Install

install.packages('DriveML')

Monthly Downloads

60

Version

0.1.0

License

MIT + file LICENSE

Maintainer

Dayanand Ubrangala

Last Published

June 4th, 2020

Functions in DriveML (0.1.0)

predictDataprep

Extract predictions and generate columns from autoDataprep objects
printautoDataprep

Print Method for the autoDataprep Class
generateFeature

Automated column transformer
smartEDA

SmartEDA - functions that automates most of exploratory analyses tasks in modeling
printautoMAR

Print Method for the autoMAR Class Print the result of autoMAR object
autoMLmodel

Automated machine learning training of models
heart.model

Heart Classification Drive ML Model.
autoPDP

Generate partial dependence plots
autoDataprep

Automatic data preparation for ML algorithm
autoMAR

Function to identify and generate the Missing at Random features (MAR)
heart

Dataset Heart Disease - Classifications
autoMLReport

Display autoMLmodel output in HTML format using Rmarkdown
misspattern

Missing pattern analysis for missing data
predictAutoMAR

Extract predictions and MAR columns from autoMAR objects