LassoDevelopment: Compare predictive models, created on your data

Description

This step creates a Lasso model from your data. Lasso is a regularized linear model, best suited to data where the outcome is roughly linear in the features (or, for classification, where the classes are linearly separable). It is fast to train and often a good starting point.
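
To illustrate what the Lasso penalty does, here is a minimal base-R sketch of coordinate descent with soft-thresholding. It is not the package's implementation; the helper names soft_threshold and lasso_cd, the objective scaling (1/(2n) times the squared error plus lambda times the L1 norm), and the fixed iteration count are assumptions made for illustration only.

```r
# Soft-thresholding operator: shrinks z toward zero by lambda.
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

# Illustrative coordinate-descent lasso on standardized predictors.
# Minimizes (1 / (2n)) * sum((y - X %*% beta)^2) + lambda * sum(abs(beta)).
lasso_cd <- function(X, y, lambda, n_iter = 100) {
  X <- scale(X)        # center and scale each column
  y <- y - mean(y)     # center the response, so no intercept is needed
  n <- nrow(X)
  beta <- rep(0, ncol(X))
  for (iter in seq_len(n_iter)) {
    for (j in seq_along(beta)) {
      # Residual with feature j left out of the fit
      r_j <- y - X[, -j, drop = FALSE] %*% beta[-j]
      z_j <- sum(X[, j] * r_j) / n
      beta[j] <- soft_threshold(z_j, lambda) / (sum(X[, j]^2) / n)
    }
  }
  setNames(beta, colnames(X))
}

X <- as.matrix(iris[, c("Sepal.Length", "Petal.Length", "Petal.Width")])
y <- iris$Sepal.Width

lasso_cd(X, y, lambda = 0.05)  # mild penalty: coefficients are shrunk
lasso_cd(X, y, lambda = 10)    # heavy penalty: every coefficient is zero
```

A larger lambda pushes more coefficients to exactly zero, which is why Lasso also acts as a feature selector.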

Usage

LassoDevelopment(type, df, grainCol, predictedCol, impute,
debug, cores, modelName)

Arguments

type

The type of model (either 'regression' or 'classification')

df

Dataframe whose columns are used for calculation.

grainCol

Optional. The dataframe's column that has IDs pertaining to the grain. No ID columns are truly needed for this step.

predictedCol

Column that you want to predict. If you're doing classification then this should be Y/N.

impute

Set all-column imputation to T or F. If T, numeric columns are filled with the column mean and factor columns with the most frequent value. If F, rows containing NULLs are removed. The imputation values are saved for deployment.

debug

Prints extended output to the console so you can monitor the calculations as they run. Use T or F.

cores

Number of cores you'd like to use. Defaults to 2.

modelName

Optional string specifying the model name. If used, you must load the same name in the deploy step.
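
The two settings of impute described above can be sketched in base R as follows. This is an illustration under assumptions, not the package's internal code: impute_simple and the toy data frame are hypothetical, and the sketch does not save the imputation values for deployment.

```r
# Hypothetical helper mirroring impute = T: column mean for numeric
# columns, most frequent observed value for everything else.
impute_simple <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    if (!anyNA(x)) next
    if (is.numeric(x)) {
      df[[col]][is.na(x)] <- mean(x, na.rm = TRUE)
    } else {
      counts <- table(x)  # table() ignores NA by default
      df[[col]][is.na(x)] <- names(counts)[which.max(counts)]
    }
  }
  df
}

toy <- data.frame(
  age = c(30, NA, 50),
  sex = factor(c("F", NA, "F"))
)

imputed <- impute_simple(toy)          # like impute = T: gaps are filled
dropped <- toy[complete.cases(toy), ]  # like impute = F: incomplete rows removed
```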

Format

An object of class R6ClassGenerator of length 24.

Methods

The above describes the parameters for initializing a new LassoDevelopment class with $new(). Individual methods are documented below.

<code>$new()</code>

Initializes a new lasso development class using the parameters saved in p, documented above. This method loads, cleans, and prepares data for model training. Usage: $new(p)

<code>$run()</code>

Trains the model and displays feature importance and performance. Usage: $run()

<code>$getPredictions()</code>

Returns the predictions from test data. Usage: $getPredictions()

<code>$getROC()</code>

Returns the ROC curve object for plotROCs. Classification models only. Usage: $getROC()

<code>$getPRCurve()</code>

Returns the PR curve object for plotPRCurve. Classification models only. Usage: $getPRCurve()

<code>$getAUROC()</code>

Returns the area under the ROC curve from testing for classification models. Usage: $getAUROC()

<code>$getRMSE()</code>

Returns the RMSE from test data for regression models. Usage: $getRMSE()

<code>$getMAE()</code>

Returns the MAE (mean absolute error) from test data for regression models. Usage: $getMAE()
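
For readers unfamiliar with the metrics these getters report, the following base-R sketch shows the standard textbook definitions of RMSE, MAE, and AUROC. The function names and toy vectors are hypothetical, and the package may compute these quantities differently internally.

```r
# Root mean squared error: penalizes large errors more heavily.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Mean absolute error: average size of the errors.
mae <- function(actual, predicted) mean(abs(actual - predicted))

# AUROC via the rank (Mann-Whitney) formula: the probability that a
# randomly chosen "Y" case scores higher than a randomly chosen "N" case.
auroc <- function(labels, scores) {
  pos <- labels == "Y"
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r <- rank(scores)
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

actual    <- c(3.0, 2.5, 4.0)
predicted <- c(2.5, 2.5, 5.0)
rmse(actual, predicted)
mae(actual, predicted)   # 0.5

auroc(c("N", "N", "Y", "Y"), c(0.1, 0.4, 0.35, 0.8))  # 0.75
```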

References

http://healthcareai-r.readthedocs.io

Examples


# NOT RUN {
#### Example using iris dataset ####
ptm <- proc.time()
library(healthcareai)

data(iris)
head(iris)

set.seed(42)

p <- SupervisedModelDevelopmentParams$new()
p$df <- iris
p$type <- "regression"
p$impute <- TRUE
p$grainCol <- ""
p$predictedCol <- "Sepal.Width"
p$debug <- FALSE
p$cores <- 1

# Run Lasso
lasso <- LassoDevelopment$new(p)
lasso$run()

set.seed(42)
# Run Random Forest
rf <- RandomForestDevelopment$new(p)
rf$run()

cat(proc.time() - ptm,"\n")

#### Example using csv data ####
library(healthcareai)
# setwd('C:/Your/script/location') # Needed if using YOUR CSV file
ptm <- proc.time()

# Can delete this line in your work
csvfile <- system.file("extdata", "HCRDiabetesClinical.csv", package = "healthcareai")
# Replace csvfile with '/path/to/yourfile'
df <- read.csv(file = csvfile, header = TRUE, na.strings = c("NULL", "NA", ""))

head(df)

df$PatientID <- NULL

set.seed(42)
p <- SupervisedModelDevelopmentParams$new()
p$df <- df
p$type <- "classification"
p$impute <- TRUE
p$grainCol <- "PatientEncounterID"
p$predictedCol <- "ThirtyDayReadmitFLG"
p$debug <- FALSE
p$cores <- 1

# Run Lasso
lasso <- LassoDevelopment$new(p)
lasso$run()

set.seed(42)
# Run Random Forest
rf <- RandomForestDevelopment$new(p)
rf$run()

cat(proc.time() - ptm,"\n")

# }
# NOT RUN {
#### Example using SQL Server data ####
# This example requires that you alter your connection string / query

ptm <- proc.time()
library(healthcareai)

connection.string <- "
driver={SQL Server};
server=localhost;
database=SAM;
trusted_connection=true
"
# This query should pull only rows for training. They must have a label.
query <- "
SELECT
[PatientEncounterID]
,[SystolicBPNBR]
,[LDLNBR]
,[A1CNBR]
,[GenderFLG]
,[ThirtyDayReadmitFLG]
FROM [SAM].[dbo].[HCRDiabetesClinical]
"
df <- selectData(connection.string, query)
head(df)

set.seed(42)

p <- SupervisedModelDevelopmentParams$new()
p$df <- df
p$type <- "classification"
p$impute <- TRUE
p$grainCol <- "PatientEncounterID"
p$predictedCol <- "ThirtyDayReadmitFLG"
p$debug <- FALSE
p$cores <- 1

# Run Lasso
lasso <- LassoDevelopment$new(p)
lasso$run()

set.seed(42)
# Run Random Forest
rf <- RandomForestDevelopment$new(p)
rf$run()

# Plot ROC
rocs <- list(rf$getROC(), lasso$getROC())
names <- c("Random Forest", "Lasso")
legendLoc <- "bottomright"
plotROCs(rocs, names, legendLoc)

# Plot PR Curve
prCurves <- list(rf$getPRCurve(), lasso$getPRCurve())
names <- c("Random Forest", "Lasso")
legendLoc <- "bottomleft"
plotPRCurve(prCurves, names, legendLoc)

cat(proc.time() - ptm,"\n")
# }
