modelingSummary: Get modeling metrics

Description

modelingSummary automates the modeling of a data set. It returns a data frame containing the performance metrics obtained with five machine learning algorithms: KNN, SVM, RF, NNET, and Bcart. The function is built on the splitData, tuneTrain, predict, and getMetrics functions.

Usage

modelingSummary(
  data,
  y,
  p = 0.7,
  length = 10,
  control = "repeatedcv",
  number = 10,
  repeats = 10,
  process = c("center", "scale"),
  summary = multiClassSummary,
  positive,
  parallelComputing = FALSE,
  classtype,
  ...
)

Arguments

data

object of class "data.frame" with target variable and predictor variables.

y

character. Target variable.

p

numeric. Proportion of data to be used for training. Default: 0.7.

length

integer. Number of values to output for each tuning parameter. If search = "random" is passed to trainControl through ..., this becomes the maximum number of tuning parameter combinations that are generated by the random search. Default: 10.

control

character. Resampling method to use. Choices include: "boot", "boot632", "optimism_boot", "boot_all", "cv", "repeatedcv", "LOOCV", "LGOCV", "none", "oob", "timeslice", "adaptive_cv", "adaptive_boot", or "adaptive_LGOCV". Default: "repeatedcv". See train for specific details on the resampling methods.

number

integer. Number of cross-validation folds or number of resampling iterations. Default: 10.

repeats

integer. Number of complete sets of folds to compute for repeated k-fold cross-validation. Used only if "repeatedcv" is chosen as the resampling method in control. Default: 10.

process

character. Defines the pre-processing transformation of predictor variables to be done. Options are: "BoxCox", "YeoJohnson", "expoTrans", "center", "scale", "range", "knnImpute", "bagImpute", "medianImpute", "pca", "ica", or "spatialSign". See preProcess for specific details on each pre-processing transformation. Default: c('center', 'scale').

summary

expression. Computes performance metrics across resamples. For numeric y, the root mean squared error and R-squared are calculated. For factor y, the overall accuracy and Kappa are calculated. See trainControl and defaultSummary for details on specification and summary options. Default: multiClassSummary.

positive

character. The positive class for the target variable if y is factor. Usually, it is the first level of the factor.

parallelComputing

logical. Indicates whether to use parallel processing. Default: FALSE.

classtype

integer. Indicates the number of classes of the target trait.

...

additional arguments to be passed to createDataPartition, trainControl and train functions in the package caret.
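
As a rough illustration of how the arguments above relate to caret, the following sketch calls createDataPartition, trainControl, and train directly. It is not the internal code of modelingSummary; the iris data set, the "knn" method, and the reduced repeats value are assumptions chosen only to keep the example small and quick to run.

```r
library(caret)

set.seed(123)

# p: proportion of rows assigned to the training set
idx <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
train_set <- iris[idx, ]
test_set  <- iris[-idx, ]

# control, number, repeats, and summary correspond to trainControl arguments
# (repeats lowered from the documented default of 10 to shorten the run)
ctrl <- trainControl(
  method = "repeatedcv",
  number = 10,
  repeats = 3,
  summaryFunction = multiClassSummary
)

# length corresponds to tuneLength and process to preProcess in train
fit <- train(
  Species ~ .,
  data = train_set,
  method = "knn",            # illustrative choice of a single algorithm
  tuneLength = 10,
  preProcess = c("center", "scale"),
  trControl = ctrl
)

# Predict on the held-out rows and compute accuracy and Kappa
pred <- predict(fit, newdata = test_set)
postResample(pred = pred, obs = test_set$Species)
```

modelingSummary repeats this kind of workflow for each of its five algorithms and collects the resulting metrics in one data frame.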

Value

A data frame containing the metrics of the modeling for five machine learning algorithms: KNN, SVM, RF, NNET, and Bcart.

tuneTrain relies on the caret package to perform the modeling.

Details

Types of classification and regression models available for use with tuneTrain can be found using names(getModelInfo()). The results given depend on the type of model used.
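
A short sketch of that lookup is shown here. The method codes in candidates are assumptions: they are valid caret method names that plausibly match the five algorithm families named on this page, but the source does not state which codes modelingSummary actually uses.

```r
library(caret)

# All method codes known to caret
available <- names(getModelInfo())
length(available)

# Hypothetical codes for KNN, SVM, RF, NNET, and bagged CART (assumed, not
# taken from the package source)
candidates <- c("knn", "svmRadial", "rf", "nnet", "treebag")

# Check which of these codes caret recognises
candidates %in% available
```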

Examples


# NOT RUN {
if(interactive()){
 data(septoriaDurumWC)
 models <- modelingSummary(data = septoriaDurumWC, y = "ST_S", positive = "R", classtype = 2)
}
# }
