fastml: Fast Machine Learning Function

Description

Trains and evaluates multiple classification or regression models, automatically detecting the task from the type of the target variable.

Usage

fastml(
  data,
  label,
  algorithms = "all",
  test_size = 0.2,
  resampling_method = "cv",
  folds = ifelse(grepl("cv", resampling_method), 10, 25),
  repeats = ifelse(resampling_method == "repeatedcv", 1, NA),
  tune_params = NULL,
  metric = NULL,
  n_cores = 1,
  stratify = NULL,
  impute_method = "error",
  encode_categoricals = TRUE,
  scaling_methods = c("center", "scale"),
  summaryFunction = NULL,
  use_default_tuning = FALSE,
  seed = 123
)

Value

An object of class fastml_model containing the best model, performance metrics, and other information.

Arguments

data

A data frame containing the features and target variable.

label

A string specifying the name of the target variable.

algorithms

A vector of algorithm names to use. Default is "all" to run all supported algorithms.

test_size

A numeric value between 0 and 1 indicating the proportion of the data to use for testing. Default is 0.2.

resampling_method

A string specifying the resampling method used during training. Default is "cv" (cross-validation). Other options include "none", "boot", and "repeatedcv", among others.

folds

An integer specifying the number of folds for cross-validation. Default is 10 for methods containing "cv" and 25 otherwise.

repeats

An integer specifying the number of times to repeat cross-validation (only applicable for methods like "repeatedcv"). Default is 1 when resampling_method is "repeatedcv" and NA otherwise.
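The default expressions for folds and repeats in the usage signature can be evaluated on their own in base R to see what they resolve to for each resampling method. The sketch below only re-runs those two expressions and does not call fastml; the variable names are illustrative.

```r
# Resampling methods named in this documentation
resampling_methods <- c("cv", "repeatedcv", "boot", "none")

# Same expressions as the defaults in the usage signature
default_folds   <- ifelse(grepl("cv", resampling_methods), 10, 25)
default_repeats <- ifelse(resampling_methods == "repeatedcv", 1, NA)

# Show the resolved defaults side by side
data.frame(
  resampling_method = resampling_methods,
  folds = default_folds,
  repeats = default_repeats
)
```

Because grepl("cv", ...) matches any method name containing "cv", both "cv" and "repeatedcv" resolve to 10 folds, while "boot" and "none" resolve to 25. Only "repeatedcv" gets a non-missing default for repeats.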

tune_params

A list specifying hyperparameter tuning ranges. Default is NULL.

metric

The performance metric to optimize during training. Default depends on the task.

n_cores

An integer specifying the number of CPU cores to use for parallel processing. Default is 1.

stratify

Logical indicating whether to use stratified sampling when splitting the data. Default is NULL, which is treated as TRUE for classification and FALSE for regression.
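To illustrate what stratified sampling means for a train/test split, here is a minimal base R sketch. It is not fastml's own splitting routine, which this page does not show. It assumes a classification target and draws the test rows separately within each class so that class proportions are preserved.

```r
set.seed(123)       # same value as the documented default seed
data(iris)
test_size <- 0.2    # same value as the documented default test_size

# Group row indices by class, then sample the test rows within each group
rows_by_class <- split(seq_len(nrow(iris)), iris$Species)
test_idx <- unlist(
  lapply(rows_by_class, function(idx) {
    sample(idx, size = round(length(idx) * test_size))
  }),
  use.names = FALSE
)

train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

# Each species contributes the same share of rows to the test set
table(test$Species)
```

With 50 rows per species and test_size = 0.2, each class contributes 10 rows to the test set and 40 to the training set.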

impute_method

Method for handling missing values. Options include:

"medianImpute"

Impute missing values using median imputation.

"knnImpute"

Impute missing values using k-nearest neighbors.

"bagImpute"

Impute missing values using bagging.

"remove"

Remove rows with missing values from the data.

"error"

Do not perform imputation; if missing values are detected after preprocessing, stop execution with an error.

NULL

Equivalent to "error". No imputation is performed, and the function will stop if missing values are present.

Default is "error".
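As a rough illustration of the difference between the "remove" and "medianImpute" options, the base R sketch below applies both ideas to a small hypothetical data frame. It is not fastml's internal implementation; it only shows what row removal and median imputation do to the data.

```r
# Hypothetical data with one missing value per column
df <- data.frame(x = c(1, NA, 3, 10), y = c(2, 4, NA, 8))

# "remove"-style handling: keep only rows without missing values
df_removed <- df[complete.cases(df), ]

# Median imputation: replace each NA with the median of its column
df_median <- df
for (col in names(df_median)) {
  is_missing <- is.na(df_median[[col]])
  df_median[[col]][is_missing] <- median(df_median[[col]], na.rm = TRUE)
}

df_removed
df_median
```

Row removal keeps two of the four rows here, whereas median imputation keeps all four and fills the gaps with 3 (for x) and 4 (for y).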

encode_categoricals

Logical indicating whether to encode categorical variables. Default is TRUE.

scaling_methods

Vector of scaling methods to apply. Default is c("center", "scale").
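Assuming "center" and "scale" carry their usual meaning (subtract the mean, then divide by the standard deviation), the two steps can be sketched in base R as follows. This illustrates the transformation only and makes no claim about how fastml applies it internally.

```r
x <- c(2, 4, 6, 8)

# "center": subtract the mean
centered <- x - mean(x)

# "scale": divide by the standard deviation
standardized <- centered / sd(x)

# Base R's scale() performs both steps in one call
same <- as.numeric(scale(x, center = TRUE, scale = TRUE))
all.equal(standardized, same)
```

After both steps the values have mean 0 and standard deviation 1, which puts features measured in different units on a comparable footing.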

summaryFunction

A custom summary function for model evaluation. Default is NULL.

use_default_tuning

Logical indicating whether to use default tuning grids when tune_params is NULL. Default is FALSE.

seed

An integer value specifying the random seed for reproducibility. Default is 123.

Examples

  # Example 1: Binary classification on the iris dataset (excluding 'setosa')
  data(iris)
  iris <- iris[iris$Species != "setosa", ]  # Keep two classes only
  iris$Species <- factor(iris$Species)      # Drop the unused 'setosa' level

  # Train models
  model <- fastml(
    data = iris,
    label = "Species",
    algorithms = c("random_forest", "xgboost", "svm_radial")
  )

  # View model summary
  summary(model)

  # Example 2: Using the mtcars dataset for regression
  data(mtcars)

  # Train models
  model <- fastml(
    data = mtcars,
    label = "mpg",
    algorithms = c("random_forest", "xgboost", "svm_radial")
  )

  # View model summary
  summary(model)
