Trains and evaluates multiple classification or regression models, automatically detecting the task based on the target variable type.
fastml(
  data,
  label,
  algorithms = "all",
  test_size = 0.2,
  resampling_method = "cv",
  folds = ifelse(grepl("cv", resampling_method), 10, 25),
  repeats = ifelse(resampling_method == "repeatedcv", 1, NA),
  event_class = "first",
  exclude = NULL,
  recipe = NULL,
  tune_params = NULL,
  metric = NULL,
  algorithm_engines = NULL,
  n_cores = 1,
  stratify = TRUE,
  impute_method = "error",
  impute_custom_function = NULL,
  encode_categoricals = TRUE,
  scaling_methods = c("center", "scale"),
  summaryFunction = NULL,
  use_default_tuning = FALSE,
  tuning_strategy = "grid",
  tuning_iterations = 10,
  early_stopping = FALSE,
  adaptive = FALSE,
  learning_curve = FALSE,
  seed = 123
)
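The defaults for folds and repeats in the signature are expressions that depend on resampling_method. The following base R sketch re-evaluates those two expressions outside the function to show what they resolve to. The helper name resolve_defaults is hypothetical and is not part of the package; how fastml uses the resolved values internally (for example, an NA for repeats) is not shown here.

```r
# Hypothetical helper (not part of fastml): evaluates the two default
# expressions copied from the fastml() signature for a given method.
resolve_defaults <- function(resampling_method) {
  list(
    folds   = ifelse(grepl("cv", resampling_method), 10, 25),
    repeats = ifelse(resampling_method == "repeatedcv", 1, NA)
  )
}

# Print the resolved defaults for the resampling methods named in the docs.
for (m in c("cv", "repeatedcv", "boot", "none")) {
  d <- resolve_defaults(m)
  cat(m, ": folds =", d$folds, ", repeats =", d$repeats, "\n")
}
```

Because the check is grepl("cv", ...), any method whose name contains "cv" gets 10 folds, while only the exact string "repeatedcv" gets a non-missing repeats value.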
An object of class fastml_model containing the best model, performance metrics, and other information.
data: A data frame containing the features and target variable.
label: A string specifying the name of the target variable.
algorithms: A vector of algorithm names to use. Default is "all", which runs all supported algorithms.
test_size: A numeric value between 0 and 1 indicating the proportion of the data to use for testing. Default is 0.2.
resampling_method: A string specifying the resampling method for model evaluation. Default is "cv" (cross-validation). Other options include "none", "boot", "repeatedcv", etc.
folds: An integer specifying the number of folds for cross-validation. Default is 10 for methods containing "cv" and 25 otherwise.
repeats: Number of times to repeat cross-validation (only applicable for methods like "repeatedcv"). Default is 1 for "repeatedcv" and NA otherwise.
event_class: A single string, either "first" or "second", specifying which level of the truth variable to treat as the "event". Default is "first".
exclude: A character vector specifying the names of the columns to be excluded from the training process.
recipe: A user-defined recipe object for custom preprocessing. If provided, internal recipe steps (imputation, encoding, scaling) are skipped.
tune_params: A list specifying hyperparameter tuning ranges. Default is NULL.
metric: The performance metric to optimize during training.
algorithm_engines: A named list specifying the engine to use for each algorithm.
n_cores: An integer specifying the number of CPU cores to use for parallel processing. Default is 1.
stratify: Logical indicating whether to use stratified sampling when splitting the data. Default is TRUE for classification and FALSE for regression.
impute_method: Method for handling missing values. Default is "error". Options include:
  "medianImpute": Impute missing values using median imputation (recipe-based).
  "knnImpute": Impute missing values using k-nearest neighbors (recipe-based).
  "bagImpute": Impute missing values using bagging (recipe-based).
  "remove": Remove rows with missing values from the data (recipe-based).
  "mice": Impute missing values using MICE (Multiple Imputation by Chained Equations).
  "missForest": Impute missing values using the missForest algorithm.
  "custom": Use a user-provided imputation function (see `impute_custom_function`).
  "error": Do not perform imputation; if missing values are detected, stop execution with an error.
  NULL: Equivalent to "error". No imputation is performed, and the function will stop if missing values are present.
impute_custom_function: A function that takes a data.frame as input and returns an imputed data.frame. Used only if impute_method = "custom".
encode_categoricals: Logical indicating whether to encode categorical variables. Default is TRUE.
scaling_methods: Vector of scaling methods to apply. Default is c("center", "scale").
summaryFunction: A custom summary function for model evaluation. Default is NULL.
use_default_tuning: Logical indicating whether to use default tuning grids when tune_params is NULL. Default is FALSE.
tuning_strategy: A string specifying the tuning strategy. Options include "grid", "bayes", or "none". Default is "grid".
tuning_iterations: Number of tuning iterations (applicable for Bayesian or other iterative search methods). Default is 10.
early_stopping: Logical indicating whether to use early stopping in Bayesian tuning methods (if supported). Default is FALSE.
adaptive: Logical indicating whether to use adaptive/racing methods for tuning. Default is FALSE.
learning_curve: Logical. If TRUE, generate learning curves (performance vs. training size). Default is FALSE.
seed: An integer value specifying the random seed for reproducibility. Default is 123.
Fast Machine Learning Function
Trains and evaluates multiple classification or regression models. The function automatically detects the task based on the target variable type and can perform advanced hyperparameter tuning using various tuning strategies.
library(fastml)

# Example 1: Using the iris dataset for binary classification (excluding 'setosa')
data(iris)
iris <- iris[iris$Species != "setosa", ]  # Keep two classes for binary classification
iris$Species <- factor(iris$Species)      # Drop the now-unused 'setosa' level

# Train models
model <- fastml(
  data = iris,
  label = "Species",
  algorithms = c("rand_forest", "xgboost", "svm_rbf"),
  algorithm_engines = list(
    rand_forest = c("ranger", "aorsf", "partykit", "randomForest")
  )
)

# View model summary
summary(model)
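The impute_method = "custom" option expects a function that takes a data.frame and returns an imputed data.frame. The sketch below shows one possible shape for such a function. The helper impute_simple and the iris_na data are illustrative assumptions, not part of the package, and the fastml() call is guarded because it only runs when the package (and the engine behind "rand_forest") is installed.

```r
# Hypothetical imputation helper (not part of fastml): fills NAs in numeric
# columns with the column median and NAs in factor/character columns with
# the most frequent value. Other column types are not handled specially.
impute_simple <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    if (!anyNA(x)) next
    if (is.numeric(x)) {
      x[is.na(x)] <- median(x, na.rm = TRUE)
    } else {
      counts <- table(x)
      x[is.na(x)] <- names(counts)[which.max(counts)]
    }
    df[[col]] <- x
  }
  df
}

# Illustrative data: two-class iris with a couple of values set to NA.
data(iris)
iris_na <- iris[iris$Species != "setosa", ]
iris_na$Species <- factor(iris_na$Species)
iris_na$Sepal.Length[c(3, 17)] <- NA

# The helper satisfies the documented contract: data.frame in, data.frame out.
imputed <- impute_simple(iris_na)
stopifnot(is.data.frame(imputed), !anyNA(imputed))

# Pass the helper to fastml() using only arguments documented above.
if (requireNamespace("fastml", quietly = TRUE)) {
  model <- fastml::fastml(
    data = iris_na,
    label = "Species",
    algorithms = "rand_forest",
    impute_method = "custom",
    impute_custom_function = impute_simple
  )
  summary(model)
}
```

Checking the helper on its own before passing it to fastml() makes it easier to tell an imputation bug apart from a modelling error.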