Numeric: automatically builds 18 individual models and 14 ensembles, then returns the results to the user
Numeric(
data,
colnum,
numresamples,
remove_VIF_above = 5,
remove_data_correlations_greater_than = 0.99,
remove_ensemble_correlations_greater_than = 0.98,
scale_all_predictors_in_data = c("Y", "N"),
data_reduction_method = c(0("none"), 1("BIC exhaustive"), 2("BIC forward"),
3("BIC backward"), 4("BIC seqrep"), 5("Mallows_cp exhaustive"),
6("Mallows_cp forward"), 7("Mallows_cp backward"), 8("Mallows_cp seqrep")),
ensemble_reduction_method = c(0("none"), 1("BIC exhaustive"), 2("BIC forward"),
3("BIC backward"), 4("BIC seqrep"), 5("Mallows_cp exhaustive"),
6("Mallows_cp forward"), 7("Mallows_cp backward"), 8("Mallows_cp seqrep")),
how_to_handle_strings = c(0("none"), 1("factor levels"), 2("One-hot encoding"),
3("One-hot encoding with jitter")),
predict_on_new_data = c("Y", "N"),
set_seed = c("Y", "N"),
save_all_trained_models = c("Y", "N"),
save_all_plots = c("Y", "N"),
use_parallel = c("Y", "N"),
stratified_random_column,
train_amount,
test_amount,
validation_amount
)
Returns: a real number
data: the data set, either read from a CSV file or taken from an R package, such as MASS::Boston
colnum: a column number in your data
numresamples: the number of resamples
remove_VIF_above: remove columns with a Variance Inflation Factor (VIF) above the value chosen by the user (default 5)
remove_data_correlations_greater_than: maximum value for correlations in the original data, such as the Boston Housing data set (default 0.99)
remove_ensemble_correlations_greater_than: maximum value for correlations in the ensemble (default 0.98)
scale_all_predictors_in_data: "Y" or "N" to scale numeric data
data_reduction_method: 0 (none), BIC (1, 2, 3, 4) or Mallows' Cp (5, 6, 7, 8), in each case for exhaustive, forward, backward and seqrep, in that order
ensemble_reduction_method: 0 (none), BIC (1, 2, 3, 4) or Mallows' Cp (5, 6, 7, 8), in each case for exhaustive, forward, backward and seqrep, in that order
how_to_handle_strings: 0: no strings, 1: factor levels, 2: one-hot encoding, 3: one-hot encoding with jitter
predict_on_new_data: "Y" or "N". If "Y", you will be asked for the new data
set_seed: "Y" or "N". "Y" sets the seed to make the results fully reproducible
save_all_trained_models: "Y" or "N". If "Y", all trained models are saved in the temporary directory, tempdir(), and may be retrieved from there
save_all_plots: "Y" or "N". If "Y", all plots are saved to the tempdir() directory and may be retrieved from there
use_parallel: "Y" or "N" for parallel processing
stratified_random_column: 0 for no stratified random sampling, or the column number to use for stratified random sampling
train_amount: the amount of data used for training
test_amount: the amount of data used for testing
validation_amount: the amount of data used for validation
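
The sampling arguments (stratified_random_column, train_amount, test_amount, validation_amount) may be easier to picture with a small base R sketch. It is not the package's implementation. The helper split_rows is hypothetical, and it assumes the three amounts are fractions of the rows that sum to 1, which the descriptions above do not state explicitly. The iris data set is used only because it ships with R.

```r
# Hypothetical helper for illustration only; it is not part of the package.
# Assumption: the three amounts are fractions of the rows that sum to 1.
split_rows <- function(data, train_amount, test_amount, validation_amount,
                       stratified_random_column = 0) {
  stopifnot(abs(train_amount + test_amount + validation_amount - 1) < 1e-8)
  labels <- c("train", "test", "validation")

  # Randomly label n rows so the group sizes follow the requested fractions.
  assign_group <- function(n) {
    n_train <- round(n * train_amount)
    n_test <- min(round(n * test_amount), n - n_train)
    n_validation <- n - n_train - n_test
    sample(rep(labels, times = c(n_train, n_test, n_validation)))
  }

  # One stratum holding every row, or one stratum per value of the chosen column.
  all_rows <- seq_len(nrow(data))
  strata <- if (stratified_random_column > 0) {
    split(all_rows, data[[stratified_random_column]])
  } else {
    list(all_rows)
  }

  # Label the rows stratum by stratum, then split the data by label.
  group <- character(nrow(data))
  for (rows in strata) {
    group[rows] <- assign_group(length(rows))
  }
  split(data, factor(group, levels = labels))
}

set.seed(42)
parts <- split_rows(iris, train_amount = 0.60, test_amount = 0.20,
                    validation_amount = 0.20, stratified_random_column = 5)
sapply(parts, nrow)  # train 90, test 30, validation 30
```

With stratified_random_column = 0 the sketch treats all rows as a single group. With a column number, each value of that column is split in the same proportions, so every subset keeps roughly the same mix of values.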