Trains models on protein intensity data using one or more machine learning algorithms.
train_models(
split_df,
resample_method = "repeatedcv",
resample_iterations = 10,
num_repeats = 3,
algorithm_list,
seed = NULL,
...
)
A list of objects of class train, one for each machine-learning algorithm. See train for more information on accessing different elements of this list.
split_df: A split_df object produced by split_data.
resample_method: The resampling method to use. Default is "repeatedcv" for repeated cross-validation. See trainControl for details on other available methods.
resample_iterations: Number of resampling iterations. Default is 10.
num_repeats: The number of complete sets of folds to compute (for resample_method = "repeatedcv" only). Default is 3.
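The three resampling arguments correspond naturally to the settings of caret's trainControl(). A minimal sketch follows, assuming (this page does not confirm it) that resample_method, resample_iterations, and num_repeats are forwarded as method, number, and repeats:

```r
library(caret)

# Assumed mapping, not confirmed by this page:
#   resample_method     -> method
#   resample_iterations -> number
#   num_repeats         -> repeats
ctrl <- trainControl(
  method = "repeatedcv",  # repeated k-fold cross-validation
  number = 10,            # number of folds per repeat
  repeats = 3             # number of complete sets of folds
)

# Inspect the stored settings
ctrl$method
ctrl$number
ctrl$repeats
```

With these values, each model would be evaluated on 10 folds repeated 3 times during resampling.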
algorithm_list: A list of classification or regression algorithms to use. A full list of machine learning algorithms available through the caret package can be found here: http://topepo.github.io/caret/train-models-by-tag.html. See below for default options.
seed: Numerical. Random number seed. Default is NULL.
...: Additional arguments to be passed on to the train function in the caret package.
Chathurani Ranathunge
If algorithm_list is not provided, a default list of five classification-based machine-learning algorithms is used for building and training models. Default algorithm_list: "svmRadial", "rf", "glm", "xgbLinear", and "naive_bayes".
Note: Models that fail to build are removed from the output.
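One way to picture how failed models are dropped is the base R sketch below. It is an illustration only, not the package's actual implementation: fit_one is a hypothetical stand-in for a model-fitting call, and "bad_model" is a made-up algorithm name used to simulate a failure.

```r
# Hypothetical fitting helper: raises an error for one name to simulate
# a model that fails to build
fit_one <- function(algorithm) {
  if (algorithm == "bad_model") stop("model failed to build")
  paste("fitted", algorithm)
}

algorithms <- c("rf", "bad_model", "glm")

# Fit each algorithm, turning any error into NULL
fits <- lapply(algorithms, function(a) {
  tryCatch(fit_one(a), error = function(e) NULL)
})
names(fits) <- algorithms

# Drop the entries that failed
fits <- Filter(Negate(is.null), fits)
names(fits)
```

In practice this means the returned list can be shorter than algorithm_list, so it is worth checking names() on the output before indexing into it.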
Make sure to fix the random number seed with seed for reproducibility.
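The effect of fixing a seed can be seen with base R alone. This is a small sketch of the general principle; how train_models applies seed internally is not shown on this page, so the use of set.seed() here is an assumption for illustration.

```r
# Draw random numbers after fixing the seed
set.seed(351)
first_draw <- sample(1:100, 5)

# Resetting the same seed reproduces the same draw
set.seed(351)
second_draw <- sample(1:100, 5)

identical(first_draw, second_draw)  # TRUE
```

Resampling folds depend on random draws in the same way, so runs with different or unset seeds can yield different resampling results.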
Kuhn, Max. "Building predictive models in R using the caret package." Journal of statistical software 28 (2008): 1-26.
pre_process
## Create a model_df object
covid_model_df <- pre_process(covid_fit_df, covid_norm_df)
## Split the data frame into training and test data sets
covid_split_df <- split_data(covid_model_df, seed = 8314)
## Fit models based on the default list of machine learning (ML) algorithms
covid_model_list1 <- train_models(split_df = covid_split_df, seed = 351)
## Fit models using a user-specified list of ML algorithms.
covid_model_list2 <- train_models(
  covid_split_df,
  algorithm_list = c("svmRadial", "glmboost"),
  seed = 351
)
## Change resampling method and resampling iterations.
covid_model_list3 <- train_models(
  covid_split_df,
  resample_method = "cv",
  resample_iterations = 50,
  seed = 351
)