Free Access Week - Data Engineering + BI
Data Engineering and BI courses are free this week!
Free Access Week - Jun 2-8

promor (version 0.2.1)

train_models: Train machine learning models on training data

Description

This function can be used to train models on protein intensity data using different machine learning algorithms

Usage

train_models(
  split_df,
  resample_method = "repeatedcv",
  resample_iterations = 10,
  num_repeats = 3,
  algorithm_list,
  seed = NULL,
  ...
)

Value

A list of class train for each machine-learning algorithm. See train for more information on accessing different elements of this list.

Arguments

split_df

A split_df object from performing split_data.

resample_method

The resampling method to use. Default is "repeatedcv" for repeated cross validation. See trainControl for details on other available methods.

resample_iterations

Number of resampling iterations. Default is 10.

num_repeats

The number of complete sets of folds to compute (For resampling method = "repeatedcv" only).

algorithm_list

A list of classification or regression algorithms to use. A full list of machine learning algorithms available through the caret package can be found here: http://topepo.github.io/caret/train-models-by-tag.html. See below for default options.

seed

Numerical. Random number seed. Default is NULL

...

Additional arguments to be passed on to train function in the caret package.

Author

Chathurani Ranathunge

Details

  • In the event that algorithm_list is not provided, a default list of four classification-based machine-learning algorithms will be used for building and training models. Default algorithm_list: "svmRadial", "rf", "glm", "xgbLinear, and "naive_bayes."

  • Note: Models that fail to build are removed from the output.

  • Make sure to fix the random number seed with seed for reproducibility

References

Kuhn, Max. "Building predictive models in R using the caret package." Journal of statistical software 28 (2008): 1-26.

See Also

Examples

Run this code
# \donttest{

## Create a model_df object
covid_model_df <- pre_process(covid_fit_df, covid_norm_df)

## Split the data frame into training and test data sets
covid_split_df <- split_data(covid_model_df, seed = 8314)

## Fit models based on the default list of machine learning (ML) algorithms
covid_model_list1 <- train_models(split_df = covid_split_df, seed = 351)

## Fit models using a user-specified list of ML algorithms.
covid_model_list2 <- train_models(
  covid_split_df,
  algorithm_list = c("svmRadial", "glmboost"),
  seed = 351
)

## Change resampling method and resampling iterations.
covid_model_list3 <- train_models(
  covid_split_df,
  resample_method = "cv",
  resample_iterations = 50,
  seed = 351
)
# }

Run the code above in your browser using DataLab