crtests (version 0.2.1)

multitest: Create and run multiple instances of a test

Description

Wrapper that creates multiple copies of a test and runs them. This function supports cross validation and regular sampling. Cross validation splits the data into 'iterations' folds, then uses one fold as the holdout set and all other folds as the training set. This is repeated 'iterations' times, so that every fold serves as holdout exactly once. Regular sampling (no cross validation) takes a random sample of size holdout * nrow(data) as the holdout set and uses the rest for training. This is also repeated 'iterations' times. Test creation and execution are handled by create_and_run_test.
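The two splitting schemes described above can be sketched in base R. This is an illustration of the idea only, not the crtests implementation; the names fold_id, cv_splits and sample_splits are made up for this example.

```r
# Illustration only: base-R index splitting, not the crtests implementation.
set.seed(42)                      # make the random splits reproducible
data <- iris
n <- nrow(data)
iterations <- 5

# Cross validation: assign every row to exactly one of `iterations` folds,
# then use each fold once as holdout and the remaining folds for training.
fold_id <- sample(rep(seq_len(iterations), length.out = n))
cv_splits <- lapply(seq_len(iterations), function(i) {
  list(holdout = which(fold_id == i), train = which(fold_id != i))
})

# Regular sampling: draw a fresh random holdout of size holdout * n
# in every iteration; holdout sets of different iterations may overlap.
holdout <- 0.2
sample_splits <- lapply(seq_len(iterations), function(i) {
  idx <- sample(n, size = round(holdout * n))
  list(holdout = idx, train = setdiff(seq_len(n), idx))
})
```

With cross validation every row ends up in a holdout set exactly once, whereas with regular sampling a row may be held out several times or never.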

Usage

multitest(data, dependent, problem = c("classification", "regression"), method, name, description = "", data_transform = identity, iterations = 10, holdout = 0.2, cross_validation = FALSE, preserve_distribution = FALSE)

Arguments

data
A data frame
dependent
The dependent variable: the name of the column containing the prediction goal
problem
Either "classification" or "regression". This determines how the algorithms are trained and which method is used to measure performance
method
The name of the regression or classification method, for example "randomForest" or "rpart"
name
The name of the test. Printed in the test results
description
Optional. A more elaborate description of the test
data_transform
A quoted function name that transforms the data. The result must still be a data frame and must still contain the dependent variable.
iterations
The number of times the test is to be performed. If cross-validation is used, this is the number of folds
holdout
Regular sampling only (ignored when cross validation is used). The fraction of the data to be used as holdout set
cross_validation
Logical. Should cross validation be used?
preserve_distribution
Logical, classification problems only. Should the distribution of factor levels in the dependent variable be as similar as possible between holdout and training sets?
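To illustrate what preserving the distribution of the dependent variable means, the following base-R sketch draws a stratified holdout by sampling the holdout fraction within each class. It is an assumption about the general technique, not a copy of what crtests does internally; rows_by_class, holdout_idx, holdout_set and training_set are names invented for this example.

```r
# Illustration only: a stratified holdout in base R, not the crtests implementation.
set.seed(1)
data <- iris
dependent <- "Species"
holdout <- 0.2

# Split the row indices by class, then sample the holdout fraction within each class.
rows_by_class <- split(seq_len(nrow(data)), data[[dependent]])
holdout_idx <- unlist(lapply(rows_by_class, function(rows) {
  rows[sample.int(length(rows), size = round(holdout * length(rows)))]
}), use.names = FALSE)

holdout_set <- data[holdout_idx, ]
training_set <- data[-holdout_idx, ]

# Class proportions in both sets can be compared with those of the full data.
prop.table(table(holdout_set[[dependent]]))
prop.table(table(training_set[[dependent]]))
```

Because iris has the same number of rows per species, both sets keep equal class proportions here; with unbalanced classes the proportions would only match approximately, due to rounding.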

Value

A list of class 'multitest_results_' + problem (that is, 'multitest_results_classification' or 'multitest_results_regression'), containing the test results of each iteration

Examples

## Not run: 
# library(crtests)
# library(randomForest)
# library(rpart)
# library(caret)
# library(stringr)
# 
# # A classification multitest
# multitest(data = iris,
#           dependent = "Species",
#           problem = "classification",
#           method = "randomForest",
#           name = "An example classification multitest",
#           iterations = 10,
#           cross_validation = TRUE,
#           preserve_distribution = TRUE
# )
# 
# # A regression multitest
# multitest(data = iris,
#           dependent = "Sepal.Width",
#           problem = "regression",
#           method = "rpart",
#           name = "An example regression multitest",
#           iterations = 15,
#           cross_validation = FALSE
# )
# 
# ## End(Not run)
