evaluate: Evaluate a modeling procedure

Description

Evaluates the performance of a modeling procedure using resampling. Tuning and pre-processing are carried out within each fold, so that the results are not biased by information leakage.
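The leakage concern can be illustrated with a minimal base R sketch. It is not emil's implementation: the single test fold, the variable names, and the choice of centering and scaling as the pre-processing step are all assumptions made for illustration. The point is only that pre-processing parameters are estimated on the fitting subset and then reused on the test subset.

```r
# Sketch only: base R, not emil's implementation.
set.seed(1)
x <- iris[-5]
test_idx <- sample(nrow(x), 30)   # one hypothetical test fold
fit_x  <- x[-test_idx, ]
test_x <- x[test_idx, ]

# Estimate the pre-processing parameters on the fitting subset only ...
centre  <- colMeans(fit_x)
spread  <- apply(fit_x, 2, sd)

# ... and apply those same parameters to both subsets, so that no
# information from the test observations reaches the fitted model.
fit_scaled  <- scale(fit_x,  center = centre, scale = spread)
test_scaled <- scale(test_x, center = centre, scale = spread)
```

Estimating `centre` and `spread` on the full dataset before splitting would let the test observations influence the model, which is the bias that fold-wise pre-processing avoids.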

Usage

evaluate(procedure, x, y, resample, pre_process = pre_split,
  .save = c(model = TRUE, prediction = TRUE, error = TRUE, importance = FALSE),
  .cores = 1, .checkpoint_dir = NULL, .return_error = .cores > 1,
  .verbose = getOption("emil_verbose", TRUE))

Arguments

procedure

Modeling procedure, or list of modeling procedures, as produced by modeling_procedure.

x

Dataset, observations as rows and descriptors as columns.

y

Response vector.

resample

The test subsets used for parameter tuning. Leave blank to randomly generate a resampling scheme of the same kind as is used by evaluate to assess the performance of the whole modeling_procedure.

pre_process

Function that performs pre-processing and splits dataset into fitting and test subsets.

.save

Which parts of the modeling results to return to the user. If importance is FALSE, the variable importance calculation is skipped.

.cores

Number of CPU cores to use for parallel computation. The current implementation is based on mcMap, which unfortunately does not work on Windows systems. It can however be re-implemented by the user fairly easily.

.checkpoint_dir

Directory to save intermediate results to, after every completed fold. The directory will be created if it doesn't exist, but not recursively.

.return_error

If FALSE the entire modeling is aborted upon an error. If TRUE the modeling of the particular fold is aborted and the error message is returned instead of its results.

.verbose

Whether to print an activity log.
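The behaviour described for .return_error and .checkpoint_dir can be sketched in base R as follows. This is an illustration under stated assumptions, not emil's code: `fit_fold`, `run_folds`, the checkpoint file names, and returning the condition object (rather than only its message) are all hypothetical.

```r
# Hypothetical stand-in for fitting and testing one fold; fold 2 fails on purpose.
fit_fold <- function(i) {
  if (i == 2) stop("singular covariance matrix")
  list(error = i / 10)
}

# Hypothetical driver mimicking the documented argument semantics.
run_folds <- function(folds, return_error, checkpoint_dir = NULL) {
  if (!is.null(checkpoint_dir) && !dir.exists(checkpoint_dir)) {
    # Created if missing, but not recursively: the parent must already exist.
    dir.create(checkpoint_dir, recursive = FALSE)
  }
  lapply(folds, function(i) {
    res <- if (return_error) {
      # Abort only this fold and keep the error in place of its results.
      tryCatch(fit_fold(i), error = function(e) e)
    } else {
      # An error here propagates and aborts the entire run.
      fit_fold(i)
    }
    if (!is.null(checkpoint_dir)) {
      # Save the intermediate result after every completed fold.
      saveRDS(res, file.path(checkpoint_dir, paste0("fold_", i, ".rds")))
    }
    res
  })
}

ckpt <- file.path(tempdir(), "emil_checkpoint_sketch")
res <- run_folds(1:3, return_error = TRUE, checkpoint_dir = ckpt)

# Identify the folds that failed
failed <- vapply(res, inherits, logical(1), what = "error")
```

With `return_error = FALSE` the same call stops at fold 2 and nothing is returned, which is why capturing errors is the safer default for long parallel runs.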

Value

A list tree where the top level corresponds to folds (in case of multiple folds), the next level corresponds to the modeling procedures (in case of multiple procedures), and the final level is specified by the .save parameter. It typically contains a subset of the elements selectable through .save, such as the fitted model, the predictions, the error estimate, and the variable importance.
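Navigating such a list tree can be sketched with a hand-built mock. The object below is not produced by evaluate: the fold names, the procedure names, and the error values are made up purely to show the fold, procedure, element nesting.

```r
# Hypothetical mock of the nesting described above; all values are made up.
result <- list(
  fold1 = list(Linear = list(error = 0.02), Quadratic = list(error = 0.03)),
  fold2 = list(Linear = list(error = 0.04), Quadratic = list(error = 0.01))
)

# Access path: fold -> procedure -> saved element
result$fold1$Linear$error

# Collect the error of every procedure in every fold
# (rows are procedures, columns are folds)
errors <- sapply(result, function(fold) sapply(fold, function(p) p$error))

# Average error per procedure across folds
rowMeans(errors)
```

When only one fold or one procedure is used, the corresponding level is absent, so the indexing depth has to be adjusted accordingly.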

Examples

library(emil)

x <- iris[-5]
y <- iris$Species
cv <- resample("crossvalidation", y, nfold = 4, nrepeat = 4)
result <- evaluate("lda", x, y, resample=cv)

# Multiple procedures fitted and tested simultaneously. 
# This is useful when the dataset is large and the splitting takes a long time.
# If you name the elements of the list emil will also name the elements of the
# results object in the same way.
result <- evaluate(c(Linear = "lda", Quadratic = "qda"), x, y, resample=cv)

# Multicore parallelization (on a single computer)
result <- evaluate("lda", x, y, resample=cv, .cores=2)

# Parallelization using a cluster (not limited to a single computer)
# PSOCK is supported on Windows too!
library(parallel)
cl <- makePSOCKcluster(2)
clusterEvalQ(cl, library(emil))
clusterExport(cl, c("x", "y"))
result <- parLapply(cl, cv, function(fold)
    evaluate("lda", x, y, resample=fold))
# Shut down the worker processes when done
stopCluster(cl)
