
errorest_cv(x, y, train, classify, num_folds = 10,
hold_out = NULL, ...)
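
x: the data matrix, with observations in rows and features in columns.
y: the vector of class labels for the observations in x.
train: a function that trains the classifier. (See Details.)
classify: a function that classifies observations using the classifier returned by train. (See Details.)
num_folds: the number of cross-validation folds. Ignored if hold_out is not NULL. See Details.
hold_out: the hold-out size for each test data set. See Details.
...: additional arguments passed to the function specified in train.
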
Rather than partitioning the observations into folds, an
alternative convention is to specify the 'hold-out' size
for each test data set. Note that this convention is
equivalent to the notion of folds. We allow the user to
specify either option with the hold_out and num_folds
arguments. The num_folds argument is the default option but
is ignored if the hold_out argument is specified (i.e., is
not NULL).
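
To make the equivalence between the two conventions concrete, here is a minimal sketch in base R. The helper make_test_sets is hypothetical and is not part of the package; it assumes that a hold-out size of h on n observations corresponds to ceiling(n / h) folds, which may differ in detail from how the package partitions the data.

```r
# Hypothetical helper, not part of the package: split the indices 1..n into
# test sets, either by a number of folds or by a hold-out size.
make_test_sets <- function(n, num_folds = 10, hold_out = NULL) {
  if (!is.null(hold_out)) {
    # Assumption: a hold-out size of h implies ceiling(n / h) folds,
    # so num_folds is ignored whenever hold_out is given.
    num_folds <- ceiling(n / hold_out)
  }
  shuffled <- sample(n)
  # Assign the shuffled indices to folds in round-robin order.
  fold_id <- rep(seq_len(num_folds), length.out = n)
  unname(split(shuffled, fold_id))
}

set.seed(42)
by_folds <- make_test_sets(150, num_folds = 10)
by_hold_out <- make_test_sets(150, hold_out = 15)
lengths(by_folds)     # sizes of the test sets when folds are requested
lengths(by_hold_out)  # sizes of the test sets when a hold-out size is requested
```

With 150 observations, asking for 10 folds and asking for a hold-out size of 15 both yield ten test sets of 15 observations each.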

For the given classifier, two functions must be provided:
1. to train the classifier and 2. to classify unlabeled
observations. The training function is provided as train
and the classification function as classify.

We expect that the first two arguments of the train
function are x and y, corresponding to the data matrix and
the vector of their labels, respectively. Additional
arguments can be passed to the train function.

We stay with the usual R convention for the classify
function. We expect that this function takes two
arguments: 1. an object argument, which contains the
trained classifier returned from the function specified in
train; and 2. a newdata argument, which contains a matrix
of observations to be classified. The matrix should have
rows corresponding to the individual observations and
columns corresponding to the features (covariates). For
an example, see lda.
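
As an illustration of the two signatures described above, the following sketch defines a nearest-centroid classifier in base R. The names centroid_train and centroid_classify are hypothetical and were written only for this example; any pair of functions with the same argument conventions should work equally well.

```r
# Hypothetical nearest-centroid classifier, written only to illustrate the
# expected signatures; it is not part of the package.
centroid_train <- function(x, y, ...) {
  y <- as.factor(y)
  # One row of feature means per class.
  centroids <- t(sapply(levels(y), function(k) {
    colMeans(x[y == k, , drop = FALSE])
  }))
  list(centroids = centroids, classes = levels(y))
}

centroid_classify <- function(object, newdata) {
  newdata <- as.matrix(newdata)
  # Squared Euclidean distance from every observation (row) to every centroid.
  dists <- sapply(seq_len(nrow(object$centroids)), function(k) {
    rowSums(sweep(newdata, 2, object$centroids[k, ])^2)
  })
  dists <- matrix(dists, nrow = nrow(newdata))
  # Pick the class whose centroid is closest to each observation.
  nearest <- max.col(-dists, ties.method = "first")
  factor(object$classes[nearest], levels = object$classes)
}

iris_x <- data.matrix(iris[, -5])
iris_y <- iris[, 5]
fit <- centroid_train(iris_x, iris_y)
predicted <- centroid_classify(fit, iris_x)
mean(predicted != iris_y)  # apparent (resubstitution) error rate
```

Because centroid_train takes x and y first and centroid_classify takes object and newdata, the pair could be supplied as train = centroid_train and classify = centroid_classify.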

library(MASS)
library(sortinghat)  # provides errorest_cv
iris_x <- data.matrix(iris[, -5])
iris_y <- iris[, 5]
# Because predict() on an lda fit returns multiple objects in a list,
# we provide a wrapper function that returns only the class labels.
lda_wrapper <- function(object, newdata) { predict(object, newdata)$class }
set.seed(42)
errorest_cv(x = iris_x, y = iris_y, train = MASS::lda, classify = lda_wrapper)
# Output: 0.02666667
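
For readers who want to see what a cross-validation error estimate of this kind involves, here is a minimal sketch of a generic k-fold loop that uses the same train and classify conventions as the example above. The function cv_error_sketch is hypothetical; it is not the package's implementation and may partition the data or aggregate the errors differently, so its result need not match the output shown above.

```r
library(MASS)

# Hypothetical k-fold error estimate, shown only as a sketch.
cv_error_sketch <- function(x, y, train, classify, num_folds = 10, ...) {
  n <- nrow(x)
  # Randomly assign each observation to one of num_folds folds.
  fold_id <- sample(rep(seq_len(num_folds), length.out = n))
  errors <- 0
  for (k in seq_len(num_folds)) {
    test <- which(fold_id == k)
    # Train on everything outside fold k; '...' is forwarded to train.
    fit <- train(x[-test, , drop = FALSE], y[-test], ...)
    pred <- classify(object = fit, newdata = x[test, , drop = FALSE])
    errors <- errors + sum(pred != y[test])
  }
  # Proportion of held-out observations that were misclassified.
  errors / n
}

iris_x <- data.matrix(iris[, -5])
iris_y <- iris[, 5]
lda_wrapper <- function(object, newdata) { predict(object, newdata)$class }
set.seed(42)
err <- cv_error_sketch(iris_x, iris_y, train = MASS::lda, classify = lda_wrapper)
err
```

Pooling the misclassifications over all folds, as done here, is one common choice; averaging the per-fold error rates gives the same value when every fold has the same size.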