stratified.cross.validation: Stratified cross validation

Description

Generate data for the stratified cross-validation.

Usage

stratified.cv.data.single.class(examples, positives, kk = 5, seed = NULL)
stratified.cv.data.over.classes(labels, examples, kk = 5, seed = NULL)

Arguments

examples

indices or names of the examples. Can be either a vector of integers or a vector of names.

positives

indices or names of the 'positive' examples. Can be either a vector of integers or a vector of names.

kk

number of folds (def. kk=5).

seed

seed of the random generator (def. seed=NULL). If set to NULL, no initialization is performed.

labels

labels matrix. Rows are genes and columns are classes. Let \(M\) denote the labels matrix. If \(M[i,j]=1\), the gene \(i\) is annotated with the class \(j\); otherwise \(M[i,j]=0\).
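
As a small illustration of the labels matrix layout described above, the following base R sketch builds a toy matrix and extracts the positive examples of one class. The gene names (g1 to g4) and class names (classA, classB) are hypothetical and are not taken from the package data.

```r
# Toy labels matrix: 4 hypothetical genes (rows) x 2 hypothetical classes (columns).
M <- matrix(c(1, 0,
              0, 1,
              1, 1,
              0, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("g1", "g2", "g3", "g4"),
                            c("classA", "classB")))

# Positive examples for classA: the rows whose entry in that column is 1.
# which() keeps the row names, so the result carries both indices and names.
positives <- which(M[, "classA"] == 1)
```

Here positives holds the row indices of the annotated genes, with the gene names attached as names, so the same vector can serve index-based or name-based use.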

Value

stratified.cv.data.single.class returns a list with two components:

fold.non.positives: a list with \(k\) components. Each component is a vector with the indices (or names) of the non-positive elements. Indexes (or names) refer to row numbers (or names) of a data matrix;
fold.positives: a list with \(k\) components. Each component is a vector with the indices (or names) of the positive elements. Indexes (or names) refer to row numbers (or names) of a data matrix;
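
One way the two components listed above might be consumed is sketched below in base R. The list is a hand-made stand-in with the documented shape (two folds, made-up indices), not actual output of the function, and the variable names test.set and train.set are illustrative assumptions.

```r
# Hand-made stand-in for the documented return value, with 2 folds.
# In practice this list would come from stratified.cv.data.single.class.
folds <- list(
  fold.non.positives = list(c(1L, 4L, 6L), c(3L, 7L, 8L)),
  fold.positives     = list(c(2L), c(5L))
)

fold <- 1  # fold held out as test set

# Test set: positives and non-positives of the held-out fold.
test.set <- c(folds$fold.positives[[fold]], folds$fold.non.positives[[fold]])

# Training set: every example stored in the folds that is not in the test set.
train.set <- setdiff(unlist(folds, use.names = FALSE), test.set)
```

Repeating the last two assignments for each fold index gives the usual cross-validation loop over row indices (or names) of the data matrix.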

stratified.cv.data.over.classes returns a list with \(n\) components, where \(n\) is the number of classes of the labels matrix. Each component is in turn a list with \(k\) elements, where \(k\) is the number of folds. Each fold contains approximately the same number of positive examples, and the same number of negative examples, as the other folds.

Details

Folds are stratified, i.e. each fold contains approximately the same number of positive examples, and the same number of negative examples, as every other fold.
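
The idea of stratification can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: make_folds is a hypothetical helper, and the example indices are made up. Positives and negatives are shuffled and split separately, so each fold receives a near-equal share of both.

```r
# Hypothetical helper (not part of the package): shuffle a vector and
# distribute its elements over kk folds of near-equal size.
make_folds <- function(x, kk) {
  x <- x[sample.int(length(x))]               # random permutation of x
  fold.id <- rep_len(seq_len(kk), length(x))  # 1, 2, ..., kk, 1, 2, ...
  # factor() with explicit levels keeps all kk folds, even empty ones
  unname(split(x, factor(fold.id, levels = seq_len(kk))))
}

set.seed(23)  # fix the random generator so the split is reproducible
examples  <- 1:20
positives <- c(2L, 5L, 8L, 11L, 14L)
negatives <- setdiff(examples, positives)

# Splitting the two groups separately is what makes the folds stratified.
folds <- list(
  fold.non.positives = make_folds(negatives, kk = 5),
  fold.positives     = make_folds(positives, kk = 5)
)
```

With 5 positives, 15 negatives and 5 folds, every fold ends up with one positive and three negatives; when the counts are not multiples of the number of folds, fold sizes differ by at most one.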

Examples

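# NOT RUN {
# Load the example labels matrix L (rows: genes, columns: classes)
data(labels)
examples.index <- 1:nrow(L)
examples.name <- rownames(L)
# Positive examples for the third class
positives <- which(L[,3]==1)
# Folds for a single class, with examples given by index and by name
x <- stratified.cv.data.single.class(examples.index, positives, kk=5, seed=23)
y <- stratified.cv.data.single.class(examples.name, positives, kk=5, seed=23)
# Folds for every class of the labels matrix, by index and by name
z <- stratified.cv.data.over.classes(L, examples.index, kk=5, seed=23)
k <- stratified.cv.data.over.classes(L, examples.name, kk=5, seed=23)
# }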
