splitStratify: Split by Stratified Sampling

Description

splitStratify builds a training set and a validation set by stratified random sampling. It relies on the strata function from the sampling package and on the cut function from the base package. The cut function bins continuous annotations so that they can serve as strata. The bin and breaks arguments described below control this binning, although users may find it easier to bin annotations beforehand. When applied to an ExprsMulti object, this function stratifies subjects across all classes found in that dataset.
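The idea behind the split can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: it replaces the strata function with a plain per-stratum call to sample.int, and the annotation table, its column names (id, defineCase, age), and the rounding rule are all assumptions made for illustration.

```r
set.seed(1)

# Hypothetical annotation table: 60 subjects, two classes, one continuous covariate
annot <- data.frame(
  id = sprintf("s%02d", 1:60),
  defineCase = rep(c("Control", "Case"), each = 30),
  age = round(runif(60, min = 20, max = 80)),
  stringsAsFactors = FALSE
)

# Bin the continuous covariate so that it can define strata
annot$ageBin <- cut(annot$age, breaks = c(0, 50, 100))

# One stratum per observed combination of class label and age bin
stratum <- interaction(annot$defineCase, annot$ageBin, drop = TRUE)

# Draw roughly percent.include percent of the subjects within each stratum
# (rounding to the nearest whole subject is an assumption of this sketch)
percent.include <- 67
train.ids <- unlist(lapply(split(annot$id, stratum), function(ids) {
  n <- round(length(ids) * percent.include / 100)
  ids[sample.int(length(ids), n)]
}), use.names = FALSE)

# Subjects not drawn for training form the validation set
train <- annot[annot$id %in% train.ids, ]
valid <- annot[!annot$id %in% train.ids, ]
```

Because sampling happens within each stratum, the training and validation sets keep approximately the same mix of classes and age groups as the full dataset.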

Usage

splitStratify(object, percent.include = 67, colBy = NULL,
  bin = rep(FALSE, length(colBy)), breaks = rep(list(NA),
  length(colBy)), ...)

Arguments

object

An ExprsArray object to split.

percent.include

Specifies the percent of the total number of subjects to include in the training set.

colBy

Specifies a vector of column names by which to stratify in addition to the class label annotation. If colBy = NULL, random sampling will occur across the class label annotation only. For splitStratify only.

bin

A logical vector indicating whether to bin the respective colBy column using cut (e.g., bin = c(FALSE, TRUE)). For splitStratify only.

breaks

A list. Each element of the list should correspond to a breaks argument passed to cut for the respective colBy column. Set an element to NA when not binning that colBy. For splitStratify only.
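The colBy, bin, and breaks arguments are matched by position, which a short base R sketch may make clearer. The column names (sex, age), the cut points, and the object name array in the commented call are hypothetical, and the loop only mimics how the binning could be applied; it is not taken from the package source.

```r
# Hypothetical annotations: one categorical and one continuous column
annot <- data.frame(
  sex = c("F", "M", "F", "M", "F", "M"),
  age = c(23, 35, 47, 52, 68, 71)
)

# Position i of bin and breaks describes position i of colBy
colBy  <- c("sex", "age")
bin    <- c(FALSE, TRUE)                # bin only the second column
breaks <- list(NA, c(0, 50, 100))       # NA for the column that is not binned

# Apply cut to each column flagged for binning
for (i in seq_along(colBy)) {
  if (bin[i]) {
    annot[[colBy[i]]] <- cut(annot[[colBy[i]]], breaks = breaks[[i]])
  }
}

# A breaks element may also be a single number of equal-width intervals
age.two.bins <- cut(c(23, 35, 47, 52, 68, 71), breaks = 2)

# The corresponding call, assuming an ExprsArray named array with these
# annotation columns, would look like this (not run):
# splitStratify(array, percent.include = 67, colBy = c("sex", "age"),
#               bin = c(FALSE, TRUE), breaks = list(NA, c(0, 50, 100)))
```

After the loop, the age column is a factor with one level per interval, while the sex column is left unchanged.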

...

For splitSample: additional arguments passed along to sample. For splitStratify: additional arguments passed along to cut.

Value

Returns a list of two ExprsArray objects: one holding the training set and one holding the validation set.