
Qindex (version 0.1.6)

rSplit: Stratified Random Split Sampling

Description

Random split sampling, stratified based on the type of the response.

Usage

rSplit(y, nsplit, stratify = TRUE, s_ratio = 0.8, ...)

Value

Function rSplit returns a length-nsplit list of logical vectors. In each logical vector, the TRUE elements indicate training subjects and the FALSE elements indicate test subjects.
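A minimal base-R sketch of how a list with this structure can be consumed: each logical vector indexes the training subjects directly, and its negation indexes the test subjects. The list `splits` below is a hypothetical stand-in built with `sample.int()`, not output from rSplit, so its 40/10 training/test sizes are fixed by construction.

```r
set.seed(1)
y <- rep(c(TRUE, FALSE), times = c(20, 30))
n <- length(y)

# Hypothetical stand-in for rSplit(y, nsplit = 3L): a list of 3 logical vectors,
# TRUE = training subject, FALSE = test subject (built with sample.int(), not rSplit).
splits <- lapply(seq_len(3L), function(i) {
  train <- logical(n)
  train[sample.int(n, size = 40L)] <- TRUE
  train
})

# Use one replicate: the mask selects training subjects, its negation the test subjects.
y_train <- y[splits[[1L]]]
y_test  <- y[!splits[[1L]]]

# Training and test sizes for every replicate (a 2 x nsplit matrix).
sizes <- vapply(splits, function(train) c(train = sum(train), test = sum(!train)),
                FUN.VALUE = integer(2))
```

In a typical workflow, a model would be fitted on `y[train]` (and the matching rows of any predictors) and evaluated on `y[!train]`, once per replicate.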

Arguments

y

a double vector, a logical vector, a factor, or a Surv object, response \(y\)

nsplit

positive integer scalar, number of replicates of random splits to be performed

stratify

logical scalar, whether stratification based on response \(y\) needs to be implemented, default TRUE

s_ratio

double scalar between 0 and 1, split ratio, i.e., proportion of training subjects \(p\), default .8

...

additional parameters, currently not in use
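As a worked illustration of the default s_ratio = 0.8, the odds \(p/(1-p)\) at which subjects are split, and the approximate split of a sample of \(n = 50\) subjects (as in the example below), are as follows. The counts ignore rounding, which this page does not specify.

```latex
\[
\frac{p}{1-p} = \frac{0.8}{0.2} = 4,
\qquad
n_{\text{train}} \approx p\,n = 0.8 \times 50 = 40,
\qquad
n_{\text{test}} \approx (1-p)\,n = 0.2 \times 50 = 10.
\]
```

That is, roughly four training subjects are drawn for every test subject.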

Details

Function rSplit performs random split sampling, with or without stratification. Specifically,

  • If stratify = FALSE, or if the response \(y\) is a double vector, split the sample into a training and a test set by odds \(p/(1-p)\), without stratification.

  • Otherwise, split a Surv response \(y\), stratified by its censoring status. Specifically, split the subjects with an observed event into a training and a test set by odds \(p/(1-p)\), and split the censored subjects into a training and a test set by the same odds. Then combine the two training sets into a single training set, and the two test sets into a single test set.

  • Otherwise, split a logical response \(y\), stratified by itself. Specifically, split the subjects with TRUE response into a training and a test set by odds \(p/(1-p)\), and split the subjects with FALSE response into a training and a test set by odds \(p/(1-p)\). Then combine the training sets, and the test sets, in a similar fashion as described above.

  • Otherwise, split a factor response \(y\), stratified by its levels. Specifically, split the subjects in each level of \(y\) into a training and a test set by odds \(p/(1-p)\). Then combine the training sets, and the test sets, from all levels of \(y\).
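The rules above can be sketched in base R as follows. This is a hypothetical re-implementation for illustration only (the names `split_once_sketch` and `rsplit_sketch` are invented here), not the Qindex source; in particular, rounding each stratum's training size with `round()` is an assumption. The Surv check comes before the double check because a Surv object is stored as a double matrix, so `is.double()` alone would not tell the two apart.

```r
# Hypothetical sketch of the Details above; NOT the Qindex source code.
# One random split: returns a logical vector, TRUE = training, FALSE = test.
split_once_sketch <- function(y, stratify = TRUE, s_ratio = 0.8) {
  n <- NROW(y)
  strata <- if (!stratify) {
    rep(1L, n)                     # stratify = FALSE: a single stratum
  } else if (inherits(y, "Surv")) {
    unclass(y)[, "status"]         # Surv: stratify by censoring status
  } else if (is.logical(y) || is.factor(y)) {
    y                              # logical: TRUE/FALSE; factor: its levels
  } else if (is.double(y)) {
    rep(1L, n)                     # double response: a single stratum
  } else {
    stop("y must be a double vector, logical vector, factor, or Surv object")
  }
  train <- logical(n)
  # Split each stratum separately; drop = TRUE skips empty factor levels.
  for (idx in split(seq_len(n), strata, drop = TRUE)) {
    # Assumption: the training size of a stratum is round(s_ratio * size).
    n_train <- round(s_ratio * length(idx))
    train[idx[sample.int(length(idx), size = n_train)]] <- TRUE
  }
  train
}

# nsplit independent replicates, returned as a list of logical vectors.
rsplit_sketch <- function(y, nsplit, stratify = TRUE, s_ratio = 0.8) {
  lapply(seq_len(nsplit), function(i) split_once_sketch(y, stratify, s_ratio))
}

set.seed(42)
y <- rep(c(TRUE, FALSE), times = c(20, 30))
splits <- rsplit_sketch(y, nsplit = 3L)
# Training subjects per replicate among TRUE and among FALSE responses.
counts <- vapply(splits, function(train) c(true = sum(train & y), false = sum(train & !y)),
                 FUN.VALUE = integer(2))
```

Because each stratum is split on its own, every training set here holds 16 of the 20 TRUE subjects and 24 of the 30 FALSE subjects, so the TRUE/FALSE proportions of the full sample are preserved in both the training and the test sets.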


Examples

rSplit(y = rep(c(TRUE, FALSE), times = c(20, 30)), nsplit = 3L)
