Random split sampling, stratified based on the type of the response.
rSplit(y, nsplit, stratify = TRUE, s_ratio = 0.8, ...)
Function rSplit returns a length-nsplit vector; the TRUE elements indicate training subjects and the FALSE elements indicate test subjects.
y: a double vector, a logical vector, a factor, or a Surv object, the response \(y\)
nsplit: positive integer scalar, number of replicates of random splits to be performed
stratify: logical scalar, whether to stratify on the response \(y\), default TRUE
s_ratio: double scalar between 0 and 1, split ratio, i.e., proportion of training subjects \(p\), default 0.8
...: additional parameters, currently not in use
Function rSplit performs random split sampling, with or without stratification. Specifically,
If stratify = FALSE, or if we have a double response \(y\), then split the sample into a training and a test set by odds \(p/(1-p)\), without stratification.
Otherwise, split a Surv response \(y\), stratified by its censoring status. Specifically, split the subjects with observed events into a training and a test set by odds \(p/(1-p)\), and split the censored subjects into a training and a test set by odds \(p/(1-p)\). Then combine the two training sets (from subjects with observed events and from censored subjects), and likewise combine the two test sets.
Otherwise, split a logical response \(y\), stratified by itself. Specifically, split the subjects with TRUE response into a training and a test set by odds \(p/(1-p)\), and split the subjects with FALSE response into a training and a test set by odds \(p/(1-p)\). Then combine the training sets, and the test sets, in a similar fashion as described above.
Otherwise, split a factor response \(y\), stratified by its levels. Specifically, split the subjects in each level of \(y\) into a training and a test set by odds \(p/(1-p)\). Then combine the training sets, and the test sets, from all levels of \(y\).
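The stratified cases above share one pattern: split each stratum by the same ratio, then pool the pieces. The following base-R sketch illustrates that pattern for a single split. It is not the package's implementation: the helper name stratified_split_sketch is hypothetical, and rounding the per-stratum training count to the nearest integer is an assumption, since the rounding rule is not stated here.

```r
# Hypothetical helper for illustration only; not part of the package.
# Returns a logical vector: TRUE = training subject, FALSE = test subject.
stratified_split_sketch <- function(strata, s_ratio = 0.8) {
  strata <- as.factor(strata)
  train <- logical(length(strata))  # every subject starts in the test set
  # Walk through the subject indices of each stratum in turn
  for (idx in split(seq_along(strata), strata)) {
    # Assumption: training count per stratum is rounded to the nearest integer
    n_train <- round(length(idx) * s_ratio)
    # sample.int() on positions avoids the sample() pitfall for length-1 input
    train[idx[sample.int(length(idx), size = n_train)]] <- TRUE
  }
  train
}

y <- rep(c(TRUE, FALSE), times = c(20, 30))
set.seed(1)
train <- stratified_split_sketch(y)
table(response = y, training = train)
```

With \(p = 0.8\), the 20 TRUE subjects contribute 16 training subjects and the 30 FALSE subjects contribute 24, so both strata keep the same training proportion. Passing a constant vector as strata puts every subject in a single stratum, which corresponds to the unstratified case.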
rSplit(y = rep(c(TRUE, FALSE), times = c(20, 30)), nsplit = 3L)