createDataPartition creates a series of test/training partitions, while createResample creates one or more bootstrap samples. createFolds splits the data into k groups.

createDataPartition(y,
times = 1,
p = 0.5,
list = TRUE,
groups = min(5, length(y)))
createResample(y, times = 10, list = TRUE)
createFolds(y, k = 10, list = TRUE, returnTrain = FALSE)
createMultiFolds(y, k = 10, times = 5)
The result is a list (when list = TRUE) or a matrix with the number of rows equal to floor(p * length(y)) and times columns.

For numeric y, groups is the number of breaks in the quantiles (see below).
For bootstrap samples, simple random sampling is used. For other data splitting, the random sampling is done within the levels of y when y is a factor, in an attempt to balance the class distributions within the splits.
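The effect of sampling within factor levels can be checked by comparing class proportions before and after a split. The following is a minimal sketch that assumes the caret package is installed; the imbalanced outcome `y`, the seed, and `p = 0.7` are made up for illustration.

```r
library(caret)  # assumes the caret package is installed

set.seed(42)  # arbitrary seed, only for reproducibility

# Hypothetical imbalanced outcome: 80 "a", 15 "b", 5 "c"
y <- factor(rep(c("a", "b", "c"), times = c(80, 15, 5)))

# One 70/30 split, returned as a one-column matrix of row indices
in_train <- createDataPartition(y, p = 0.7, list = FALSE)

# Class proportions in the full data, the training rows, and the held-out rows
round(prop.table(table(y)), 2)
round(prop.table(table(y[in_train])), 2)
round(prop.table(table(y[-in_train])), 2)
```

The three tables should be close to each other, which is the point of stratifying on the outcome.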
For numeric y, the sample is split into groups based on percentiles, and sampling is done within these subgroups. For createDataPartition, the number of percentiles is set via the groups argument. For createFolds and createMultiFolds, the number of groups is set dynamically based on the sample size and k. For smaller sample sizes, these two functions may not do stratified splitting and, at most, will split the data into quartiles.

Also, for createDataPartition, when class sizes are very small (<= 3), the classes may not show up in both the training and test data.
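The percentile-based grouping for numeric y can be illustrated in base R. This is a rough sketch of the idea and not caret's own code: the variable names, the seed, and the use of `ceiling()` for the per-bin sample size are assumptions made for the example.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

y <- rgamma(50, 3, 0.5)  # skewed numeric outcome, as in the examples
groups <- 5              # number of quantile break points (assumed to mirror `groups`)
p <- 0.5                 # fraction of each bin that goes to training

# Bin y at its quantiles; unique() guards against tied break points
breaks <- unique(quantile(y, probs = seq(0, 1, length.out = groups)))
bins <- cut(y, breaks, include.lowest = TRUE)

# Sample a fraction p of the row indices within each bin
idx_by_bin <- split(seq_along(y), bins)
in_train <- sort(unlist(lapply(idx_by_bin, function(idx) {
  idx[sample.int(length(idx), size = ceiling(p * length(idx)))]
}), use.names = FALSE))

table(bins)            # bin sizes in the full sample
table(bins[in_train])  # roughly half of each bin ends up in training
```

Because every bin contributes rows, the training and held-out sets cover the same range of y rather than, say, one set getting most of the large values.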
For multiple k-fold cross-validation, completely independent folds are created. The names of the list objects denote the fold membership using the pattern "Foldi.Repj", meaning the ith section (of k) of the jth cross-validation set (of times). Note that createMultiFolds calls createFolds with list = TRUE and returnTrain = TRUE.
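A short sketch of what repeated k-fold output looks like, assuming the caret package is installed. The balanced outcome `y`, the seed, and the choice of `k = 3` and `times = 2` are illustrative only.

```r
library(caret)  # assumes the caret package is installed

set.seed(7)  # arbitrary seed, only for reproducibility

# Hypothetical balanced outcome with 30 observations
y <- factor(rep(c("a", "b", "c"), each = 10))

# 3-fold cross-validation repeated twice: 3 * 2 = 6 list elements
folds <- createMultiFolds(y, k = 3, times = 2)

names(folds)    # fold and repeat labels
lengths(folds)  # each element holds training-row indices, not held-out rows
```

Since returnTrain = TRUE is used internally, each element is the set of rows to train on for that fold; the held-out rows are the complement.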
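library(caret)
data(oil)
# two 50/50 partitions of oilType (the second argument is times)
createDataPartition(oilType, times = 2)

# numeric outcome: compare the training and held-out densities
x <- rgamma(50, 3, .5)
inA <- createDataPartition(x, list = FALSE)
plot(density(x[inA]))
rug(x[inA])
points(density(x[-inA]), type = "l", col = 4)
rug(x[-inA], col = 4)

# two bootstrap samples
createResample(oilType, times = 2)

# 10 folds as a list, then 5 folds as an integer vector of fold assignments
createFolds(oilType, k = 10)
createFolds(oilType, k = 5, list = FALSE)

# small numeric sample
createFolds(rnorm(21))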