createDataPartition
From caret v4.20
by Max Kuhn
Data Splitting functions
A series of test/training partitions are created using createDataPartition, while createResample creates one or more bootstrap samples. createFolds splits the data into k groups.
- Keywords
- utilities
Usage
createDataPartition(y,
times = 1,
p = 0.5,
list = TRUE,
groups = min(5, length(y)))
createResample(y, times = 10, list = TRUE)
createFolds(y, k = 10, list = TRUE, returnTrain = FALSE)
Arguments
- y
- a vector of outcomes
- times
- the number of partitions to create
- p
- the proportion of data (between 0 and 1) that goes to training
- list
- logical - should the results be in a list (TRUE) or a matrix with the number of rows equal to floor(p * length(y)) and times columns.
- groups
- for numeric y, the number of breaks in the quantiles (see below)
- k
- an integer for the number of folds.
- returnTrain
- a logical. When true, the values returned are the sample positions corresponding to the data used during training. This argument only works in conjunction with list = TRUE.
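As a rough illustration of how list, times, and returnTrain shape the return value, the sketch below assumes the caret package is installed and uses a simulated outcome y that is not part of the original examples. Because sampling is done within subgroups, the exact number of rows returned can vary, so no row counts are shown.

```r
library(caret)

set.seed(1)
y <- rnorm(100)  # simulated numeric outcome (illustrative assumption)

# list = TRUE (default): a list with one vector of training rows per partition
parts_list <- createDataPartition(y, times = 3, p = 0.8)
length(parts_list)

# list = FALSE: a matrix with one column per partition
parts_mat <- createDataPartition(y, times = 3, p = 0.8, list = FALSE)
ncol(parts_mat)

# returnTrain = FALSE (default): each element holds the held-out rows of a fold
test_folds <- createFolds(y, k = 5)

# returnTrain = TRUE: each element holds the rows used for training instead
train_folds <- createFolds(y, k = 5, returnTrain = TRUE)
```

The matrix form is convenient for direct indexing, as in x[inA] and x[-inA] in the examples, while the list form is easier to loop over.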
Details
For bootstrap samples, simple random sampling is used. For other data splitting, the random sampling is done within the levels of y when y is a factor in an attempt to balance the class distributions within the splits. For numeric y, the sample is split into groups sections based on quantiles and sampling is done within these subgroups. Also, for very small class sizes (
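The sampling scheme described above can be sketched in base R. This is a minimal illustration under stated assumptions, not caret's actual implementation: the variable names, the use of ceiling() for the per-section sample size, and the simulated data are all choices made for this sketch.

```r
set.seed(42)
y <- rgamma(50, 3, 0.5)  # simulated numeric outcome (illustrative assumption)
p <- 0.5                 # proportion of rows to place in training
groups <- 5              # number of quantile-based sections

# Bootstrap sample: simple random sampling of row positions with replacement
boot_idx <- sample.int(length(y), replace = TRUE)

# Cut y into sections at its quantiles
breaks <- unique(quantile(y, probs = seq(0, 1, length.out = groups + 1)))
section <- cut(y, breaks = breaks, include.lowest = TRUE)

# Sample a proportion p of the row positions within each section
# (ceiling() is an assumption of this sketch)
rows_by_section <- split(seq_along(y), section)
train_idx <- sort(unlist(lapply(rows_by_section, function(idx) {
  n_take <- ceiling(length(idx) * p)
  idx[sample.int(length(idx), n_take)]
}), use.names = FALSE))

# Count how many training rows were drawn from each section
table(section[train_idx])
```

Sampling within sections keeps the distribution of y in the training rows close to that of the full data, which is the stated aim of the stratification.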
Value
- A list or matrix of row position integers corresponding to the training data
Examples
library(caret)
data(oil)
createDataPartition(oilType, 2)
x <- rgamma(50, 3, .5)
inA <- createDataPartition(x, list = FALSE)
plot(density(x[inA]))
rug(x[inA])
points(density(x[-inA]), type = "l", col = 4)
rug(x[-inA], col = 4)
createResample(oilType, 2)
createFolds(oilType, 10)
createFolds(oilType, 5, FALSE)
createFolds(rnorm(21))