partition_data: Helper function that partitions a data set into training and test data sets.

Description

The function randomly partitions a data set into training and test data sets, assigning a specified proportion of the observations to the training data set. Optionally, the class proportions of the original data set can be preserved in both partitions.

Usage

partition_data(x, y, split_pct = 2/3,
    preserve_proportions = FALSE)

Arguments

x

a matrix of n observations (rows) and p features (columns)

y

a vector of n class labels

split_pct

the proportion of observations (a value between 0 and 1, by default 2/3) that will be randomly assigned to the training data set. The remaining observations are assigned to the test data set.

preserve_proportions

logical value. If TRUE, the training and test data sets are constructed so that the class proportions of the original data set are preserved.
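
The interaction between split_pct and preserve_proportions can be illustrated with a minimal base R sketch. The function partition_sketch below is hypothetical and is not the implementation behind partition_data; it assumes that group sizes are rounded to the nearest integer and that preserving proportions means sampling the same fraction within each class.

```r
# Hypothetical sketch, not the actual partition_data implementation.
# Assumption: the number of training observations is rounded to the
# nearest integer, overall or within each class.
partition_sketch <- function(x, y, split_pct = 2/3,
                             preserve_proportions = FALSE) {
  x <- as.matrix(x)
  y <- as.factor(y)
  n <- length(y)
  stopifnot(nrow(x) == n, split_pct > 0, split_pct < 1)

  if (preserve_proportions) {
    # Draw the same fraction of row indices within each class.
    train_idx <- unlist(lapply(split(seq_len(n), y), function(idx) {
      idx[sample.int(length(idx), size = round(split_pct * length(idx)))]
    }), use.names = FALSE)
  } else {
    # Draw row indices from the whole data set, ignoring the labels.
    train_idx <- sample.int(n, size = round(split_pct * n))
  }
  test_idx <- setdiff(seq_len(n), train_idx)

  list(train_x = x[train_idx, , drop = FALSE], train_y = y[train_idx],
       test_x = x[test_idx, , drop = FALSE], test_y = y[test_idx])
}

set.seed(42)
parts <- partition_sketch(iris[, -5], iris[, 5], preserve_proportions = TRUE)
table(parts$train_y)  # 33 observations per class
table(parts$test_y)   # 17 observations per class
```

With 50 observations per class in iris and split_pct = 2/3, the per-class rounding in this sketch yields 33 training and 17 test observations for every class. Without preserve_proportions, only the overall training size is fixed and the per-class counts vary from draw to draw.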

Value

named list containing the training and test data sets:
- train_x: matrix of the training observations
- train_y: vector of the training labels (coerced to factors)
- test_x: matrix of the test observations
- test_y: vector of the test labels (coerced to factors)
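
The documented shape of the returned list can be checked with a few assertions. The helper check_partition below is hypothetical and not part of any package, and the list it is applied to is a stand-in built by hand rather than actual partition_data output.

```r
# Hypothetical helper: verifies that a list has the documented elements
# and that observations and labels line up in both partitions.
check_partition <- function(parts, n) {
  stopifnot(
    all(c("train_x", "train_y", "test_x", "test_y") %in% names(parts)),
    nrow(parts$train_x) == length(parts$train_y),
    nrow(parts$test_x) == length(parts$test_y),
    is.factor(parts$train_y), is.factor(parts$test_y),
    length(parts$train_y) + length(parts$test_y) == n
  )
  invisible(TRUE)
}

# Stand-in list with the documented shape (first 100 rows vs. the rest).
x <- as.matrix(iris[, -5])
y <- iris[, 5]
parts <- list(train_x = x[1:100, ], train_y = y[1:100],
              test_x = x[101:150, ], test_y = y[101:150])
check_partition(parts, n = nrow(x))
```

The same checks should apply to a list returned by partition_data, provided it follows the structure described above.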

Details

A named list is returned with the training and test data sets.

Examples

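require('MASS')
# iris has 150 observations in 3 classes of 50 each
x <- iris[, -5]  # the four numeric features
y <- iris[, 5]   # the class labels (Species)
set.seed(42)

# Default split: 2/3 of the observations go to the training data set
data <- partition_data(x = x, y = y)
table(data$train_y)
table(data$test_y)

# Same split, but preserving the class proportions in both data sets
data <- partition_data(x = x, y = y, preserve_proportions = TRUE)
table(data$train_y)
table(data$test_y)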