sortinghat (version 0.1)

partition_data: Helper function that partitions a data set into training and test data sets.

Description

The function randomly partitions a data set into training and test data sets, assigning a specified fraction of the observations to the training data set. The user can optionally preserve the class proportions of the original data set in both partitions.

Usage

partition_data(x, y, split_pct = 2/3,
    preserve_proportions = FALSE)

Arguments

x
a matrix of n observations (rows) and p features (columns)
y
a vector of n class labels
split_pct
the proportion of observations, given as a fraction between 0 and 1 (default 2/3), that will be randomly assigned to the training data set. The remaining observations are assigned to the test data set.
preserve_proportions
logical value. If TRUE, the training and test data sets are constructed so that the class proportions of the original data set are preserved.

Value

  • named list containing the training and test data sets:
    • train_x: matrix of the training observations
    • train_y: vector of the training labels (coerced to a factor)
    • test_x: matrix of the test observations
    • test_y: vector of the test labels (coerced to a factor)

Details

A named list is returned with the training and test data sets.
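
To make the two partitioning modes concrete, the following is a minimal base R sketch of how such a split could be done. It is not the sortinghat implementation: the function name sketch_partition is hypothetical, and rounding the training size with round() is an assumption, since the package's exact rounding rule is not stated here.

```r
# Hypothetical illustration only; NOT the sortinghat source code.
# Assumption: the training size is round(split_pct * n).
sketch_partition <- function(x, y, split_pct = 2/3,
                             preserve_proportions = FALSE) {
  y <- as.factor(y)
  n <- length(y)
  if (preserve_proportions) {
    # Stratified split: draw training indices separately within each class.
    by_class <- split(seq_len(n), y)
    train_idx <- unlist(lapply(by_class, function(idx) {
      idx[sample.int(length(idx), round(split_pct * length(idx)))]
    }), use.names = FALSE)
  } else {
    # Simple random split over all observations.
    train_idx <- sample.int(n, round(split_pct * n))
  }
  test_idx <- setdiff(seq_len(n), train_idx)
  list(
    train_x = x[train_idx, , drop = FALSE],
    train_y = y[train_idx],
    test_x  = x[test_idx, , drop = FALSE],
    test_y  = y[test_idx]
  )
}

set.seed(42)
parts <- sketch_partition(as.matrix(iris[, -5]), iris[, 5],
                          preserve_proportions = TRUE)
table(parts$train_y)  # 33 observations per species
table(parts$test_y)   # 17 observations per species
```

With 50 observations per species and split_pct = 2/3, the stratified branch of this sketch places round(33.33) = 33 observations of each species in the training set and the remaining 17 in the test set. Without stratification only the overall training size is fixed, so the per-class counts vary from draw to draw.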

Examples

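# partition_data() comes from sortinghat; iris ships with R's datasets package.
library('sortinghat')
x <- iris[, -5]  # feature columns
y <- iris[, 5]   # class labels (Species)
set.seed(42)     # make the random partition reproducible

# Simple random partition: class counts may differ between species.
data <- partition_data(x = x, y = y)
table(data$train_y)
table(data$test_y)

# Partition that preserves the class proportions of y.
data <- partition_data(x = x, y = y, preserve_proportions = TRUE)
table(data$train_y)
table(data$test_y)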
