
HMDA (version 0.1.1)

hmda.partition: Partition Data for HMDA Analysis

Description

Partition a data frame into training, testing, and optionally validation sets, and upload these sets to a local H2O server. If an outcome column y is provided and is a factor or character, stratified splitting is used; otherwise, a random split is performed. The proportions must sum to 1.

Usage

hmda.partition(
  df,
  y = NULL,
  train = 0.8,
  test = 0.2,
  validation = NULL,
  seed = 2025
)

Value

A named list containing the partitioned data frames and their corresponding H2O frames:

hmda.train

Training set (data frame).

hmda.test

Testing set (data frame).

hmda.validation

Validation set (data frame), if a validation proportion was specified.

hmda.train.hex

Training set as an H2O frame.

hmda.test.hex

Testing set as an H2O frame.

hmda.validation.hex

Validation set as an H2O frame, if a validation proportion was specified.

Arguments

df

A data frame to partition.

y

A string with the name of the outcome column. Must match a column in df. Default is NULL, in which case a random split is performed.

train

A numeric value for the proportion of rows assigned to the training set. Default is 0.8.

test

A numeric value for the proportion of rows assigned to the testing set. Default is 0.2.

validation

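Optional numeric value for the proportion of rows assigned to the validation set. Default is NULL (no validation set). If specified, train + test + validation must equal 1; because the defaults for train and test already sum to 1, train or test must be lowered accordingly.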

seed

A numeric seed for reproducibility. Default is 2025.
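
The sum-to-1 requirement on train, test, and validation can be checked before calling the function. The following is a minimal sketch in base R; check_proportions() is a hypothetical helper, not part of HMDA, and the tolerance-based comparison is an assumption about what a sensible check looks like, not a description of how hmda.partition() validates its input.

```r
# Hypothetical helper (not part of HMDA): validate split proportions
# before passing them to hmda.partition().
check_proportions <- function(train = 0.8, test = 0.2, validation = NULL) {
  # c() drops NULL, so `p` has two elements when no validation set is requested
  p <- c(train = train, test = test, validation = validation)
  if (!is.numeric(p) || any(p < 0)) {
    stop("proportions must be non-negative numbers")
  }
  # Compare with a tolerance instead of `==`, since sums of decimal
  # fractions are not always exactly 1 in floating point
  if (!isTRUE(all.equal(sum(p), 1))) {
    stop("proportions must sum to 1, but sum to ", sum(p))
  }
  invisible(p)
}

check_proportions(0.7, 0.15, 0.15)      # valid: returns the proportions invisibly
try(check_proportions(0.8, 0.2, 0.15))  # invalid: raises an error
```

A check like this catches the common mistake of adding a validation proportion while leaving train and test at their defaults.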

Author

E. F. Haghish

Details

This function uses the splitTools package to perform the partition. When y is provided and is a factor or character, a stratified split is performed to preserve class proportions. Otherwise, a basic random split is used. The partitions are then converted to H2O frames using h2o::as.h2o().
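
The stratified split described above can be approximated directly with splitTools::partition(), leaving out the H2O upload. This is a sketch under stated assumptions: the element names in p and the variable names are illustrative, and the exact arguments that hmda.partition() passes to splitTools internally are not documented here.

```r
# Sketch only: a stratified 70/15/15 split with splitTools, without the
# H2O conversion step. How hmda.partition() calls splitTools internally
# is an assumption, not taken from its source.
if (requireNamespace("splitTools", quietly = TRUE)) {
  idx <- splitTools::partition(
    y    = iris$Species,                                  # outcome used for stratification
    p    = c(train = 0.7, test = 0.15, validation = 0.15),
    type = "stratified",                                  # use "basic" for a plain random split
    seed = 2025
  )

  # partition() returns a named list of row indices, one element per subset
  train_df <- iris[idx$train, ]
  test_df  <- iris[idx$test, ]
  valid_df <- iris[idx$validation, ]

  # Class proportions in each subset should be close to those in the full data
  print(prop.table(table(train_df$Species)))
}
```

Each resulting data frame could then be passed to h2o::as.h2o() once an H2O server is running, which is the conversion step the function performs.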

Examples

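if (FALSE) {
  # Assumes the HMDA package is installed and that a local H2O server is
  # available; hmda.partition() uploads the splits to it.
  library(HMDA)
  h2o::h2o.init()

  # Example: Random split (80% train, 20% test) using iris data
  data(iris)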
  splits <- hmda.partition(
              df = iris,
              train = 0.8,
              test = 0.2,
              seed = 2025
            )
  train_data <- splits$hmda.train
  test_data  <- splits$hmda.test

  # Example: Stratified split (70% train, 15% test, 15% validation)
  # using iris data, stratified by Species
  splits_strat <- hmda.partition(
                     df = iris,
                     y = "Species",
                     train = 0.7,
                     test = 0.15,
                     validation = 0.15,
                     seed = 2025
                   )
  train_strat <- splits_strat$hmda.train
  test_strat  <- splits_strat$hmda.test
  valid_strat <- splits_strat$hmda.validation
}
