
GeneSelectR (version 1.0.1)

split_data: Split Data into Training and Test Sets

Description

Splits the predictors and outcomes into training and test sets and returns each part converted to Python format.

Usage

split_data(X, y, test_size, modules)

Value

A list containing the split datasets:

  • X_train: Training set for predictors, converted to Python format.

  • X_test: Test set for predictors, converted to Python format.

  • y_train: Training set for outcomes, converted to Python format.

  • y_test: Test set for outcomes, converted to Python format.

The function ensures that the data is appropriately partitioned and formatted for use in Python-based analysis.

Arguments

X

A dataframe or matrix of predictors.

y

A vector of outcomes.

test_size

Proportion of the data to be used as the test set (for example, 0.2 holds out 20% of the observations).

modules

A list containing the definitions for the Python modules and submodules.
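
To make the roles of X, y, and test_size concrete, here is a minimal base-R sketch of a proportional train/test split. It is an illustration only, not the GeneSelectR implementation: split_sketch is a hypothetical helper, it samples rows at random without stratification, and it returns plain R objects rather than Python-converted ones.

```r
# Hypothetical helper for illustration; NOT part of GeneSelectR.
# Randomly assigns a proportion `test_size` of the rows to the test set.
split_sketch <- function(X, y, test_size = 0.2) {
  stopifnot(nrow(X) == length(y), test_size > 0, test_size < 1)
  n <- nrow(X)
  n_test <- round(n * test_size)      # number of held-out rows
  test_idx <- sample.int(n, n_test)   # random row indices for the test set
  list(
    X_train = X[-test_idx, , drop = FALSE],
    X_test  = X[test_idx, , drop = FALSE],
    y_train = y[-test_idx],
    y_test  = y[test_idx]
  )
}

# Simulated data (assumed for the example): 10 samples, 2 predictors
set.seed(1)
X <- data.frame(gene1 = rnorm(10), gene2 = rnorm(10))
y <- rep(c("case", "control"), each = 5)

parts <- split_sketch(X, y, test_size = 0.2)
names(parts)
```

The returned list uses the same four element names as the Value section above, so with test_size = 0.2 and 10 rows, 2 rows go to the test set and 8 to the training set.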

Examples

# Assumes 'data' holds the predictors and 'outcome' holds the target variable.
# Define the sklearn modules (assumes 'define_sklearn_modules' is available).
sklearn_modules <- define_sklearn_modules()

# Split the data into training (80%) and test (20%) sets
split_results <- split_data(data, outcome, test_size = 0.2, modules = sklearn_modules)
