leakr (version 0.1.0)

prepare_audit_data: Enhanced data preparation with robust preprocessing

Description

This function preprocesses a dataset and prepares it for leakage detection. It samples the data when sampling is needed (stratifying by the target variable when one is supplied) and returns the result, together with metadata, in a structure used by the later audit and analysis steps.

Usage

prepare_audit_data(data, target, split, id, config)

Value

A list of class audit_data containing preprocessed data along with metadata, such as:

  • data: The processed data.

  • target: The target variable name.

  • split: The split vector or column name.

  • n_rows: The number of rows in the data.

  • n_cols: The number of columns in the data.

  • was_sampled: A logical indicating whether sampling was performed.
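To make the shape of the return value concrete, the sketch below builds a stand-in object by hand from the fields listed above and reads them back. It does not call leakr: the `toy` data frame and its column names are made up for illustration, and the real object may carry further fields, since the list above is introduced with "such as".

```r
# Hand-built stand-in for the documented return value (NOT produced by leakr).
# The data frame and its column names are hypothetical.
toy <- data.frame(
  id_column        = 1:6,
  x                = c(0.1, 0.4, 0.3, 0.9, 0.7, 0.2),
  target_column    = c("a", "b", "a", "b", "a", "b"),
  train_test_split = c("train", "train", "train", "train", "test", "test")
)

audit_data <- structure(
  list(
    data        = toy,                 # the processed data
    target      = "target_column",     # target variable name
    split       = "train_test_split",  # split vector or column name
    n_rows      = nrow(toy),
    n_cols      = ncol(toy),
    was_sampled = FALSE                # no sampling happened here
  ),
  class = "audit_data"
)

# Typical checks a caller might run on the result
inherits(audit_data, "audit_data")
audit_data$n_rows
if (audit_data$was_sampled) {
  message("Audit ran on a sample of the original rows.")
}
```

Checking `was_sampled` before interpreting row counts is useful because `n_rows` then describes the sample rather than the original dataset.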

Arguments

data

A data frame containing the dataset to be audited.

target

The name of the target variable (optional). Used for stratified sampling if provided.

split

A vector or a column name specifying the data split (e.g., training/test split).

id

The unique identifier column for the dataset (optional).

config

A list of configuration settings, including sample size and other audit parameters.
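The stratified sampling mentioned for `target`, combined with a sample size from `config`, can be illustrated in base R. This is a minimal sketch of the general technique, not leakr's internal code: `stratified_sample` is a hypothetical helper, and the toy data and column names are assumptions.

```r
# Hypothetical helper: draw about `sample_size` rows while keeping the
# proportions of the `target` levels close to those in the full data.
stratified_sample <- function(data, target, sample_size) {
  if (nrow(data) <= sample_size) {
    return(data)  # already small enough, nothing to sample
  }
  frac <- sample_size / nrow(data)
  # Row indices grouped by target level; unused levels are dropped.
  # Rows with a missing target value are not included in any group.
  groups <- split(seq_len(nrow(data)), data[[target]], drop = TRUE)
  keep <- unlist(lapply(groups, function(idx) {
    n_take <- max(1L, round(length(idx) * frac))  # at least one row per level
    idx[sample.int(length(idx), n_take)]
  }), use.names = FALSE)
  data[sort(keep), , drop = FALSE]
}

set.seed(42)
toy <- data.frame(
  id_column     = 1:1000,
  x             = rnorm(1000),
  target_column = rep(c("a", "b"), times = c(900, 100))
)

sampled <- stratified_sample(toy, "target_column", sample_size = 200)
table(sampled$target_column)  # 180 rows of "a", 20 rows of "b"
```

Sampling within each target level keeps the 9:1 class ratio of the full data, which a plain random sample of 200 rows would only approximate.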

Examples

if (FALSE) {
# `data` stands for a data frame that contains the columns named below.
audit_data <- prepare_audit_data(data, target = "target_column",
                                 split = "train_test_split",
                                 id = "id_column",
                                 config = list(sample_size = 50000))
}
