tl_prepare_data

Unified preprocessing functions that work with both supervised and unsupervised workflows
Prepare Data for Machine Learning

Provides a unified tidyverse-compatible interface to R's machine
learning packages. Wraps established implementations from 'glmnet',
'randomForest', 'xgboost', 'e1071', 'rpart', 'gbm', 'nnet', 'cluster',
'dbscan', and others, providing consistent function signatures, tidy tibble
output, and unified 'ggplot2'-based visualization. The underlying algorithms
are unchanged; 'tidylearn' simply makes them easier to use together. Access
raw model objects via the $fit slot for package-specific functionality.
Methods include random forests Breiman (2001) <doi:10.1023/A:1010933404324>,
LASSO regression Tibshirani (1996) <doi:10.1111/j.2517-6161.1996.tb02080.x>,
elastic net Zou and Hastie (2005) <doi:10.1111/j.1467-9868.2005.00503.x>,
support vector machines Cortes and Vapnik (1995) <doi:10.1007/BF00994018>,
and gradient boosting Friedman (2001) <doi:10.1214/aos/1013203451>.

Cesaire Tobias

tidylearn

A Unified Tidy Interface to R's Machine Learning Ecosystem

tl_prepare_data function

<dl><dt>data</dt>
<dd>A data frame</dd>
<dt>formula</dt>
<dd>Optional formula (for supervised learning)</dd>
<dt>impute_method</dt>
<dd>Method for missing value imputation: "mean", "median", "mode", "knn"</dd>
<dt>scale_method</dt>
<dd>Scaling method: "standardize", "normalize", "robust", "none"</dd>
<dt>encode_categorical</dt>
<dd>Whether to encode categorical variables (default: TRUE)</dd>
<dt>remove_zero_variance</dt>
<dd>Remove zero-variance features (default: TRUE)</dd>
<dt>remove_correlated</dt>
<dd>Remove highly correlated features (default: FALSE)</dd>
<dt>correlation_cutoff</dt>
<dd>Correlation threshold for removal (default: 0.95)</dd></dl>
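
The arguments above describe a fairly conventional preprocessing pipeline. As a rough illustration of what such steps typically do, here is a minimal base-R sketch. It is not tidylearn's implementation: the helper `sketch_prepare()` and everything inside it are hypothetical and only borrow the documented argument names. It handles numeric columns only, supports just "mean" and "median" imputation and "standardize" scaling, and skips categorical encoding.

```r
# Hypothetical helper, NOT the tidylearn implementation.
# Illustrates the documented options on numeric columns only.
sketch_prepare <- function(data,
                           impute_method = "mean",
                           scale_method = "standardize",
                           remove_zero_variance = TRUE,
                           remove_correlated = FALSE,
                           correlation_cutoff = 0.95) {
  # Keep numeric columns; factors and characters are ignored in this sketch
  num <- data[vapply(data, is.numeric, logical(1))]

  # Impute missing values column by column
  impute <- switch(impute_method,
    mean   = function(x) mean(x, na.rm = TRUE),
    median = function(x) stats::median(x, na.rm = TRUE),
    stop("this sketch only handles 'mean' and 'median'")
  )
  num[] <- lapply(num, function(x) {
    x[is.na(x)] <- impute(x)
    x
  })

  # Drop columns whose variance is zero (constant columns)
  if (remove_zero_variance) {
    keep <- vapply(num, function(x) stats::var(x) > 0, logical(1))
    num <- num[keep]
  }

  # Drop a column if it is highly correlated with any earlier column
  if (remove_correlated && ncol(num) > 1) {
    cm <- abs(stats::cor(num))
    cm[lower.tri(cm, diag = TRUE)] <- 0
    drop <- apply(cm, 2, function(col) any(col > correlation_cutoff))
    num <- num[!drop]
  }

  # Centre to mean 0 and scale to standard deviation 1
  if (scale_method == "standardize") {
    num[] <- lapply(num, function(x) (x - mean(x)) / stats::sd(x))
  }
  num
}

# Small made-up data set: one missing value, one constant column,
# one near-duplicate column, and one factor
df <- data.frame(
  a = c(1, 2, NA, 4, 5),
  b = c(2, 4, 6, 8, 10.5),
  const = rep(3, 5),
  g = factor(c("x", "y", "x", "y", "x"))
)

out <- sketch_prepare(df,
                      impute_method = "median",
                      remove_correlated = TRUE,
                      correlation_cutoff = 0.95)
names(out)

# With tidylearn installed, the documented call would look like this
# (the return structure is not described on this page, so it is not shown):
# prepared <- tidylearn::tl_prepare_data(df, impute_method = "median",
#                                        remove_correlated = TRUE)
```

In this sketch the constant column is removed before scaling because standardizing a zero-variance column would divide by zero. The correlation filter then drops the later column of each highly correlated pair. The order of steps inside `tl_prepare_data` itself may differ.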

