
tidylearn

Machine Learning for Tidynauts

Overview

tidylearn provides a unified tidyverse-compatible interface to R's machine learning ecosystem. It wraps proven packages like glmnet, randomForest, xgboost, e1071, cluster, and dbscan - you get the reliability of established implementations with the convenience of a consistent, tidy API.

What tidylearn does:

  • Provides one consistent interface (tl_model()) to 20+ ML algorithms
  • Returns tidy tibbles instead of varied output formats
  • Offers unified ggplot2-based visualization across all methods
  • Enables pipe-friendly workflows with %>%
  • Orchestrates complex workflows combining multiple techniques

What tidylearn is NOT:

  • A reimplementation of ML algorithms (uses established packages under the hood)
  • A replacement for the underlying packages (you can access the raw model via model$fit)

Why tidylearn?

Each ML package in R has its own API, output format, and conventions. tidylearn adds a translation layer on top of them:

Without tidylearn                      | With tidylearn
Learn different APIs for each package  | One API for everything
Write custom code to extract results   | Consistent tibble output
Create different plots for each model  | Unified visualization
Manage package-specific quirks         | Focus on your analysis

The underlying algorithms are unchanged - tidylearn simply makes them easier to use together.
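To make the contrast concrete, here is a minimal sketch comparing a lasso fit written directly against glmnet with the equivalent tidylearn call; the tl_model() usage follows the Quick Start examples below.

library(glmnet)

# glmnet directly: build the numeric design matrix yourself
x <- model.matrix(mpg ~ . - 1, data = mtcars)
y <- mtcars$mpg
fit <- glmnet(x, y, alpha = 1)            # alpha = 1 is the lasso
pred <- predict(fit, newx = x, s = 0.1)   # returns a matrix, not a tibble

# tidylearn: one formula interface, tibble output
library(tidylearn)
model <- tl_model(mtcars, mpg ~ ., method = "lasso")
pred <- predict(model, new_data = mtcars)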

Installation

# Install from CRAN
install.packages("tidylearn")

# Or install the development version from GitHub
devtools::install_github("ces0491/tidylearn")

Quick Start

Unified Interface

A single tl_model() function dispatches to the appropriate underlying package:

library(tidylearn)

# Classification -> uses randomForest::randomForest()
model <- tl_model(iris, Species ~ ., method = "forest")

# Regression -> uses stats::lm()
model <- tl_model(mtcars, mpg ~ wt + hp, method = "linear")

# Regularization -> uses glmnet::glmnet()
model <- tl_model(mtcars, mpg ~ ., method = "lasso")

# Clustering -> uses stats::kmeans()
model <- tl_model(iris[,1:4], method = "kmeans", k = 3)

# PCA -> uses stats::prcomp()
model <- tl_model(iris[,1:4], method = "pca")

Tidy Output

All results come back as tibbles, ready for dplyr and ggplot2:

# Predictions as tibbles
predictions <- predict(model, new_data = test_data)

# Metrics as tibbles
metrics <- tl_evaluate(model, test_data)

# Easy to pipe (assumes predict() returns a `prediction` column and
# test_data contains the observed outcome as `actual`)
model %>%
  predict(test_data) %>%
  bind_cols(test_data) %>%
  ggplot(aes(x = actual, y = prediction)) +
  geom_point()

Access the Underlying Model

You always have access to the raw model from the underlying package:

model <- tl_model(iris, Species ~ ., method = "forest")

# Access the randomForest object directly
model$fit  # This is the randomForest::randomForest() result

# Use package-specific functions if needed
randomForest::varImpPlot(model$fit)

Wrapped Packages

tidylearn provides a unified interface to these established R packages:

Supervised Learning

Method                           | Underlying Package | Function Called
"linear"                         | stats              | lm()
"polynomial"                     | stats              | lm() with poly()
"logistic"                       | stats              | glm(..., family = binomial)
"ridge", "lasso", "elastic_net"  | glmnet             | glmnet()
"tree"                           | rpart              | rpart()
"forest"                         | randomForest       | randomForest()
"boost"                          | gbm                | gbm()
"xgboost"                        | xgboost            | xgb.train()
"svm"                            | e1071              | svm()
"nn"                             | nnet               | nnet()
"deep"                           | keras              | keras_model_sequential()

Unsupervised Learning

Method     | Underlying Package  | Function Called
"pca"      | stats               | prcomp()
"mds"      | stats, MASS, smacof | cmdscale(), isoMDS(), etc.
"kmeans"   | stats               | kmeans()
"pam"      | cluster             | pam()
"clara"    | cluster             | clara()
"hclust"   | stats               | hclust()
"dbscan"   | dbscan              | dbscan()

Integration Workflows

Beyond wrapping individual packages, tidylearn provides orchestration functions that combine multiple techniques:

Dimensionality Reduction + Supervised Learning

# Reduce dimensions before classification
reduced <- tl_reduce_dimensions(iris, response = "Species",
                                method = "pca", n_components = 3)
model <- tl_model(reduced$data, Species ~ ., method = "logistic")

Cluster-Based Feature Engineering

# Add cluster membership as a feature
enriched <- tl_add_cluster_features(data, response = "target",
                                    method = "kmeans", k = 3)
model <- tl_model(enriched, target ~ ., method = "forest")

Semi-Supervised Learning

# Use clustering to propagate labels to unlabeled data
# (labeled_idx: integer vector of row indices whose labels are known)
model <- tl_semisupervised(data, target ~ .,
                           labeled_indices = labeled_idx,
                           cluster_method = "kmeans")

AutoML

# Automatically try multiple approaches
result <- tl_auto_ml(data, target ~ .,
                     time_budget = 300)
result$leaderboard

Unified Visualization

Consistent ggplot2-based plotting regardless of model type:

# Generic plot method works for all model types
plot(forest_model)       # Automatic visualization based on model type
plot(linear_model)       # Diagnostic plots for regression
plot(pca_result)         # Variance explained for PCA

# Specialized plotting functions for unsupervised learning
plot_clusters(clustering_result, cluster_col = "cluster")
plot_variance_explained(pca_result$fit$variance_explained)

# Interactive dashboard for detailed exploration
tl_dashboard(model, test_data)

Philosophy

tidylearn is built on these principles:

  1. Transparency: The underlying packages do the real work. tidylearn makes them easier to use together without hiding what's happening.

  2. Consistency: One interface, tidy output, unified visualization - across all methods.

  3. Accessibility: Focus on your analysis, not on learning different package APIs.

  4. Interoperability: Results work seamlessly with dplyr, ggplot2, and the broader tidyverse.
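
As a small illustration of point 4, a tl_evaluate() tibble should drop straight into a dplyr/ggplot2 chain; the metric and value column names below are assumptions for illustration, not documented tidylearn output:

library(dplyr)
library(ggplot2)

# Hedged sketch: `metric` and `value` column names are assumed
tl_evaluate(model, test_data) %>%
  filter(metric %in% c("rmse", "rsq")) %>%
  ggplot(aes(x = metric, y = value)) +
  geom_col()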

Documentation

# View package help
?tidylearn

# Explore main functions
?tl_model
?tl_evaluate
?tl_auto_ml

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License - see LICENSE for details.

Author

Cesaire Tobias (cesaire@sheetsolved.com)

Acknowledgments

tidylearn is a wrapper that builds upon the excellent work of many R package authors. The actual algorithms are implemented in:

  • stats (base R): lm, glm, prcomp, kmeans, hclust, cmdscale
  • glmnet: Ridge, LASSO, and elastic net regularization
  • randomForest: Random forest implementation
  • xgboost: Gradient boosting
  • gbm: Gradient boosting machines
  • e1071: Support vector machines
  • nnet: Neural networks
  • rpart: Decision trees
  • cluster: PAM, CLARA clustering
  • dbscan: Density-based clustering
  • MASS: Sammon mapping, isoMDS
  • smacof: SMACOF MDS algorithm
  • keras/tensorflow: Deep learning (optional)

Thank you to all the package maintainers whose work makes tidylearn possible.



Functions in tidylearn (0.1.0)

  • %>%: Pipe operator
  • plot.tidylearn_eda: Plot EDA results
  • optimal_clusters: Find Optimal Number of Clusters
  • plot_mds: Plot MDS Configuration
  • plot_knn_dist: Plot k-NN Distance Plot
  • predict.tidylearn_model: Predict using a tidylearn model
  • plot_cluster_comparison: Create Cluster Comparison Plot
  • plot.tidylearn_model: Plot method for tidylearn models
  • predict.tidylearn_stratified: Predict from stratified models
  • plot_dendrogram: Plot Dendrogram with Cluster Highlights
  • plot_distance_heatmap: Create Distance Heatmap
  • plot_silhouette: Plot Silhouette Analysis
  • print.tidy_silhouette: Print Method for tidy_silhouette
  • print.tidy_pca: Print Method for tidy_pca
  • plot_variance_explained: Plot Variance Explained (PCA)
  • plot_gap_stat: Plot Gap Statistic
  • filter_rules_by_item: Filter Rules by Item
  • plot_elbow: Create Elbow Plot for K-Means
  • predict.tidylearn_transfer: Predict with transfer learning model
  • explore_dbscan_params: Explore DBSCAN Parameters
  • print.tidy_apriori: Print Method for tidy_apriori
  • print.tidy_hclust: Print Method for tidy_hclust
  • print.tidylearn_automl: Print auto ML results
  • print.tidylearn_eda: Print EDA results
  • print.tidylearn_model: Print method for tidylearn models
  • print.tidylearn_pipeline: Print a tidylearn pipeline
  • print.tidy_kmeans: Print Method for tidy_kmeans
  • tidy_dendrogram: Plot Dendrogram
  • optimal_hclust_k: Determine Optimal Number of Clusters for Hierarchical Clustering
  • tidylearn-core: tidylearn: A Unified Tidy Interface to R's Machine Learning Ecosystem
  • tidylearn-deep-learning: Deep Learning for tidylearn
  • tidy_cutree: Cut Hierarchical Clustering Tree
  • recommend_products: Generate Product Recommendations
  • standardize_data: Standardize Data
  • tidy_apriori: Tidy Apriori Algorithm
  • tidy_hclust: Tidy Hierarchical Clustering
  • tidy_kmeans: Tidy K-Means Clustering
  • tidy_dist: Tidy Distance Matrix Computation
  • tidylearn-diagnostics: Advanced Diagnostics Functions for tidylearn
  • print.tidy_dbscan: Print Method for tidy_dbscan
  • plot_clusters: Plot Clusters in 2D Space
  • plot_cluster_sizes: Plot Cluster Size Distribution
  • print.tidy_mds: Print Method for tidy_mds
  • print.tidy_gap: Print Method for tidy_gap
  • tidy_clara: Tidy CLARA (Clustering Large Applications)
  • tidylearn-model-selection: Model Selection Functions for tidylearn
  • tidylearn-metrics: Metrics Functionality for tidylearn
  • tidy_silhouette: Tidy Silhouette Analysis
  • tidy_rules: Convert Association Rules to Tidy Tibble
  • tidy_pam: Tidy PAM (Partitioning Around Medoids)
  • tidy_mds_kruskal: Kruskal's Non-metric MDS
  • tidy_mds_classical: Classical (Metric) MDS
  • tidy_dbscan: Tidy DBSCAN Clustering
  • tidylearn-svm: Support Vector Machines for tidylearn
  • tidylearn-trees: Tree-based Methods for tidylearn
  • tidy_pca: Tidy Principal Component Analysis
  • tidy_knn_dist: Compute k-NN Distances
  • tl_extract_importance: Extract importance from a tree-based model
  • summary.tidylearn_model: Summary method for tidylearn models
  • summary.tidylearn_pipeline: Summarize a tidylearn pipeline
  • tl_auto_ml: High-Level Workflows for Common Machine Learning Patterns
  • tl_fit_svm: Fit a support vector machine model
  • tidy_gap_stat: Tidy Gap Statistic
  • tl_extract_importance_regularized: Extract importance from a regularized regression model
  • tl_fit_ridge: Fit a Ridge regression model
  • tl_compare_pipeline_models: Compare models from a pipeline
  • tl_anomaly_aware: Anomaly-Aware Supervised Learning
  • tl_compare_cv: Compare models using cross-validation
  • tidylearn-interactions: Interaction Analysis Functions for tidylearn
  • tl_auto_interactions: Find important interactions automatically
  • tidy_silhouette_analysis: Silhouette Analysis Across Multiple k Values
  • tidylearn-classification: Classification Functions for tidylearn
  • tidy_mds: Tidy Multidimensional Scaling
  • tl_check_assumptions: Check model assumptions
  • tl_plot_actual_predicted: Plot actual vs predicted values for a regression model
  • tl_calculate_pr_auc: Calculate the area under the precision-recall curve
  • tl_fit_boost: Fit a gradient boosting model
  • tl_plot_calibration: Plot calibration curve for a classification model
  • tl_calc_classification_metrics: Calculate classification metrics
  • tl_evaluate_thresholds: Evaluate metrics at different thresholds
  • tidy_gower: Gower Distance Calculation
  • tidy_pca_biplot: Create PCA Biplot
  • tidy_pca_screeplot: Create PCA Scree Plot
  • tl_default_param_grid: Create pre-defined parameter grids for common models
  • tl_explore: Exploratory Data Analysis Workflow
  • tl_get_best_model: Get the best model from a pipeline
  • tl_influence_measures: Calculate influence measures for a linear model
  • tl_model: Create a tidylearn model
  • tl_pipeline: Create a modeling pipeline
  • tidylearn-neural-networks: Neural Networks for tidylearn
  • tidylearn-pipeline: Model Pipeline Functions for tidylearn
  • tidylearn-xgboost: XGBoost Functions for tidylearn
  • tl_plot_nn_tuning: Plot neural network training history
  • tl_add_cluster_features: Cluster-Based Features
  • tl_plot_lift: Plot lift chart for a classification model
  • tl_diagnostic_dashboard: Create a comprehensive diagnostic dashboard
  • tl_evaluate: Evaluate a tidylearn model
  • tl_plot_intervals: Create confidence and prediction interval plots
  • tl_plot_interaction: Plot interaction effects
  • tl_plot_influence: Plot influence diagnostics
  • tl_fit_elastic_net: Fit an Elastic Net regression model
  • tl_plot_regularization_cv: Plot cross-validation results for a regularized regression model
  • tl_plot_partial_dependence: Plot partial dependence for tree-based models
  • tl_predict_forest: Predict using a random forest model
  • tl_predict_lasso: Predict using a Lasso regression model
  • tl_plot_precision_recall: Plot precision-recall curve for a classification model
  • tl_predict_polynomial: Predict using a polynomial regression model
  • print.tidy_pam: Print Method for tidy_pam
  • suggest_eps: Suggest eps Parameter for DBSCAN
  • tl_detect_outliers: Detect outliers in the data
  • tl_fit_lasso: Fit a Lasso regression model
  • summarize_rules: Summarize Association Rules
  • tl_predict_ridge: Predict using a Ridge regression model
  • tl_predict_regularized: Predict using a regularized regression model
  • tl_plot_tree: Plot a decision tree
  • tl_plot_tuning_results: Plot hyperparameter tuning results
  • tl_plot_svm_tuning: Plot SVM tuning results
  • tl_test_interactions: Test for significant interactions between variables
  • tl_fit_deep: Fit a deep learning model
  • tl_fit_regularized: Fit a regularized regression model (Ridge, Lasso, or Elastic Net)
  • tl_fit_polynomial: Fit a polynomial regression model
  • tidy_mds_smacof: SMACOF MDS (Metric or Non-metric)
  • tidy_mds_sammon: Sammon Mapping
  • tl_reduce_dimensions: Integration Functions: Combining Supervised and Unsupervised Learning
  • tl_prepare_data: Data Preprocessing for tidylearn
  • tidylearn-regression: Regression Functions for tidylearn
  • tl_predict_svm: Predict using a support vector machine model
  • tl_fit_forest: Fit a random forest model
  • tl_version: Get tidylearn version information
  • tl_xgboost_shap: Generate SHAP values for XGBoost model interpretation
  • tl_test_model_difference: Perform statistical comparison of models using cross-validation
  • tidylearn-regularization: Regularization Functions for tidylearn
  • visualize_rules: Visualize Association Rules
  • tl_plot_xgboost_importance: Plot feature importance for an XGBoost model
  • tl_fit_xgboost: Fit an XGBoost model
  • tl_plot_cv_results: Plot cross-validation results
  • tl_plot_deep_architecture: Plot deep learning model architecture
  • tl_plot_model_comparison: Plot model comparison
  • tidylearn-tuning: Hyperparameter Tuning Functions for tidylearn
  • tidylearn-visualization: Visualization Functions for tidylearn
  • tl_dashboard: Create interactive visualization dashboard for a model
  • tl_cv: Cross-validation for tidylearn models
  • tl_fit_nn: Fit a neural network model
  • tl_fit_logistic: Fit a logistic regression model
  • tl_fit_tree: Fit a decision tree model
  • tl_load_pipeline: Load a pipeline from disk
  • tl_interaction_effects: Calculate partial effects based on a model with interactions
  • tl_plot_nn_architecture: Plot neural network architecture
  • tl_plot_regularization_path: Plot regularization path for a regularized regression model
  • tl_plot_confusion: Plot confusion matrix for a classification model
  • tl_fit_linear: Fit a linear regression model
  • tl_plot_residuals: Plot residuals for a regression model
  • tl_predict_deep: Predict using a deep learning model
  • tl_predict_elastic_net: Predict using an Elastic Net regression model
  • tl_predict_tree: Predict using a decision tree model
  • tl_plot_gain: Plot gain chart for a classification model
  • tl_plot_cv_comparison: Plot comparison of cross-validation results
  • tl_plot_importance: Plot variable importance for tree-based models
  • tl_plot_roc: Plot ROC curve for a classification model
  • tl_tune_deep: Tune a deep learning model
  • tl_save_pipeline: Save a pipeline to disk
  • tl_transfer_learning: Transfer Learning Workflow
  • tl_run_pipeline: Run a tidylearn pipeline
  • tl_predict_xgboost: Predict using an XGBoost model
  • tl_plot_svm_boundary: Plot SVM decision boundary
  • tl_plot_xgboost_tree: Plot XGBoost tree visualization
  • tl_step_selection: Perform stepwise selection on a linear model
  • tl_predict_linear: Predict using a linear regression model
  • tl_predict_boost: Predict using a gradient boosting model
  • tl_predict_logistic: Predict using a logistic regression model
  • tl_stratified_models: Stratified Features via Clustering
  • tl_plot_diagnostics: Plot diagnostics for a regression model
  • tl_plot_deep_history: Plot deep learning model training history
  • tl_plot_importance_comparison: Plot feature importance across multiple models
  • tl_plot_importance_regularized: Plot variable importance for a regularized regression model
  • tl_plot_xgboost_shap_summary: Plot SHAP summary for XGBoost model
  • tl_plot_xgboost_shap_dependence: Plot SHAP dependence for a specific feature
  • tl_tune_nn: Tune a neural network model
  • tl_tune_grid: Tune hyperparameters for a model using grid search
  • tl_split: Split data into train and test sets
  • tl_predict_pipeline: Make predictions using a pipeline
  • tl_predict_nn: Predict using a neural network model
  • tl_semisupervised: Semi-Supervised Learning via Clustering
  • tl_tune_random: Tune hyperparameters for a model using random search
  • tl_tune_xgboost: Tune XGBoost hyperparameters
  • compare_distances: Compare Distance Methods
  • augment_hclust: Augment Data with Hierarchical Cluster Assignments
  • compare_clusterings: Compare Multiple Clustering Results
  • calc_validation_metrics: Calculate Cluster Validation Metrics
  • calc_wss: Calculate Within-Cluster Sum of Squares for Different k
  • augment_pam: Augment Data with PAM Cluster Assignments
  • augment_kmeans: Augment Data with K-Means Cluster Assignments
  • augment_dbscan: Augment Data with DBSCAN Cluster Assignments
  • create_cluster_dashboard: Create Summary Dashboard
  • augment_pca: Augment Original Data with PCA Scores
  • get_pca_loadings: Get PCA Loadings in Wide Format
  • find_related_items: Find Related Items
  • get_pca_variance: Get Variance Explained Summary
  • inspect_rules: Inspect Association Rules