Orangutan

Orangutan is an R package for analyzing and visualizing measurements (morphometrics) from groups such as species or populations. It runs a full analysis pipeline that summarizes data, finds variables that differentiate groups, performs multivariate and univariate statistics, and produces publication-ready plots.

What Orangutan does
Installation
Implementation
Description of run_orangutan() arguments
Input data format
HTML Report
Contributing / Support
Citation

What Orangutan does

Loads and validates your CSV data (requires a species column).
Optionally applies allometric correction
- Adjusts mensural measurements for a user-selected variable (e.g. body size).
- The corrected values are included in downstream cleaned datasets and summaries (no standalone file).
Optionally removes extreme outliers within species
- Uses user-specified variables and a configurable tail percentage.
  - 05_data_cleaned_outliers_removed.csv
  - 05_qc_outlier_audit_log.csv
Computes per-species summary statistics
- Mean, SD, min, and max for all variables.
  - 06_summary_stats.csv
Identifies variables that do not overlap between species
- Finds diagnostic traits and produces publication-ready plots.
  - 07_nonoverlaps_list.csv
  - 07_nonoverlap_plot_<species1>_vs_<species2>_<variable>.pdf
Runs multivariate tests on the full dataset
- Tests homogeneity of multivariate dispersion (beta-dispersion).
- Runs PERMANOVA and flags results if dispersion assumptions are violated.
  - 08_multi_betadisper_overall_test.csv
  - 08_multi_betadisper_pairwise_tests.csv
  - 08_multi_permanova_species_effect.csv
Performs Principal Components Analysis (PCA) on scaled variables
- Produces a PCA scatterplot with optional group encirclement.
- Reports the variables with the top loadings on PC1 and PC2 and plots them.
  - 09_multi_pca_plot.pdf
  - 09_multi_pca_top_loadings_PC1_PC2_plot.pdf
  - 09_multi_pca_top_loadings_PC1_PC2.csv
Runs PCA axis post-hoc tests
- Tests the leading PC axes that together explain ~90% of the variance.
- Uses ANOVA + Tukey HSD when assumptions are met.
- Falls back to Kruskal–Wallis + Dunn tests otherwise.
- Reports significant species differences per PC axis.
  - 09_multi_pca_posthoc.csv
Runs Discriminant Analysis of Principal Components (DAPC)
- Produces discriminant plots.
- Evaluates classification performance.
- Reports misclassified individuals.
  - 10_multi_dapc_plot.pdf
  - 11_multi_dapc_confusion_matrix.csv
  - 11_multi_dapc_performance_metrics.csv
  - 11_multi_dapc_misclassified_individuals.csv
Performs univariate tests for each variable
- ANOVA + Tukey when parametric assumptions are met.
- Kruskal–Wallis + Dunn when parametric assumptions fail.
- Generates corresponding plots with significance lettering.
  - 12_uni_anova_summary.csv
  - 12_uni_anova_plot_<variable>.pdf
  - 13_uni_kruskalwallis_summary.csv
  - 13_uni_kruskalwallis_plot_<variable>.pdf
Automatically identifies and analyzes categorical variables
- Runs Pearson's Chi-squared tests between categorical traits and species.
- Uses simulated p-values for robustness with sparse data.
- Performs FDR-corrected pairwise post-hoc tests to detect specific species-level differences.
- Reports statistical reliability notes for small sample sizes (N < 50) or sparse cells.
- Produces proportional stacked bar plots in a muted pastel palette that is kept distinct from the species colors used in the other plots.
  - 14_categorical_analysis_summary.csv
  - 14_categorical_percentages_summary.csv
  - 14_categorical_barplot_<variable>.pdf
Ensures reproducibility
- Saves all results, plots, configuration details, and methods summaries to output_dir.
  - 00_methods_summary.txt: human-readable methods summary, plus the exact R environment and call configuration.
Generates an HTML interpretation report
- Automatically produced at the end of every run.
- Summarizes results in plain language with embedded plot thumbnails.
- Covers all analysis sections: diagnostic traits, PERMANOVA, PCA, DAPC, and univariate tests.
  - orangutan_report.html
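
To make the "variables that do not overlap between species" step concrete, here is a minimal base R sketch. It assumes that non-overlap means the [min, max] ranges of two species are disjoint for a variable; the criterion Orangutan actually applies is not spelled out here and may differ. The helpers ranges_disjoint() and find_nonoverlaps() are hypothetical names, not part of the package.

```r
# Toy data mirroring the layout in "Input data format" (illustrative only).
dat <- data.frame(
  species     = rep(c("allisoni", "carolinensis", "torresfundorai"), each = 2),
  main_length = c(86.5, 73.6, 63.0, 59.0, 66.9, 70.9),
  Head_length = c(25.2, 24.8, 18.3, 19.17, 18.7, 23.6)
)

# Hypothetical helper: TRUE when the [min, max] ranges of two groups are disjoint.
ranges_disjoint <- function(x, y) {
  max(x, na.rm = TRUE) < min(y, na.rm = TRUE) ||
    max(y, na.rm = TRUE) < min(x, na.rm = TRUE)
}

# Hypothetical helper: check every species pair for every numeric variable.
find_nonoverlaps <- function(data, group_col = "species") {
  groups <- as.character(unique(data[[group_col]]))
  vars   <- names(data)[vapply(data, is.numeric, logical(1))]
  pairs  <- combn(groups, 2, simplify = FALSE)
  rows   <- list()
  for (p in pairs) {
    for (v in vars) {
      x <- data[[v]][data[[group_col]] == p[1]]
      y <- data[[v]][data[[group_col]] == p[2]]
      if (ranges_disjoint(x, y)) {
        rows[[length(rows) + 1]] <- data.frame(
          species1 = p[1], species2 = p[2], variable = v
        )
      }
    }
  }
  if (length(rows) == 0) {
    return(data.frame(species1 = character(), species2 = character(),
                      variable = character()))
  }
  do.call(rbind, rows)
}

nonoverlaps <- find_nonoverlaps(dat)
print(nonoverlaps)
```

With only two individuals per species, as in this toy table, ranges are narrow and non-overlaps are easy to find; real datasets need larger samples before a trait can be called diagnostic.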

Installation

Stable version (CRAN)

Install the latest stable release from CRAN (v2.0.0):

install.packages("Orangutan")

Development version (GitHub)

Install the development version directly from GitHub (v2.1.0):

install.packages("pak")
pak::pak("metalofis/Orangutan-R")

Implementation

Quick example: run_orangutan called with default parameters (writes results next to the input file by default):

library(Orangutan)

run_orangutan("data/my_dataset.csv")

Full example: run_orangutan called with all available arguments:

library(Orangutan)  # Load the Orangutan package

run_orangutan(
  # ---------- Input / output ----------
  data_path = "data/my_dataset.csv",             # Path to your input CSV dataset
  output_dir = "address/to/orangutan_outputs",   # Folder where all outputs (plots, tables) will be saved
  
  # ---------- Allometry ----------
  apply_allometry = TRUE,             # Whether to adjust measurements for allometry
  allometry_var = "SVL",              # Column used as the reference variable for allometry correction
  
  # ---------- Outlier handling ----------
  remove_outliers = TRUE,             # Whether to remove extreme values (outliers)
  outlier_vars = c("SVL"),            # Which variables to check for outliers
  outlier_tail_pct = 0.05,            # Proportion of extreme values to remove from each tail (5% here)
  
  # ---------- PCA / DAPC highlighting ----------
  species_to_encircle = c("carolinensis", "torresfundorai"), # Species to highlight on PCA/DAPC plots
  
  # ---------- Color palette ----------
  palette_name = "Paired",            # RColorBrewer palette name for plots (e.g. "Paired", "Set3", "Dark2")
  custom_colors = c(SpeciesA = "#FF0000", SpeciesB = "#00FF00"), # Optional: custom hex codes for specific species
  
  # ---------- Point aesthetics ----------
  point_aes = list(
    point_size    = 3.5,              # Size of each individual point
    jitter_width  = 0.1,              # Horizontal jitter to prevent overplotting
    jitter_alpha  = 0.8,              # Transparency of points
    jitter_shape  = 21,               # Shape of the points (21 = filled circle with border)
    jitter_color  = "black",          # Border color of points
    jitter_stroke = 0.35              # Thickness of the point border
  ),
  
  # ---------- Mean point aesthetics ----------
  mean_aes = list(
    size   = 1.8,                      # Size of the mean point
    shape  = 21,                       # Shape of the mean point
    fill   = "white",                  # Fill color of the mean point
    color  = "black",                  # Border color of the mean point
    stroke = 0.6                       # Thickness of the mean point border
  ),
  
  # ---------- Violin aesthetics ----------
  violin_aes = list(
    alpha = 0.4                         # Transparency of violin plots
  ),
  
  # ---------- Boxplot aesthetics ----------
  box_aes = list(
    alpha = 0.4,                        # Transparency of boxplots
    width = 0.15                        # Width of boxplots
  ),
  
  # ---------- Label / text control ----------
  label_aes = list(
    text_size      = 6,                 # Size of text labels on plots
    axis_text_size = 10,                # Size of axis tick labels
    title_size     = 12,                # Size of plot titles
    label_offset   = 0.05               # Distance of labels from points
  ),
  
  # ---------- Optional label templates ----------
  label_templates = list(
    nonoverlap_title = "Non-Overlapping Pair: %s vs %s for %s", # Title template for non-overlapping variable plots
    pca_x = "PC1 (%s%% variance)",       # Label for PCA X-axis with explained variance
    pca_y = "PC2 (%s%% variance)",       # Label for PCA Y-axis with explained variance
    dapc_x = "LD1 (%s%%)",               # Label for DAPC X-axis with explained variance
    dapc_y = "LD2 (%s%%)",               # Label for DAPC Y-axis with explained variance
    dapc_title_1d = "DAPC – Single Discriminant Axis" # Title for one-dimensional DAPC plots
  ),
  
  # ---------- Multivariate test seeds ----------
  seeds = list(betadisper = 123, permanova = 456),   # Seeds that make the beta-dispersion and PERMANOVA permutation tests reproducible
  
  # ---------- Messaging ----------
  verbose = FALSE                                    # Whether to print progress messages in console
)
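
As a rough illustration of what outlier_tail_pct = 0.05 means, the sketch below flags values that fall below the 5% quantile or above the 95% quantile within each species. This is an assumption about the rule, written in base R; the exact quantile method and tie handling inside Orangutan may differ. flag_tails() is a hypothetical helper and the data are simulated.

```r
# Hypothetical helper: flag values in the lower or upper tail of one group.
flag_tails <- function(x, tail_pct = 0.05) {
  lims <- quantile(x, probs = c(tail_pct, 1 - tail_pct), na.rm = TRUE)
  x < lims[1] | x > lims[2]
}

# Simulated measurements for two made-up species.
set.seed(1)
toy <- data.frame(
  species = rep(c("A", "B"), each = 20),
  SVL     = c(rnorm(20, mean = 60, sd = 3), rnorm(20, mean = 80, sd = 4))
)

# ave() applies the helper within each species and returns a vector
# aligned with the original rows (as 0/1, hence as.logical()).
toy$is_outlier <- as.logical(
  ave(toy$SVL, toy$species, FUN = function(v) flag_tails(v, 0.05))
)

# Rows that would survive the trimming step.
trimmed <- toy[!toy$is_outlier, ]
print(table(toy$species, toy$is_outlier))
```

Trimming within species rather than across the whole dataset keeps a genuinely large or small species from being treated as a set of outliers.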

Description of run_orangutan() arguments

data_path: Path to your CSV file (required).
output_dir: Where results are saved (default: folder next to the input file).
apply_allometry: TRUE/FALSE — adjust measurements by a size variable.
allometry_var: Variable used as the size reference for allometric correction (required if apply_allometry = TRUE).
remove_outliers: TRUE/FALSE — whether to remove outliers by species.
outlier_vars: Variable(s) used to detect outliers (required if remove_outliers = TRUE).
outlier_tail_pct: Proportion of values treated as extreme in each tail (default 0.05 = 5%).
species_to_encircle: Species names to highlight (draw polygons) in PCA/DAPC plots.
palette_name: RColorBrewer palette to use for colors (default "Paired").
custom_colors: Optional named vector of hex codes for species (e.g., c(SpeciesA = "#FF0000")).
seeds: Named list of seeds for reproducible random steps (default: list(betadisper = 123, permanova = 456)).
label_templates: Optional list to tweak plot labels and titles (sprintf-style templates).
point_aes, mean_aes, violin_aes, box_aes, label_aes: Lists to customize plot appearance (see the full example above for the available fields).
verbose: TRUE/FALSE, whether to print progress messages in the console.
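
The label_templates entries are sprintf-style strings: each %s is a slot for a value and %% prints a literal percent sign. The base R snippet below shows how two of the templates from the full example would render. The values passed in are made up for illustration, and whether Orangutan supplies them as text or numbers is an assumption.

```r
# Templates copied from the full example above.
pca_x_template   <- "PC1 (%s%% variance)"
nonoverlap_title <- "Non-Overlapping Pair: %s vs %s for %s"

# Illustrative values; in a real run these would come from the analysis.
pca_x_label <- sprintf(pca_x_template, "42.3")
plot_title  <- sprintf(nonoverlap_title, "allisoni", "carolinensis", "Head_length")

print(pca_x_label)
print(plot_title)
```

When writing a custom template, keep the same number of %s slots as the default so that every value has somewhere to go.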

Input data format

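A CSV with a species column and one or more numeric measurement columns. Categorical columns (such as Color in the example below) are optional and are detected automatically.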

species	main_length	Head_length	Supralabials	Color
allisoni	86.5	25.2	9	Blue
allisoni	73.6	24.8	8	Blue
carolinensis	63.0	18.3	8	Green
carolinensis	59.0	19.17	8	Green
torresfundorai	66.9	18.7	7	Green
torresfundorai	70.9	23.6	7	Green
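
The example table can be turned into a CSV and sanity-checked before running the pipeline. This is a minimal base R sketch: the file is written to a temporary directory, and the checks only mirror the stated requirements (a species column and at least one numeric column), not Orangutan's own validation code.

```r
# Rebuild the example table shown above.
dat <- data.frame(
  species      = c("allisoni", "allisoni", "carolinensis", "carolinensis",
                   "torresfundorai", "torresfundorai"),
  main_length  = c(86.5, 73.6, 63.0, 59.0, 66.9, 70.9),
  Head_length  = c(25.2, 24.8, 18.3, 19.17, 18.7, 23.6),
  Supralabials = c(9, 8, 8, 8, 7, 7),
  Color        = c("Blue", "Blue", "Green", "Green", "Green", "Green")
)

# Write it as a comma-separated file (temporary location for this sketch).
csv_path <- file.path(tempdir(), "my_dataset.csv")
write.csv(dat, csv_path, row.names = FALSE)

# Read it back and check the two documented requirements.
check <- read.csv(csv_path, stringsAsFactors = FALSE)
stopifnot("species" %in% names(check))
numeric_cols <- names(check)[vapply(check, is.numeric, logical(1))]
stopifnot(length(numeric_cols) >= 1)

# Sample size per species is worth a glance before any statistics.
print(table(check$species))
```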

HTML Report

Every run automatically produces orangutan_report.html inside output_dir. Open it in any web browser to get a plain-language summary of all analysis sections, with embedded thumbnail images of the key plots. No extra arguments are needed — the report is generated by default.
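
The report can also be located and opened from R. The sketch below relies only on the file name stated above and on base R and utils functions; report_path() is a hypothetical helper, and the output_dir value is the placeholder from the full example.

```r
# Hypothetical helper: build the expected report path for a given output_dir.
report_path <- function(output_dir) {
  file.path(output_dir, "orangutan_report.html")
}

output_dir <- "address/to/orangutan_outputs"  # placeholder from the full example
report <- report_path(output_dir)

if (file.exists(report)) {
  # Only try to launch a browser in an interactive session.
  if (interactive()) utils::browseURL(report)
} else {
  message("Report not found at: ", report)
}
```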

Contributing / Support

Open issues or pull requests on the project GitHub for bugs, feature requests, or improvements.
Add a star if this package was useful.

Citation

Torres, J. (2026). Orangutan: An R Package for Analyzing and Visualizing Phenotypic Data in the Context of Species Descriptions and Population Comparisons. Ecology and Evolution, 16(2), e73111. https://doi.org/10.1002/ece3.73111
