Learn R Programming

vald.extractor

Robust Pipeline for VALD ForceDecks Data Extraction and Analysis

vald.extractor extends the valdr package by providing a production-ready, fault-tolerant pipeline for extracting, cleaning, and visualizing VALD ForceDecks data across multiple sports. Designed for CRAN submission with comprehensive documentation and enterprise-grade error handling.

The Problem

Organizations using VALD ForceDecks face three critical challenges:

  1. API Stability: Manual exports or large data pulls frequently timeout, causing incomplete datasets
  2. Data Cleaning: Team/sport names are inconsistent ("Football" vs "Soccer" vs "FSI"), requiring hours of manual categorization
  3. Code Duplication: Analyzing multiple test types (CMJ, DJ, ISO) requires duplicate code for each metric suffix

The Solution

vald.extractor solves these problems through:

  • Chunked Batch Processing: Extracts data in manageable chunks (100 tests at a time) with fault-tolerant error handling
  • Automated Sports Taxonomy: Regex-based pattern matching standardizes inconsistent naming conventions
  • Generic Programming: Strips test-type suffixes to enable writing analysis code once that works for all tests

Installation

# Install from CRAN (when available)
install.packages("vald.extractor")

# Or install development version from GitHub
# install.packages("devtools")
devtools::install_github("praveenmaths89/vald.extractor")

Quick Start

library(vald.extractor)

# 1. Set VALD credentials
valdr::set_credentials(
  client_id     = "your_client_id",
  client_secret = "your_client_secret",
  tenant_id     = "your_tenant_id",
  region        = "aue"
)

# 2. Fetch test and trial data in chunks (prevents timeout)
vald_data <- fetch_vald_batch(
  start_date = "2020-01-01T00:00:00Z",
  chunk_size = 100
)

# 3. Fetch and standardize athlete metadata
metadata <- fetch_vald_metadata(
  client_id     = "your_client_id",
  client_secret = "your_client_secret",
  tenant_id     = "your_tenant_id"
)

athlete_metadata <- standardize_vald_metadata(
  profiles = metadata$profiles,
  groups   = metadata$groups
)

# 4. Apply automated sports classification
athlete_metadata <- classify_sports(athlete_metadata)
table(athlete_metadata$sports_clean)

# 5. Transform to wide format and join with metadata
# ... (see vignette for complete pipeline)

# 6. Split by test type with suffix removal
test_datasets <- split_by_test(final_analysis_data)

cmj_data <- test_datasets$CMJ  # Column names: "PEAK_FORCE_Both", not "PEAK_FORCE_Both_CMJ"
dj_data  <- test_datasets$DJ   # Same column names enable generic analysis

# 7. Generate summary statistics
summary_vald_metrics(cmj_data, group_vars = c("sex", "sports"))

# 8. Visualize trends and comparisons
plot_vald_trends(cmj_data, metric_col = "PEAK_FORCE_Both", group_col = "profileId")
plot_vald_compare(cmj_data, metric_col = "JUMP_HEIGHT_Both", group_col = "sports", fill_col = "sex")

Key Features

1. Fault-Tolerant Batch Extraction

# Processes 5000 tests without timeout errors
vald_data <- fetch_vald_batch(
  start_date = "2020-01-01T00:00:00Z",
  chunk_size = 100,  # Adjust based on API performance
  verbose = TRUE
)

# If chunk 23 fails, chunks 1-22 and 24+ still succeed
# Error messages indicate which rows failed for debugging

Why it matters: Organizations with large historical datasets (5000+ tests) cannot extract data in a single API call. The chunked approach with tryCatch error handling ensures partial extraction succeeds even if some chunks fail.

2. Automated Sports Taxonomy

metadata <- classify_sports(metadata, group_col = "all_group_names")

# Before:
# "Team A - Football", "Soccer U18", "FSI Elite", "Basketball", "BBall"

# After:
# "Football", "Football", "Football", "Basketball", "Basketball"

table(metadata$sports_clean)
#> Football    Basketball    Cricket    Swimming    Track & Field
#>      523           198        145          87              234

The Value Add: Multi-sport organizations waste hours manually categorizing athletes. This regex-based system handles 15+ sports out-of-the-box and is easily extensible.

3. Generic Test-Type Analysis

# Write analysis code ONCE that works for ALL test types
analyze_bilateral_asymmetry <- function(test_data) {
  test_data %>%
    mutate(
      asymmetry = (PEAK_FORCE_Left - PEAK_FORCE_Right) /
                  ((PEAK_FORCE_Left + PEAK_FORCE_Right) / 2) * 100
    )
}

# Apply to CMJ, DJ, ISO without code changes
test_datasets <- split_by_test(final_data)
cmj_with_asymmetry <- analyze_bilateral_asymmetry(test_datasets$CMJ)
dj_with_asymmetry  <- analyze_bilateral_asymmetry(test_datasets$DJ)
iso_with_asymmetry <- analyze_bilateral_asymmetry(test_datasets$ISO)

DRY Principle: Without suffix removal, you'd need separate code for PEAK_FORCE_Left_CMJ, PEAK_FORCE_Left_DJ, etc. This package enables true generic programming.

4. Metadata Patching

# Fix missing/incorrect demographics from external Excel file
cmj_data <- patch_metadata(
  data = cmj_data,
  patch_file = "corrections.xlsx",
  fields_to_patch = c("sex", "dateOfBirth")
)

# Unknown values are replaced with corrections
table(cmj_data$sex)
#> Before: Male: 450, Female: 380, Unknown: 45
#> After:  Male: 470, Female: 405, Unknown: 0

5. Publication-Ready Visualizations

# Longitudinal trends
plot_vald_trends(
  data = cmj_data,
  metric_col = "JUMP_HEIGHT_Both",
  group_col = "profileId",
  facet_col = "sports"
)

# Cross-sectional comparisons
plot_vald_compare(
  data = cmj_data,
  metric_col = "PEAK_FORCE_Both",
  group_col = "sports",
  fill_col = "sex"
)

Documentation

Production Use Cases

vald.extractor is designed for:

  • Multi-Sport Organizations: National sport institutes, university athletic departments, professional academies
  • Longitudinal Research: Track athlete development over months/years with automated weekly updates
  • Cross-Sectional Studies: Compare performance across sports, sexes, age groups
  • Clinical Settings: Monitor return-to-sport progression, ACL rehab, injury risk

Comparison to Manual Workflow

TaskManual Workflowvald.extractor
Extract 5000 tests❌ API timeout errors✅ Chunked processing (15 min)
Classify 500 athletes into sports❌ 2-3 hours manual work✅ Automated (30 sec)
Analyze CMJ, DJ, ISO separately❌ Duplicate code for each✅ Generic functions
Handle missing demographics❌ Manual data entry✅ Excel patch import
Generate summary tables❌ Custom scriptssummary_vald_metrics()
Create visualizations❌ ggplot2 from scratch✅ Pre-built themes

Roadmap for R Journal Submission

The R Journal article will focus on:

  1. Technical Innovation: Chunked extraction architecture with fault tolerance
  2. Domain Contribution: Automated sports taxonomy as a time-saving tool for practitioners
  3. Software Engineering: Modular design, comprehensive testing, CRAN-compliant documentation
  4. Reproducible Research: Complete workflow from raw API to publication figures

Key Message: "Automating domain-specific data taxonomy for multi-organizational sports science"

Citation

If you use vald.extractor in published research, please cite:

Chougale PD, Ananthakumar U (2026). vald.extractor: Robust Pipeline for VALD
ForceDecks Data Extraction and Analysis. R package version 0.1.0.
https://github.com/praveenmaths89/vald.extractor

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/new-sport-taxonomy)
  3. Add tests for new functionality
  4. Submit a pull request

Common contributions:

  • Add patterns for new sports in classify_sports()
  • Improve error messages
  • Add new visualization themes
  • Extend to other VALD devices (NordBord, DynaMo, etc.)

License

MIT License - see LICENSE file for details.

Acknowledgments

  • VALD Performance: For providing the ForceDecks API and valdr package
  • tidyverse team: For creating the tools that make this package possible
  • Sports scientists: Who provided real-world use cases and taxonomy requirements

Support


Status: Ready for CRAN submission pending:

  • Final testing on multiple VALD tenants
  • CRAN comment responses
  • Logo design (hex sticker)
  • pkgdown website deployment

Maintainer: Praveen D Chougale (praveenmaths89@gmail.com)

Copy Link

Version

Install

install.packages('vald.extractor')

Version

0.1.1

License

MIT + file LICENSE

Issues

Pull Requests

Stars

Forks

Maintainer

Praveen D Chougale

Last Published

February 1st, 2026

Functions in vald.extractor (0.1.1)

%>%

Pipe operator
fetch_vald_batch

Robust Batch Extraction of VALD Trials
plot_vald_compare

Boxplot Comparison of Metrics by Sport, Sex, or Team
classify_sports

Automated Sports Taxonomy Mapping
fetch_vald_metadata

Retrieve Athlete Profiles and Group Assignments
summary_vald_metrics

Dynamic Summary Table for Performance Metrics
standardize_vald_metadata

Create Unified Athlete Metadata with Group Assignments
patch_metadata

Fix Missing or Incorrect Athlete Demographics
plot_vald_trends

Time-Series Visualization of Performance Metrics
split_by_test

Generic Test-Type Splitting with Suffix Removal
global_imports

Global Imports
transform_to_wide

Create Analysis-Ready Wide-Format Dataset
vald.extractor-package

vald.extractor: Robust Pipeline for 'VALD' 'ForceDecks' Data Extraction and Analysis
calculate_age

Add Age Variable to Dataset