vald.extractor
Robust Pipeline for VALD ForceDecks Data Extraction and Analysis
vald.extractor extends the valdr package with a fault-tolerant pipeline for extracting, cleaning, and visualizing VALD ForceDecks data across multiple sports. It is being prepared for CRAN submission, with comprehensive documentation and chunk-level error handling.
The Problem
Organizations using VALD ForceDecks face three critical challenges:
- API Stability: Manual exports or large data pulls frequently time out, leaving datasets incomplete
- Data Cleaning: Team/sport names are inconsistent ("Football" vs "Soccer" vs "FSI"), requiring hours of manual categorization
- Code Duplication: Analyzing multiple test types (CMJ, DJ, ISO) requires duplicate code for each metric suffix
The Solution
vald.extractor solves these problems through:
- Chunked Batch Processing: Extracts data in manageable chunks (100 tests at a time) with fault-tolerant error handling
- Automated Sports Taxonomy: Regex-based pattern matching standardizes inconsistent naming conventions
- Generic Programming: Strips test-type suffixes to enable writing analysis code once that works for all tests
Installation
# Install from CRAN (when available)
install.packages("vald.extractor")
# Or install development version from GitHub
# install.packages("devtools")
devtools::install_github("praveenmaths89/vald.extractor")
Quick Start
library(vald.extractor)
# 1. Set VALD credentials
valdr::set_credentials(
client_id = "your_client_id",
client_secret = "your_client_secret",
tenant_id = "your_tenant_id",
region = "aue"
)
# 2. Fetch test and trial data in chunks (prevents timeout)
vald_data <- fetch_vald_batch(
start_date = "2020-01-01T00:00:00Z",
chunk_size = 100
)
# 3. Fetch and standardize athlete metadata
metadata <- fetch_vald_metadata(
client_id = "your_client_id",
client_secret = "your_client_secret",
tenant_id = "your_tenant_id"
)
athlete_metadata <- standardize_vald_metadata(
profiles = metadata$profiles,
groups = metadata$groups
)
# 4. Apply automated sports classification
athlete_metadata <- classify_sports(athlete_metadata)
table(athlete_metadata$sports_clean)
# 5. Transform to wide format and join with metadata
# ... (see vignette for complete pipeline)
# 6. Split by test type with suffix removal
test_datasets <- split_by_test(final_analysis_data)
cmj_data <- test_datasets$CMJ # Column names: "PEAK_FORCE_Both", not "PEAK_FORCE_Both_CMJ"
dj_data <- test_datasets$DJ # Same column names enable generic analysis
# 7. Generate summary statistics
summary_vald_metrics(cmj_data, group_vars = c("sex", "sports"))
# 8. Visualize trends and comparisons
plot_vald_trends(cmj_data, metric_col = "PEAK_FORCE_Both", group_col = "profileId")
plot_vald_compare(cmj_data, metric_col = "JUMP_HEIGHT_Both", group_col = "sports", fill_col = "sex")
Key Features
1. Fault-Tolerant Batch Extraction
# Processes 5000 tests without timeout errors
vald_data <- fetch_vald_batch(
start_date = "2020-01-01T00:00:00Z",
chunk_size = 100, # Adjust based on API performance
verbose = TRUE
)
# If chunk 23 fails, chunks 1-22 and 24+ still succeed
# Error messages indicate which rows failed for debugging
Why it matters: Organizations with large historical datasets (5000+ tests) cannot extract data in a single API call. The chunked approach with tryCatch error handling ensures partial extraction succeeds even if some chunks fail.
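The chunk-and-recover idea can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: `fetch_chunk()` and `fetch_in_chunks()` are hypothetical names, and the simulated failure stands in for a real API timeout.

```r
# Hypothetical stand-in for one API request; NOT a valdr or vald.extractor function.
# It fails on purpose when id 23 is requested, to exercise the error path.
fetch_chunk <- function(ids) {
  if (23 %in% ids) stop("simulated timeout")
  data.frame(test_id = ids, value = ids * 2)
}

# Split ids into chunks, fetch each chunk independently, keep what succeeded.
fetch_in_chunks <- function(ids, chunk_size = 100, fetch_fun = fetch_chunk) {
  chunks <- split(ids, ceiling(seq_along(ids) / chunk_size))
  results <- list()
  failed <- integer(0)
  for (i in seq_along(chunks)) {
    res <- tryCatch(
      fetch_fun(chunks[[i]]),
      error = function(e) {
        # Report which chunk failed, then continue with the next one
        message(sprintf("Chunk %d failed: %s", i, conditionMessage(e)))
        NULL
      }
    )
    if (is.null(res)) {
      failed <- c(failed, i)
    } else {
      results[[length(results) + 1]] <- res
    }
  }
  list(data = do.call(rbind, results), failed_chunks = failed)
}

out <- fetch_in_chunks(1:50, chunk_size = 10)
out$failed_chunks
nrow(out$data)
```

Returning the indices of failed chunks alongside the data lets a caller retry only those chunks instead of repeating the whole extraction.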
2. Automated Sports Taxonomy
metadata <- classify_sports(metadata, group_col = "all_group_names")
# Before:
# "Team A - Football", "Soccer U18", "FSI Elite", "Basketball", "BBall"
# After:
# "Football", "Football", "Football", "Basketball", "Basketball"
table(metadata$sports_clean)
#>      Football    Basketball       Cricket      Swimming Track & Field
#>           523           198           145            87           234
The Value Add: Multi-sport organizations waste hours manually categorizing athletes. This regex-based system handles 15+ sports out-of-the-box and is easily extensible.
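As a rough sketch of how regex-based classification of this kind can work, the base R snippet below maps free-text group names to a clean label. The pattern table, the function name `classify_sport_labels()`, and the `"Unclassified"` default are assumptions made for illustration; the patterns shipped with `classify_sports()` may differ.

```r
# Illustrative patterns only (assumed, not taken from the package).
# Names are the clean labels; values are case-insensitive regular expressions.
sport_patterns <- c(
  Football   = "football|soccer|\\bfsi\\b",
  Basketball = "basketball|bball"
)

# The first matching pattern wins; labels matching nothing keep the default.
classify_sport_labels <- function(labels, patterns = sport_patterns,
                                  default = "Unclassified") {
  out <- rep(default, length(labels))
  for (sport in names(patterns)) {
    hit <- out == default &
      grepl(patterns[[sport]], labels, ignore.case = TRUE, perl = TRUE)
    out[hit] <- sport
  }
  out
}

groups <- c("Team A - Football", "Soccer U18", "FSI Elite",
            "Basketball", "BBall", "Rowing Squad")
sports_clean <- classify_sport_labels(groups)
table(sports_clean)
```

Keeping the patterns in a named vector means a new sport can be supported by adding one entry, without touching the matching logic.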
3. Generic Test-Type Analysis
library(dplyr) # provides %>% and mutate()
# Write analysis code ONCE that works for ALL test types
analyze_bilateral_asymmetry <- function(test_data) {
test_data %>%
mutate(
asymmetry = (PEAK_FORCE_Left - PEAK_FORCE_Right) /
((PEAK_FORCE_Left + PEAK_FORCE_Right) / 2) * 100
)
}
# Apply to CMJ, DJ, ISO without code changes
test_datasets <- split_by_test(final_data)
cmj_with_asymmetry <- analyze_bilateral_asymmetry(test_datasets$CMJ)
dj_with_asymmetry <- analyze_bilateral_asymmetry(test_datasets$DJ)
iso_with_asymmetry <- analyze_bilateral_asymmetry(test_datasets$ISO)
DRY Principle: Without suffix removal, you'd need separate code for PEAK_FORCE_Left_CMJ, PEAK_FORCE_Left_DJ, etc. This package enables true generic programming.
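To make the suffix-removal step concrete, here is a minimal base R sketch on a toy wide table. The helper `split_by_suffix()`, the toy data, and the assumption that every metric column ends in `_<TESTTYPE>` are illustrative; the real `split_by_test()` may handle more cases.

```r
# Toy wide table; column names follow the "<METRIC>_<TESTTYPE>" pattern.
wide <- data.frame(
  profileId            = c("a1", "a2"),
  PEAK_FORCE_Left_CMJ  = c(1000, 1100),
  PEAK_FORCE_Right_CMJ = c(900, 1100),
  PEAK_FORCE_Left_DJ   = c(1500, NA),
  PEAK_FORCE_Right_DJ  = c(1400, NA)
)

# Hypothetical helper: one data frame per test type, with the suffix stripped.
split_by_suffix <- function(data, test_types, id_cols = "profileId") {
  out <- lapply(test_types, function(tt) {
    suffix <- paste0("_", tt, "$")
    metric_cols <- grep(suffix, names(data), value = TRUE)
    sub_df <- data[, c(id_cols, metric_cols), drop = FALSE]
    names(sub_df) <- sub(suffix, "", names(sub_df))
    sub_df
  })
  names(out) <- test_types
  out
}

# Same asymmetry formula as above, written once against suffix-free names.
asymmetry_pct <- function(d) {
  (d$PEAK_FORCE_Left - d$PEAK_FORCE_Right) /
    ((d$PEAK_FORCE_Left + d$PEAK_FORCE_Right) / 2) * 100
}

tests <- split_by_suffix(wide, c("CMJ", "DJ"))
names(tests$CMJ)
asymmetry_pct(tests$CMJ)
asymmetry_pct(tests$DJ)
```

Because both data frames end up with identical column names, `asymmetry_pct()` runs on either one unchanged. Missing values propagate as `NA` instead of raising an error.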
4. Metadata Patching
# Fix missing/incorrect demographics from external Excel file
cmj_data <- patch_metadata(
data = cmj_data,
patch_file = "corrections.xlsx",
fields_to_patch = c("sex", "dateOfBirth")
)
# Unknown values are replaced with corrections
table(cmj_data$sex)
#> Before: Male: 450, Female: 380, Unknown: 45
#> After: Male: 470, Female: 405, Unknown: 0
5. Publication-Ready Visualizations
# Longitudinal trends
plot_vald_trends(
data = cmj_data,
metric_col = "JUMP_HEIGHT_Both",
group_col = "profileId",
facet_col = "sports"
)
# Cross-sectional comparisons
plot_vald_compare(
data = cmj_data,
metric_col = "PEAK_FORCE_Both",
group_col = "sports",
fill_col = "sex"
)
Documentation
- Vignette: End-to-End Pipeline: From API to Multi-Sport Analysis
- Function Reference: ?fetch_vald_batch, ?standardize_vald_metadata, ?split_by_test, etc.
- GitHub Repository: https://github.com/praveenmaths89/vald.extractor
Production Use Cases
vald.extractor is designed for:
- Multi-Sport Organizations: National sport institutes, university athletic departments, professional academies
- Longitudinal Research: Track athlete development over months/years with automated weekly updates
- Cross-Sectional Studies: Compare performance across sports, sexes, age groups
- Clinical Settings: Monitor return-to-sport progression, ACL rehab, injury risk
Comparison to Manual Workflow
| Task | Manual Workflow | vald.extractor |
|---|---|---|
| Extract 5000 tests | ❌ API timeout errors | ✅ Chunked processing (15 min) |
| Classify 500 athletes into sports | ❌ 2-3 hours manual work | ✅ Automated (30 sec) |
| Analyze CMJ, DJ, ISO separately | ❌ Duplicate code for each | ✅ Generic functions |
| Handle missing demographics | ❌ Manual data entry | ✅ Excel patch import |
| Generate summary tables | ❌ Custom scripts | ✅ summary_vald_metrics() |
| Create visualizations | ❌ ggplot2 from scratch | ✅ Pre-built themes |
Roadmap for R Journal Submission
The R Journal article will focus on:
- Technical Innovation: Chunked extraction architecture with fault tolerance
- Domain Contribution: Automated sports taxonomy as a time-saving tool for practitioners
- Software Engineering: Modular design, comprehensive testing, CRAN-compliant documentation
- Reproducible Research: Complete workflow from raw API to publication figures
Key Message: "Automating domain-specific data taxonomy for multi-organizational sports science"
Citation
If you use vald.extractor in published research, please cite:
Chougale PD, Ananthakumar U (2026). vald.extractor: Robust Pipeline for VALD
ForceDecks Data Extraction and Analysis. R package version 0.1.0.
https://github.com/praveenmaths89/vald.extractor
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/new-sport-taxonomy)
- Add tests for new functionality
- Submit a pull request
Common contributions:
- Add patterns for new sports in classify_sports()
- Improve error messages
- Add new visualization themes
- Extend to other VALD devices (NordBord, DynaMo, etc.)
License
MIT License - see LICENSE file for details.
Acknowledgments
- VALD Performance: For providing the ForceDecks API and valdr package
- tidyverse team: For creating the tools that make this package possible
- Sports scientists: Who provided real-world use cases and taxonomy requirements
Support
- Issues: GitHub Issues
- Email: praveenmaths89@gmail.com
- Documentation: Run vignette("end-to-end-pipeline", package = "vald.extractor")
Status: Ready for CRAN submission pending:
- Final testing on multiple VALD tenants
- CRAN comment responses
- Logo design (hex sticker)
- pkgdown website deployment
Maintainer: Praveen D Chougale (praveenmaths89@gmail.com)