This function applies a "documentation necessity test": it includes only
functions that a proficient LLM would struggle to use correctly without explicit
documentation and examples. Filtering this way keeps the output focused on
functions that need documentation and avoids spending tokens on functions the
model already knows.
The enhanced prompt frames the test as four questions:
1. Would a proficient LLM struggle without documentation?
2. Is this function domain-specific or universally known?
3. Does it use specialized terminology or workflows?
4. Would examples significantly improve usage accuracy?
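
As a rough illustration of how the four questions could be embedded in a prompt, the sketch below assembles them into plain text. The helper name `build_necessity_prompt`, its `task` argument, and the surrounding prompt wording are hypothetical and not taken from this package; only the four questions come from the list above.

```r
# The four questions, copied from the list above.
necessity_questions <- c(
  "Would a proficient LLM struggle without documentation?",
  "Is this function domain-specific or universally known?",
  "Does it use specialized terminology or workflows?",
  "Would examples significantly improve usage accuracy?"
)

# Hypothetical helper: builds a prompt fragment for a given task description.
build_necessity_prompt <- function(task, questions = necessity_questions) {
  # Number the questions as "1. ...", "2. ...", and so on.
  numbered <- paste0(seq_along(questions), ". ", questions)
  paste(
    c(
      sprintf("Task: %s", task),
      "For each candidate function, apply the documentation necessity test:",
      numbered
    ),
    collapse = "\n"
  )
}

cat(build_necessity_prompt("GO enrichment analysis"), "\n")
```

Keeping the questions in a character vector makes it easy to adjust or extend the test without touching the code that formats the prompt.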
**Automatic exclusions** (common functions that waste tokens):
- Data I/O: read.csv, write.csv, readLines
- Basic operations: order, sort, subset, head, tail
- Simple statistics: mean, median, sd, sum
- Core structures: c, list, data.frame
- Well-known tidyverse: simple uses of dplyr::filter, dplyr::mutate
- Basic control flow: if, for, while
- Common utilities: paste, grep, unique
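
The exclusions above are described as part of the prompt, but the same list could also be applied as a deterministic post-filter on the returned function names. The following is a minimal sketch under that assumption; `excluded_functions` and `drop_excluded` are hypothetical names, and matching unqualified entries by bare name (so that `base::order` is treated like `order`) is a design choice of this sketch, not documented behavior.

```r
# Assumed exclusion list, copied from the bullets above.
excluded_functions <- c(
  "read.csv", "write.csv", "readLines",
  "order", "sort", "subset", "head", "tail",
  "mean", "median", "sd", "sum",
  "c", "list", "data.frame",
  "dplyr::filter", "dplyr::mutate",
  "if", "for", "while",
  "paste", "grep", "unique"
)

# Hypothetical helper: drop candidates that appear in the exclusion list.
drop_excluded <- function(candidates, excluded = excluded_functions) {
  # Strip any "pkg::" or "pkg:::" prefix to get the bare function name.
  bare <- sub("^.*::", "", candidates)
  # Only unqualified entries are matched by bare name; namespaced entries
  # such as dplyr::filter must match exactly.
  unqualified <- excluded[!grepl("::", excluded, fixed = TRUE)]
  is_excluded <- candidates %in% excluded | bare %in% unqualified
  candidates[!is_excluded]
}

candidates <- c("read.csv", "base::order", "dplyr::filter",
                "clusterProfiler::enrichGO", "DESeq2::DESeq")
drop_excluded(candidates)
```

With these candidates, only the clusterProfiler and DESeq2 entries remain. Note that bare-name matching would also drop a domain-specific function that happens to share a name with an excluded one (for example a package-specific `subset`), so a real filter might need a stricter rule.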
**What gets included** (documentation-critical functions):
- Domain-specific methods (clusterProfiler::enrichGO for GO analysis)
- Complex statistical procedures (DESeq2::DESeq)
- Specialized transformations (sf::st_transform for spatial data)
- Functions with many non-obvious parameters
- Methods where wrong usage produces plausible but incorrect results
With this approach, a query such as "GO enrichment analysis" should return
clusterProfiler functions (for example enrichGO), not read.csv or order.