split_into_sections: Split document text into sections
Description
Splits extracted text into logical sections (Introduction, Methods, Results, etc.)
using either the PDF's table of contents or common academic section patterns.
Usage
split_into_sections(text, file_path = NULL)
Value
A named list in which each element holds the text of one section. The list
always includes a "Full_text" element containing the complete document.
Arguments
text
Character string. Full text of the document.
file_path
Character string or NULL. Path to the PDF file used for table of contents
(TOC) extraction. If NULL, the function uses common academic section names
instead. Default is NULL.
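The choice between TOC-derived names and the common fallback names can be sketched as follows. This is only an illustration, not the package's implementation: the helper names flatten_toc_titles and get_section_names are hypothetical, and reading the TOC with pdftools::pdf_toc() (which returns a nested list of title and children entries) is an assumption about how the TOC might be obtained.

```r
# Hypothetical sketch; the real function may obtain the TOC differently.

# Collect all titles from a nested TOC node. Each node is assumed to be a
# list with a `title` string and a `children` list, the shape returned by
# pdftools::pdf_toc().
flatten_toc_titles <- function(node) {
  titles <- character(0)
  if (!is.null(node$title) && nzchar(node$title)) {
    titles <- node$title
  }
  for (child in node$children) {
    titles <- c(titles, flatten_toc_titles(child))
  }
  titles
}

# Return TOC titles when a readable PDF with a TOC is given,
# otherwise the fallback names.
get_section_names <- function(file_path = NULL,
                              fallback = c("Abstract", "Introduction",
                                           "Methods", "Results", "Discussion",
                                           "Conclusion", "References")) {
  if (is.null(file_path)) {
    return(fallback)
  }
  # Any failure (missing file, missing pdftools package) is treated as "no TOC".
  toc <- tryCatch(pdftools::pdf_toc(file_path), error = function(e) NULL)
  titles <- if (is.null(toc)) character(0) else flatten_toc_titles(toc)
  if (length(titles) == 0) fallback else unique(titles)
}

# With no file path, the fallback names are used.
get_section_names(NULL)
```

Treating every error as "no TOC" keeps the fallback path simple, at the cost of hiding why TOC extraction failed.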
Details
The function attempts to:
1. Extract section names from the PDF's table of contents.
2. Fall back to common academic section names if no TOC is available.
3. Match section headers in the text using regex patterns.
4. Handle duplicate section names.
Common sections searched: Abstract, Introduction, Methods, Results,
Discussion, Conclusion, References, etc.
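The header matching and duplicate handling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's actual code: the function name sketch_split_sections is hypothetical, a header is assumed to be a line containing only a section name (optionally preceded by numbering such as "2." or "2.1"), and duplicates are assumed to be resolved by appending a numeric suffix.

```r
# Hypothetical sketch of regex-based section splitting.
sketch_split_sections <- function(text,
                                  section_names = c("Abstract", "Introduction",
                                                    "Methods", "Results",
                                                    "Discussion", "Conclusion",
                                                    "References")) {
  lines <- strsplit(text, "\n", fixed = TRUE)[[1]]

  # Optional numbering, then one of the section names, alone on the line.
  pattern <- paste0("^\\s*(\\d+(\\.\\d+)*\\.?\\s+)?(",
                    paste(section_names, collapse = "|"),
                    ")\\s*$")
  header_idx <- grep(pattern, lines, ignore.case = TRUE, perl = TRUE)

  sections <- list()
  if (length(header_idx) > 0) {
    # Each section ends on the line before the next header (or at the last line).
    ends <- c(header_idx[-1] - 1L, length(lines))

    # Recover the matched name and map it to its canonical spelling.
    found <- sub(pattern, "\\3", lines[header_idx],
                 ignore.case = TRUE, perl = TRUE)
    headers <- section_names[match(tolower(found), tolower(section_names))]

    # Repeated names become "Results", "Results_1", ... so list names stay unique.
    headers <- make.unique(headers, sep = "_")

    for (i in seq_along(header_idx)) {
      start <- header_idx[i] + 1L
      body <- if (start <= ends[i]) lines[start:ends[i]] else character(0)
      sections[[headers[i]]] <- trimws(paste(body, collapse = "\n"))
    }
  }

  # Mirror the documented return value: the full text is always included.
  sections[["Full_text"]] <- text
  sections
}

doc <- paste(c("A Paper", "Abstract", "Short summary.",
               "1. Introduction", "Background text.",
               "2. Results", "First finding.",
               "Results", "Second finding."),
             collapse = "\n")
out <- sketch_split_sections(doc)
names(out)
```

Text that appears before the first recognized header (the title line in this example) is kept only in the "Full_text" element; whether the real function behaves the same way is not stated in this page.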