Extracts the main textual content from a NISO-JATS coded XML file or text and returns it as sectioned text.
get.text(
x,
sectionsplit = "",
grepsection = "",
letter.convert = TRUE,
greek2text = FALSE,
sentences = FALSE,
paragraph = FALSE,
cermine = "auto",
rm.table = TRUE,
rm.formula = TRUE,
rm.xref = TRUE,
rm.media = TRUE,
rm.graphic = TRUE,
rm.ext_link = TRUE
)
List with two elements: (1) a character vector with the section title/s, and (2) a character vector with the running text of each section or, if sentences=TRUE, a list with one vector of sentences per section.
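A minimal sketch of a call and of reading the two returned elements by position. The XML string is an invented fragment used only for illustration, not a real article, and the call is guarded so that it runs only when JATSdecoder is installed. No output is shown because the exact result depends on the package version.

```r
# Hypothetical, minimal JATS-like fragment (invented for this example).
xml <- paste0(
  "<article><body>",
  "<sec><title>Introduction</title><p>We study reading speed.</p></sec>",
  "<sec><title>Results</title><p>Speed increased over time.</p></sec>",
  "</body></article>"
)

# Run the extraction only if the package is available.
if (requireNamespace("JATSdecoder", quietly = TRUE)) {
  res <- JATSdecoder::get.text(xml)
  titles <- res[[1]]  # element 1: section title/s
  texts  <- res[[2]]  # element 2: text of the section/s
  print(titles)
  print(texts)
}
```

Positional indexing is used here because the description above only states the order of the two elements.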
x: A NISO-JATS coded XML file or text.
sectionsplit: Search patterns for the section split (forced to lower case), e.g. c("intro", "method", "result", "discus").
grepsection: Search pattern to reduce the text to sections with matching names only.
letter.convert: Logical. If TRUE, converts hexadecimal and HTML coded characters to Unicode.
greek2text: Logical. If TRUE, some Greek letters and special characters are unified to a textual representation (important for extracting statistics).
sentences: Logical. If TRUE, the text is returned as a sectioned list of sentences.
paragraph: Logical. If TRUE, "<New paragraph>" is added at the end of each paragraph to enable manual splitting at paragraphs.
cermine: Logical, or "auto". If TRUE, CERMINE-specific error handling and letter conversion are applied. If set to "auto", a file name matching 'cermxml$' sets cermine=TRUE.
rm.table: Logical. If TRUE, removes <table> tags from the text.
rm.formula: Logical. If TRUE, removes <formula> tags from the text.
rm.xref: Logical. If TRUE, removes <xref> tags (citations) from the text.
rm.media: Logical. If TRUE, removes <media> tags from the text.
rm.graphic: Logical. If TRUE, removes <graphic> and <fig> tags from the text.
rm.ext_link: Logical. If TRUE, removes <ext-link> tags from the text.
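Two of the argument descriptions above can be illustrated in base R. This is a sketch under assumptions, not the package's internal code: the section text is invented rather than real get.text() output, and the file names are hypothetical.

```r
# Splitting at the marker that paragraph = TRUE appends to each paragraph.
# The string below is invented for illustration.
section_text <- "First paragraph.<New paragraph>Second paragraph.<New paragraph>"
paragraphs <- strsplit(section_text, "<New paragraph>", fixed = TRUE)[[1]]
paragraphs <- trimws(paragraphs)
paragraphs <- paragraphs[nzchar(paragraphs)]  # drop empty pieces, if any

# The file-name check described for cermine = "auto", written as a
# regular expression test on hypothetical file names.
files <- c("article.cermxml", "article.xml")
is_cermine <- grepl("cermxml$", files)
```

Using fixed = TRUE keeps the angle brackets and the space in the marker from being interpreted as a regular expression.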
See JATSdecoder for simultaneous extraction of meta-tags, abstract, sectioned text and reference list.
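A short sketch of combining sectionsplit with sentences=TRUE, assuming the behaviour described above (the second element becomes a list with one vector of sentences per section). The XML fragment is invented for illustration, and the call is guarded so that it runs only when JATSdecoder is installed.

```r
# Search patterns taken from the sectionsplit description.
patterns <- c("intro", "method", "result", "discus")

# Hypothetical JATS-like fragment (invented for this example).
xml <- paste0(
  "<article><body>",
  "<sec><title>Methods</title><p>We recruited adults. All gave consent.</p></sec>",
  "<sec><title>Discussion</title><p>The effect was small. More data are needed.</p></sec>",
  "</body></article>"
)

if (requireNamespace("JATSdecoder", quietly = TRUE)) {
  res <- JATSdecoder::get.text(xml, sectionsplit = patterns, sentences = TRUE)
  # With sentences = TRUE, element 2 is expected to hold the sentences
  # of each section; str() shows the structure without assuming names.
  str(res[[2]])
}
```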