
tidypmc (version 2.0)

pmc_text: Split section paragraphs into sentences

Description

Splits the paragraph tags in each section into a table of subsection titles and sentences, using tokenize_sentences.
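The sentence-splitting step can be tried on its own. The sketch below assumes that tokenize_sentences refers to the function of that name in the tokenizers package (the description above does not name the package), and the input string is made up for illustration.

```r
# Assumption: tokenize_sentences is tokenizers::tokenize_sentences
library(tokenizers)

# A made-up paragraph, used only for illustration
paragraph <- "Plague is caused by Yersinia pestis. It is transmitted by fleas."

# Returns a list with one character vector of sentences per input string
sentences <- tokenize_sentences(paragraph)
sentences[[1]]
```

pmc_text applies this kind of split to every paragraph and numbers the resulting sentences.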

Usage

pmc_text(doc, sentence = TRUE)

Value

A tibble with the section title, paragraph number, sentence number and text.

Arguments

doc

An xml_document from PubMed Central

sentence

Logical. If TRUE (the default), split paragraphs into sentences.
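As a minimal sketch of what the sentence argument changes, the following calls pmc_text with both settings on the XML file bundled with the package. The names sentences and paragraphs are illustrative. The exact columns returned when sentence = FALSE are not documented above, so the code only compares row counts.

```r
library(tidypmc)

# Load the example article shipped with tidypmc
doc <- xml2::read_xml(system.file("extdata/PMC2231364.xml", package = "tidypmc"))

# Default: paragraphs are split into sentences
sentences <- pmc_text(doc)

# Assumed behaviour: paragraphs are kept whole when sentence = FALSE
paragraphs <- pmc_text(doc, sentence = FALSE)

# Compare the number of rows returned by each call
c(paragraphs = nrow(paragraphs), sentences = nrow(sentences))
```

Keeping paragraphs whole may be preferable when the text is passed to another tokenizer.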

Author

Chris Stubben

Examples

# doc <- pmc_xml("PMC2231364")
doc <- xml2::read_xml(system.file("extdata/PMC2231364.xml",
  package = "tidypmc"
))
txt <- pmc_text(doc)
txt
dplyr::count(txt, section, sort = TRUE)
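Because the result is an ordinary tibble, it can be searched with the usual dplyr verbs. The sketch below assumes the text column is named text, as the Value section suggests. The search term "gene" and the name hits are arbitrary choices for illustration.

```r
library(tidypmc)

# Rebuild the sentence table from the bundled example article
doc <- xml2::read_xml(system.file("extdata/PMC2231364.xml", package = "tidypmc"))
txt <- pmc_text(doc)

# Keep only sentences whose text mentions the search term (case-insensitive)
hits <- dplyr::filter(txt, grepl("gene", text, ignore.case = TRUE))
hits
```

The section and paragraph columns of each matching row show where in the article the sentence appears.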
