Split the paragraph tags in each section of a PubMed Central article into a table with subsection titles and sentences, using tokenize_sentences.
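The sketch below shows roughly what output of this kind looks like: one row per sentence, numbered within its paragraph, with paragraphs numbered within their section. It does not reproduce pmc_text. The input data frame `paras` and the helper `split_sentences()` are hypothetical. The sentence splitting is a naive base-R regex on ".", "!" or "?" followed by whitespace, used in place of tokenize_sentences, so it will mis-split abbreviations such as "et al.".

```r
# Hypothetical paragraphs, as if already extracted from <sec>/<p> tags
paras <- data.frame(
  section = c("Introduction", "Introduction", "Methods"),
  text = c("This is the first sentence. This is the second one.",
           "A single-sentence paragraph.",
           "Reads were mapped. Variants were called!"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per sentence, numbered within each paragraph
split_sentences <- function(paras) {
  # Number paragraphs within each section (1, 2, ... per section)
  paras$paragraph <- ave(seq_along(paras$section), paras$section,
                         FUN = seq_along)
  rows <- lapply(seq_len(nrow(paras)), function(i) {
    # Naive split after ., ! or ? followed by whitespace (not tokenize_sentences)
    s <- strsplit(paras$text[i], "(?<=[.!?])\\s+", perl = TRUE)[[1]]
    data.frame(section = paras$section[i],
               paragraph = paras$paragraph[i],
               sentence = seq_along(s),
               text = s,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

out <- split_sentences(paras)
print(out)
```

Building one small data frame per paragraph and binding them with `do.call(rbind, ...)` keeps the numbering logic simple. For full articles, the real pmc_text output is a tibble rather than a base data frame.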
Usage
pmc_text(doc, sentence = TRUE)

Arguments
doc: an xml_document from PubMed Central
sentence: split paragraphs into sentences, default TRUE

Value
A tibble with the section title, paragraph number, sentence number, and text.

Author
Chris Stubben
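Examples
library(tidypmc)
# pmc_xml("PMC2231364") would download this article from PMC;
# the copy bundled with the package is read here instead
# doc <- pmc_xml("PMC2231364")
doc <- xml2::read_xml(system.file("extdata/PMC2231364.xml", package = "tidypmc"))
txt <- pmc_text(doc)
txt
# Number of rows (sentences) per section
dplyr::count(txt, section, sort = TRUE)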