
tableParser (version 1.0.2)

guessCaptionFootnote: guessCaptionFootnote

Description

Extracts the text blocks immediately surrounding tables in DOCX, HTML, HML, XML, or NXML files and returns them as table captions and footnotes.

Usage

guessCaptionFootnote(x, MaxCaptionLength = 1, MaxFootnoteLength = 4)

Value

A list containing the extracted table captions and footnotes, each as a vector with one element per table.

Arguments

x

character. The path to a DOCX, HTML, HML, XML, or NXML file.

MaxCaptionLength

numeric. The maximum number of sentences a text block may contain and still be treated as a caption (default: 1). Text blocks with more sentences than this threshold are not extracted.

MaxFootnoteLength

numeric. The maximum number of sentences a text block may contain and still be treated as a footnote (default: 4). Text blocks with more sentences than this threshold are not extracted.
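
To illustrate how a sentence-count threshold of this kind behaves, here is a minimal base R sketch. It is not tableParser's implementation: the helpers `count_sentences()` and `keep_short_blocks()` and the sample text blocks are hypothetical, and the sentence counting is a rough heuristic that counts runs of `.`, `!`, or `?` followed by whitespace or the end of the string.

```r
# Hypothetical helper (not part of tableParser): rough sentence count.
# Counts runs of ., ! or ? that are followed by whitespace or end of string.
count_sentences <- function(blocks) {
  hits <- gregexpr("[.!?]+(\\s+|$)", blocks)
  lengths(regmatches(blocks, hits))
}

# Hypothetical helper: keep only blocks with at most max_sentences sentences.
keep_short_blocks <- function(blocks, max_sentences) {
  blocks[count_sentences(blocks) <= max_sentences]
}

# Made-up text blocks for illustration only
blocks <- c(
  "Means and standard deviations by group.",
  "Note. N = 120. Values in parentheses are standard errors.",
  "The sample was recruited online. Participants were paid. Data were collected in 2020. Two cases were excluded. Results follow."
)

count_sentences(blocks)

# With a threshold of 1 (the default MaxCaptionLength), only the first block passes
keep_short_blocks(blocks, 1)

# With a threshold of 4 (the default MaxFootnoteLength), the first two blocks pass
keep_short_blocks(blocks, 4)
```

Under this heuristic, a short label such as "Note." counts as a sentence of its own, so a higher threshold may be needed when footnotes start with such labels.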

Examples

library(tableParser)

## Download an example DOCX file from tableParser's GitHub repo to the temp directory
## (mode = "wb" keeps the binary DOCX file intact on Windows)
d <- 'https://github.com/ingmarboeschen/tableParser/raw/refs/heads/main/tableExamples.docx'
docx_file <- file.path(tempdir(), "tableExamples.docx")
download.file(d, docx_file, mode = "wb")

## Download an example HTML file from tableParser's GitHub repo to the temp directory
h <- 'https://github.com/ingmarboeschen/tableParser/raw/refs/heads/main/tableExamples.html'
html_file <- file.path(tempdir(), "tableExamples.html")
download.file(h, html_file)

## Extract table captions and footnotes
# DOCX file
guessCaptionFootnote(docx_file)
# HTML file
guessCaptionFootnote(html_file)
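
The thresholds can also be changed from their defaults. The following sketch assumes the tableParser package is installed and that the example DOCX file has already been downloaded to the temp directory as in the example above. The values 2 and 6 are arbitrary illustrations, not recommendations, and `str()` is used only to inspect the returned list without assuming the names of its elements.

```r
# Path of the example DOCX file in the temp directory
docx_path <- file.path(tempdir(), "tableExamples.docx")

# Run only if the package is installed and the file has been downloaded
if (requireNamespace("tableParser", quietly = TRUE) && file.exists(docx_path)) {
  # Allow captions of up to 2 sentences and footnotes of up to 6 sentences
  # (illustrative values only)
  res <- tableParser::guessCaptionFootnote(
    docx_path,
    MaxCaptionLength = 2,
    MaxFootnoteLength = 6
  )
  # Inspect the structure of the returned list
  str(res)
}
```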
