
tableParser (version 1.0.2)

table2text: table2text

Description

Parses tabled content from HTML-coded content, or from an HTML, DOCX, or PDF file, into a human-readable text vector. Before parsing, header lines are collapsed and connected cells are split.

Usage

table2text(
  x,
  unifyMatrix = TRUE,
  unifyStats = FALSE,
  expandAbbreviations = TRUE,
  superscript2bracket = TRUE,
  standardPcoding = FALSE,
  decodeP = TRUE,
  noSign2p = FALSE,
  bracketHandling = TRUE,
  dfHandling = FALSE,
  rotate = FALSE,
  correctComma = TRUE,
  na.rm = TRUE,
  addDescription = TRUE,
  unlist = FALSE,
  addTableName = TRUE
)

Value

A list of text vectors containing the parsed table content, with one list element per table. The text vector in each list element can be further processed with 'JATSdecoder::standardStats()' to extract and structure standard statistical test results.
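
As a hedged illustration of the post-processing step described above, the following sketch passes each table's text vector to 'JATSdecoder::standardStats()'. The wrapper name extract_table_stats() is hypothetical and not part of either package, and the sketch assumes that both tableParser and JATSdecoder are installed (it returns NULL otherwise).

```r
# Hypothetical wrapper (not part of tableParser or JATSdecoder):
# parse all tables in a file and extract statistical results per table.
extract_table_stats <- function(path) {
  # Skip gracefully if either package is unavailable
  if (!requireNamespace("tableParser", quietly = TRUE) ||
      !requireNamespace("JATSdecoder", quietly = TRUE)) {
    message("tableParser and JATSdecoder are required for this sketch.")
    return(invisible(NULL))
  }
  # table2text() returns a list with one character vector per table
  tables <- tableParser::table2text(path)
  # Apply standardStats() table by table, keeping the list structure
  lapply(tables, JATSdecoder::standardStats)
}
```

It could be called on the example DOCX file from the Examples section, e.g. extract_table_stats(file.path(tempdir(), "tableExamples.docx")), once that file has been downloaded.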

Arguments

x

A vector with HTML tables, or a single file path to an HTML, XML, HML, PDF, or DOCX file.

unifyMatrix

Logical. If TRUE, matrix cells are unified for better post-processing.

unifyStats

Logical. If TRUE, output is unified for better post-processing (e.g., "p-value"->"p").

expandAbbreviations

Logical. If TRUE, detected abbreviations are expanded to the labels given in the table caption or footnote.

superscript2bracket

Logical. If TRUE, detected superscript codings are inserted inside parentheses.

standardPcoding

Logical. If TRUE and no other p-value coding is detected, the standard coding of p-values is assumed: * p<.05, ** p<.01, and *** p<.001.

decodeP

Logical. If TRUE, converts the detected p-value codings to text with separator ';;' (e.g., '1.23*' -> '1.23;; p<.01').

noSign2p

Logical. If TRUE, imputes 'p>maximum of coded p-values' to cells that are not coded to be significant.

bracketHandling

Logical. If TRUE and if possible, decodes numbers in brackets.

dfHandling

Logical. If TRUE, the detected sample size N in a row is inserted as degrees of freedom (N-2) to r- and t-values that are reported without degrees of freedom (df) in that row. In ANOVA tables, the detected residual df (df2) is imputed behind the factor df (df1).

rotate

Logical. If TRUE, matrix content is parsed by column.

correctComma

Logical. If TRUE and unifyMatrix=TRUE, decimal sign commas are converted to dots.

na.rm

Logical. If TRUE, NA cells are set to empty cells.

addDescription

Logical. If TRUE, the attributes table caption and table footnote are added in front of the extracted character content for better readability.

unlist

Logical. If TRUE, output is returned as a vector.

addTableName

Logical. If TRUE and unlist=TRUE, the table number is added in front of unlisted text lines.
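
The interplay of standardPcoding, decodeP, and noSign2p can be illustrated with a small base R sketch. The helper decode_stars() below is hypothetical and is not the tableParser implementation; it only mimics the documented behaviour for trailing asterisks under the standard coding (* p<.05, ** p<.01, *** p<.001). In real tables the mapping depends on the coding detected in the table footnote.

```r
# Illustrative only: NOT the tableParser implementation.
# Hypothetical helper that decodes trailing asterisks in table cells.
decode_stars <- function(cells, noSign2p = FALSE) {
  # Standard coding as described for the standardPcoding argument
  coding <- c("***" = "p<.001", "**" = "p<.01", "*" = "p<.05")
  vapply(cells, function(cell) {
    # Trailing run of asterisks, if any (character(0) when absent)
    stars <- regmatches(cell, regexpr("\\*+$", cell))
    value <- sub("\\*+$", "", cell)
    if (length(stars) == 1 && stars %in% names(coding)) {
      # Coded cell: append the decoded p-value after the ';;' separator
      paste0(value, ";; ", coding[[stars]])
    } else if (noSign2p && length(stars) == 0) {
      # Uncoded cell: impute p > largest coded threshold (.05 here)
      paste0(value, ";; p>.05")
    } else {
      # Anything else is returned unchanged
      cell
    }
  }, character(1), USE.NAMES = FALSE)
}

decode_stars(c("0.45***", "0.21*", "0.07"))
decode_stars(c("0.45***", "0.07"), noSign2p = TRUE)
```

With noSign2p = FALSE, cells without a trailing asterisk pass through untouched; with noSign2p = TRUE, they receive the imputed 'p>.05' label, mirroring the 'p>maximum of coded p-values' rule described for that argument.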

Examples

library(tableParser)

## - Download example DOCX file
d<-'https://github.com/ingmarboeschen/tableParser/raw/refs/heads/main/tableExamples.docx'
# mode="wb" keeps the binary DOCX file intact on Windows
download.file(d,paste0(tempdir(),"/","tableExamples.docx"),mode="wb")

# Parse tabled content from example file to text vectors.
table2text(paste0(tempdir(),"/","tableExamples.docx"))

## - Download example HTML file
h<-'https://github.com/ingmarboeschen/tableParser/raw/refs/heads/main/tableExamples.html'
download.file(h,paste0(tempdir(),"/","tableExamples.html"))

# Parse tabled content from example file to text vectors.
table2text(paste0(tempdir(),"/","tableExamples.html"),unlist=TRUE,addDescription=TRUE)

## - Download example PDF file
p<-'https://github.com/ingmarboeschen/tableParser/raw/refs/heads/main/tableExamples.pdf'
# mode="wb" keeps the binary PDF file intact on Windows
download.file(p,paste0(tempdir(),"/","tableExamples.pdf"),mode="wb")

# Parse tabled content from example file to text vectors.
table2text(paste0(tempdir(),"/","tableExamples.pdf"),decodeP=TRUE,standardPcoding=TRUE)
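
The examples above all read from files. Since x may also be a vector with HTML tables, the following minimal sketch passes an HTML-coded table directly. The table content is made up for illustration, and the call is only executed if tableParser is installed; the exact output is not shown because it depends on the package version.

```r
# Made-up HTML table for illustration (not taken from the package examples)
html_table <- paste0(
  "<table><caption>Table 1. Correlations with outcome</caption>",
  "<tr><th>Variable</th><th>r</th></tr>",
  "<tr><td>Age</td><td>0.31*</td></tr>",
  "<tr><td>Income</td><td>0.12</td></tr>",
  "</table>"
)

# Parse the HTML-coded table if the package is available
if (requireNamespace("tableParser", quietly = TRUE)) {
  res <- tableParser::table2text(html_table, standardPcoding = TRUE)
  print(res)
}
```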
