
tableParser (version 1.0.2)

table2stats

Description

Extracts tabulated statistical results from documents in XML, HTML, HML, DOCX, or PDF format. The table content is collapsed into a text string with table2text(), which is then processed with standardStats() from the 'JATSdecoder' package. It detects most standard statistics (t, Z, chi^2, F, r, d, beta, SE, eta^2, omega^2, OR, RR, p-values), decodes coded p-values to text, and, where possible, recalculates and checks p-values.

Usage

table2stats(
  x,
  standardPcoding = FALSE,
  noSign2p = TRUE,
  correctComma = FALSE,
  rotate = FALSE,
  expandAbbreviations = TRUE,
  superscript2bracket = TRUE,
  stats.mode = "all",
  checkP = FALSE,
  alpha = 0.05,
  criticalDif = 0.02,
  alternative = "undirected",
  estimateZ = FALSE,
  T2t = FALSE,
  dfHandling = TRUE,
  collapse = TRUE,
  addTableName = FALSE,
  rm.na.col = TRUE
)

Value

A data.frame object with the extracted standard statistical results, the recalculated p-values, and, if 'checkP=TRUE', a rudimentary consistency check of the reported p-values.

Arguments

x

Input. Either a file path to an XML, HTML, HML, DOCX, or PDF file; or a matrix object; or a vector of plain HTML-coded tables.

standardPcoding

Logical. If TRUE and no other p-value coding is detected, the standard coding of p-values is assumed: * for p<.05, ** for p<.01, and *** for p<.001.

noSign2p

Logical. If TRUE, imputes 'p>maximum of coded p-values' to cells that are not coded as significant.
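The interplay of standardPcoding and noSign2p can be illustrated with a minimal base R sketch. The helper decode_stars() and its argument no_sign_to_p are hypothetical names used only for illustration; they are not part of tableParser, and the package's own decoding may work differently.

```r
# Hypothetical helper (not part of tableParser): decode star-coded cells
# under the standard convention * p<.05, ** p<.01, *** p<.001.
decode_stars <- function(cells, no_sign_to_p = TRUE) {
  coding <- c("***" = 0.001, "**" = 0.01, "*" = 0.05)
  # Drop everything before the first asterisk; "" if the cell has no stars.
  stars <- sub("^[^*]*", "", cells)
  p <- ifelse(stars %in% names(coding),
              paste0("p<", coding[stars]),
              NA_character_)
  # Cells without a significance code get "p>" the largest coded threshold.
  if (no_sign_to_p) p[stars == ""] <- paste0("p>", max(coding))
  p
}

decoded <- decode_stars(c("0.45**", "0.12", "0.31*", "0.60***"))
print(decoded)
```

With no_sign_to_p = FALSE, the uncoded cell is left as NA instead of being assigned "p>0.05".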

correctComma

Logical. If TRUE, decimal sign commas are converted to dots.
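As a rough illustration of what a decimal comma conversion involves, the following base R sketch replaces a comma only when it sits directly between two digits. This is an assumption about a reasonable approach, not the package's actual implementation, and it would misread thousands separators such as "1,234".

```r
# Illustrative only: convert decimal commas to dots, leaving list commas
# such as the one in "F(1, 20)" untouched because a space follows them.
cells <- c("0,45", "t(12)=2,31", "F(1, 20)=4,50")
fixed <- gsub("(?<=[0-9]),(?=[0-9])", ".", cells, perl = TRUE)
print(fixed)
```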

rotate

Logical. If TRUE, matrix content is parsed by column.

expandAbbreviations

Logical. If TRUE, detected abbreviations are expanded to the labels given in the table caption/footer.

superscript2bracket

Logical. If TRUE, detected superscript codings are inserted inside parentheses.

stats.mode

Select a subset of test results by p-value checkability for output. One of: c("all", "checkable", "computable", "uncomputable").

checkP

Logical. If TRUE, detected p-values and recalculated p-values will be checked for consistency.

alpha

Numeric. Defines the alpha level to be used for error assignment.

criticalDif

Numeric. Sets the maximum absolute difference between reported and recalculated p-values that is tolerated in error detection.
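The idea behind checkP and criticalDif can be sketched in base R for a t-test with known degrees of freedom. The helper check_p() and its column names are hypothetical, and the rule shown (flag when the absolute difference exceeds the threshold) is a simplified assumption; the package's check also involves alpha and may differ in detail.

```r
# Hypothetical helper (not part of tableParser): recompute a two-sided
# p-value from t and df and compare it with the reported p-value.
check_p <- function(t, df, p_reported, critical_dif = 0.02) {
  p_recalc <- 2 * pt(-abs(t), df)
  data.frame(
    t = t, df = df,
    p_reported = p_reported,
    p_recalc = p_recalc,
    # Flag results whose reported p deviates by more than critical_dif.
    error = abs(p_reported - p_recalc) > critical_dif
  )
}

res <- check_p(t = c(2.10, 2.10), df = c(28, 28), p_reported = c(0.045, 0.10))
print(res)
```

In this made-up example, the first reported p-value lies within the tolerance and the second does not.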

alternative

Character. Select test sidedness for recomputation of p-values from t-, r-, and beta-values. One of c("undirected", "directed"). If "directed" is specified, p-values for directed null hypotheses are added to the table but still require a manual inspection of the consistency of the direction.
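The difference between the two settings can be sketched with base R's pt() for a single made-up t-value. The variable names are illustrative. The one-sided value shown is only correct if the observed effect points in the hypothesized direction, which is why a manual check of the direction remains necessary.

```r
# Illustrative values: a t-statistic with 40 degrees of freedom.
t_value <- 1.80
df <- 40

# Undirected (two-sided) p-value.
p_undirected <- 2 * pt(-abs(t_value), df)

# Directed (one-sided) p-value, assuming the effect has the expected sign.
p_directed <- pt(abs(t_value), df, lower.tail = FALSE)

print(c(undirected = p_undirected, directed = p_directed))
```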

estimateZ

Logical. If TRUE, detected beta-/d-values are divided by the reported standard error "SE" to estimate Z-values ("Zest") for the observed beta/d and to compute p-values. Note: This is only valid if the Gauss-Markov assumptions are met and the sample size is sufficiently large. If a Z- or t-value is detected in a report of a beta-/d-coefficient with SE, no estimation is performed, even if estimateZ is set to TRUE.
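The estimation described for estimateZ amounts to a simple ratio followed by a normal-distribution lookup. A minimal base R sketch with made-up values (the variable names are illustrative, and a two-sided p-value is assumed):

```r
# Made-up coefficient and standard error.
beta <- 0.42
se <- 0.15

# Estimated Z-value: coefficient divided by its standard error.
z_est <- beta / se

# Two-sided p-value under the standard normal distribution.
p_est <- 2 * pnorm(-abs(z_est))

print(c(Zest = z_est, p = p_est))
```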

T2t

Logical. If TRUE, capital letter T is treated as a t-statistic.

dfHandling

Logical. If TRUE, a sample size N detected in the caption/footer is used to insert degrees of freedom (N-2) for r- and t-values that are reported without degrees of freedom.
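To see why inserting N-2 makes a p-value computable, consider a correlation reported without degrees of freedom. The base R sketch below uses made-up values and the standard conversion of r to a t-statistic; it illustrates the principle and is not the package's code.

```r
# Made-up correlation and a sample size taken from a table caption.
r <- 0.35
N <- 50

# Degrees of freedom inserted from the sample size.
df <- N - 2

# Standard conversion of r to t, then a two-sided p-value.
t_from_r <- r * sqrt(df / (1 - r^2))
p_from_r <- 2 * pt(-abs(t_from_r), df)

print(c(df = df, t = t_from_r, p = p_from_r))
```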

collapse

Logical. If TRUE, the result is collapsed to a single data frame object. Otherwise, a list of data frames is returned, with one element per table matrix.

addTableName

Logical. If TRUE, the table number is added in front of the extracted results.

rm.na.col

Logical. If TRUE, removes all columns with only NA.

See Also

get.stats for extracting statistical results from textual resources.

Examples

library(tableParser)

## - Download example DOCX file
d <- 'https://github.com/ingmarboeschen/tableParser/raw/refs/heads/main/tableExamples.docx'
docx_file <- file.path(tempdir(), "tableExamples.docx")
# mode = "wb" keeps the binary file intact on Windows
download.file(d, docx_file, mode = "wb")

# Extract the detected standard statistical results and check the reported and coded
# p-values against the recalculated p-values.
table2stats(docx_file, checkP = TRUE, estimateZ = TRUE)

## - Download example HTML file
h <- 'https://github.com/ingmarboeschen/tableParser/raw/refs/heads/main/tableExamples.html'
html_file <- file.path(tempdir(), "tableExamples.html")
download.file(h, html_file)

# Extract the detected standard statistical results and check the reported and coded
# p-values against the recalculated p-values.
table2stats(html_file, checkP = TRUE, estimateZ = TRUE)

## - Download example PDF file
p <- 'https://github.com/ingmarboeschen/tableParser/raw/refs/heads/main/tableExamples.pdf'
pdf_file <- file.path(tempdir(), "tableExamples.pdf")
download.file(p, pdf_file, mode = "wb")

# Extract the detected standard statistical results and check the reported,
# standard-coded, and uncoded p-values against the recalculated p-values.
table2stats(pdf_file, checkP = TRUE, estimateZ = TRUE,
            standardPcoding = TRUE, noSign2p = FALSE)
# Note: Due to the messy table extraction with tabulapdf::extract_tables(), the
# extraction of the statistical results is less precise here.
