table2stats: table2stats

Description

Extracts tabulated statistical results from documents in XML, HTML, HML, DOCX, or PDF format. The table content is collapsed into a text string with table2text(), which is then processed with standardStats() from the 'JATSdecoder' package. The function detects most standard statistics (t, Z, chi^2, F, r, d, beta, SE, eta^2, omega^2, OR, RR, p-values), decodes coded p-values to text, and recalculates and checks p-values where possible.
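To make the recalculation step concrete, here is a minimal base R sketch of how a two-sided p-value can be recomputed from a detected test statistic and its degrees of freedom. The helper recalc_p() is hypothetical and is not part of 'tableParser' or 'JATSdecoder'; it only illustrates the general idea with standard distribution functions and may differ from what standardStats() does internally.

```r
# Hypothetical helper (an assumption for illustration, not a package function):
# recompute a p-value from a detected statistic using base R distributions.
recalc_p <- function(stat, value, df1 = NA, df2 = NA) {
  switch(stat,
    t    = 2 * pt(-abs(value), df = df1),                        # two-sided t-test
    Z    = 2 * pnorm(-abs(value)),                               # two-sided Z-test
    chi2 = pchisq(value, df = df1, lower.tail = FALSE),          # upper tail
    F    = pf(value, df1 = df1, df2 = df2, lower.tail = FALSE),  # upper tail
    r    = {
      # correlation: convert r to t with df = N - 2, then test as t
      t_from_r <- value * sqrt(df1 / (1 - value^2))
      2 * pt(-abs(t_from_r), df = df1)
    },
    NA_real_  # statistics without a recomputable p-value in this sketch
  )
}

recalc_p("t", 2.5, df1 = 48)
recalc_p("F", 6.25, df1 = 1, df2 = 48)  # same p as the t-test above, since F = t^2
recalc_p("r", 0.30, df1 = 98)
```

Statistics that cannot be recomputed from the reported numbers alone (for example, d without a standard error) return NA in this sketch.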

Usage

table2stats(
  x,
  standardPcoding = FALSE,
  noSign2p = TRUE,
  correctComma = FALSE,
  rotate = FALSE,
  expandAbbreviations = TRUE,
  superscript2bracket = TRUE,
  stats.mode = "all",
  checkP = FALSE,
  alpha = 0.05,
  criticalDif = 0.02,
  alternative = "undirected",
  estimateZ = FALSE,
  T2t = FALSE,
  dfHandling = TRUE,
  collapse = TRUE,
  addTableName = FALSE,
  rm.na.col = TRUE
)

Value

A data.frame object with the extracted standard statistical results, the recalculated p-values, and, if 'checkP=TRUE', a rudimentary consistency check of the reported p-values. If 'collapse=FALSE', a list of data frames is returned instead.

Arguments

x: Input. Either a file path to an XML, HTML, HML, DOCX, or PDF file; or a matrix object; or a vector of plain HTML-coded tables.
standardPcoding: Logical. If TRUE and no other coding of p-values is detected, the standard coding is assumed: * for p<.05, ** for p<.01, and *** for p<.001.
noSign2p: Logical. If TRUE, imputes 'p>maximum of coded p-values' to cells that are not coded to be significant.
correctComma: Logical. If TRUE, decimal sign commas are converted to dots.
rotate: Logical. If TRUE, matrix content is parsed by column.
expandAbbreviations: Logical. If TRUE, detected abbreviations are expanded to label from table caption/footer.
superscript2bracket: Logical. If TRUE, detected superscript codings are inserted inside parentheses.
stats.mode: Select a subset of test results by p-value checkability for output. One of: c("all", "checkable", "computable", "uncomputable").
checkP: Logical. If TRUE, detected p-values and recalculated p-values will be checked for consistency.
alpha: Numeric. Defines the alpha level to be used for error assignment.
criticalDif: Numeric. Sets the absolute maximum difference in reported and recalculated p-values for error detection.
alternative: Character. Select test sidedness for recomputation of p-values from t-, r-, and beta-values. One of c("undirected", "directed"). If "directed" is specified, p-values for directed null hypotheses are added to the table but still require a manual inspection of the consistency of the direction.
estimateZ: Logical. If TRUE, detected beta-/d-values are divided by the reported standard error "SE" to estimate Z-values ("Zest") for the observed beta/d and to compute p-values. Note: This is only valid if the Gauss-Markov assumptions are met and the sample size is sufficiently large. If a Z- or t-value is detected alongside a beta-/d-coefficient with SE, no estimation is performed, even if estimateZ is set to TRUE.
T2t: Logical. If TRUE, capital letter T is treated as a t-statistic.
dfHandling: Logical. If TRUE, a sample size N detected in the caption/footer is used to supply the degrees of freedom (N-2) for r- and t-values that are reported without degrees of freedom.
collapse: Logical. If TRUE, the result is collapsed to a single data frame object. Otherwise, a list of data frames is returned, with one element per table matrix.
addTableName: Logical. If TRUE, the table number is added in front of the extracted results.
rm.na.col: Logical. If TRUE, removes all columns with only NA.
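
As a rough illustration of how several of the arguments above interact, the following base R sketch mimics star decoding (standardPcoding, noSign2p), the Z estimation from beta and SE (estimateZ), and a simple comparison of reported and recalculated p-values (checkP, alpha, criticalDif). All three helpers are hypothetical and written only for this illustration; in particular, the flagging rule in flag_p() is an assumed reading of the alpha and criticalDif descriptions, not the package's actual implementation.

```r
# Hypothetical helpers (assumptions for illustration, not package functions).

# Standard star coding: * p<.05, ** p<.01, *** p<.001.
# With no_sign_2p = TRUE, unstarred cells get 'p > largest coded p-value'.
decode_stars <- function(cell, no_sign_2p = TRUE) {
  stars <- nchar(gsub("[^*]", "", cell))  # count asterisks in the cell
  if (stars >= 3) return("p<.001")
  if (stars == 2) return("p<.01")
  if (stars == 1) return("p<.05")
  if (no_sign_2p) "p>.05" else NA_character_
}

# Estimate Z from a coefficient and its standard error, with a two-sided p-value.
estimate_z_p <- function(beta, se) {
  z <- beta / se
  c(Zest = z, p = 2 * pnorm(-abs(z)))
}

# Assumed flagging rule: report a discrepancy if the absolute difference exceeds
# critical_dif, and separately whether the significance decision at alpha changes.
flag_p <- function(reported, recalculated, alpha = 0.05, critical_dif = 0.02) {
  c(differs        = abs(reported - recalculated) > critical_dif,
    decision_flips = (reported < alpha) != (recalculated < alpha))
}

decode_stars("0.45**")
est <- estimate_z_p(beta = 0.30, se = 0.10)
flag_p(reported = 0.04, recalculated = est[["p"]])
```

In this example the reported p-value of .04 deviates from the recalculated one by more than the default criticalDif of 0.02, while the significance decision at alpha = .05 stays the same. Whether table2stats() reports these two conditions separately is not stated above, so treat the output structure as illustrative only.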

Examples

## - Download example DOCX file
d<-'https://github.com/ingmarboeschen/tableParser/raw/refs/heads/main/tableExamples.docx'
# mode="wb" keeps the binary DOCX file intact on Windows
download.file(d,paste0(tempdir(),"/","tableExamples.docx"), mode="wb")

# Extract the detected statistical standard results and validate the reported and coded 
# p-values with the recalculated p-values.
table2stats(paste0(tempdir(),"/","tableExamples.docx"), checkP=TRUE, estimateZ=TRUE)

## - Download example HTML file
h<-'https://github.com/ingmarboeschen/tableParser/raw/refs/heads/main/tableExamples.html'
download.file(h,paste0(tempdir(),"/","tableExamples.html"))
# Extract the detected statistical standard results and validate the reported and coded 
# p-values with the recalculated p-values.
table2stats(paste0(tempdir(),"/","tableExamples.html"), checkP=TRUE, estimateZ=TRUE)

## - Download example PDF file
# \donttest{
p<-'https://github.com/ingmarboeschen/tableParser/raw/refs/heads/main/tableExamples.pdf'
# mode="wb" keeps the binary PDF file intact on Windows
download.file(p,paste0(tempdir(),"/","tableExamples.pdf"), mode="wb")

# Extract the detected statistical standard results and validate the reported and  
# standard coded as well as not coded p-values with the recalculated p-values.
table2stats(paste0(tempdir(),"/","tableExamples.pdf"), checkP=TRUE, estimateZ=TRUE, 
standardPcoding=TRUE, noSign2p=FALSE)
# Note: Due to the messy table extraction with tabulapdf::extract_tables(), the  
# extraction of the statistical results is less precise here.
# }
