Extracts statistical references from a directory with HTML and PDF files. The "pdftotext" program is used to convert PDF files to plain text files. This program must be installed, and the PATH environment variable must be set so that it can be called from the command line.
By default, a GUI window (built with tcltk) opens that lets you choose the directory.
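Before calling checkdir(), it can help to confirm that both requirements are met. The snippet below is a minimal sketch using only base R: Sys.which() looks up pdftotext on the PATH, and capabilities("tcltk") reports whether this R build supports Tcl/Tk. The variable names are illustrative and are not part of statcheck.

```r
# Look up the external "pdftotext" program on the PATH.
# Sys.which() returns an empty string when the program is not found.
pdftotext_path <- Sys.which("pdftotext")
has_pdftotext <- nzchar(unname(pdftotext_path))

# Check whether this R build supports Tcl/Tk, which the
# directory-picker window relies on.
has_tcltk <- isTRUE(unname(capabilities("tcltk")))

if (!has_pdftotext) {
  message("pdftotext not found on PATH; PDF files cannot be converted.")
}
if (!has_tcltk) {
  message("Tcl/Tk not available; pass 'dir' explicitly instead of using the GUI.")
}
```

If Tcl/Tk is unavailable, supplying the directory as a string avoids the pop-up window entirely.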
checkdir(dir, subdir = TRUE, ...)
dir: String indicating the directory to be used. If this is left empty, a window pops up from which you can choose a directory.
subdir: Logical indicating whether subfolders should also be checked. Defaults to TRUE.
...: Arguments passed on to statcheck.
A data frame containing, for each extracted statistic, the following columns:
Name of the file from which the statistic was extracted
Character string indicating which type of test statistic was extracted
First degree of freedom
Second degree of freedom (if applicable)
Reported comparison of the test statistic (=, <, or >); when importing from PDF, this will often not be converted properly
Reported value of the statistic
Reported comparison of the p-value (=, <, or >); when importing from PDF, this might not be converted properly
The reported p-value, or NA if the result was reported as NS (not significant)
The recomputed p-value
Raw string of the extracted statistical reference
Whether the recomputed p-value is inconsistent with the reported p-value
Whether the reported result is significant while the recomputed result is not, or vice versa
Logical. Is it likely that the reported p-value resulted from a correction for one-sided testing?
Logical. Does the text contain the string "sided", "tailed", and/or "directional"?
Logical. Does the exact string of the extracted raw results occur anywhere else in the article?
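The recomputed p-value and the consistency flags described above can be illustrated for a t test. The sketch below assumes two-tailed tests and a result reported as "p = x"; the helpers recompute_p_t() and check_reported_p() are hypothetical, and statcheck's own rules (for "<" comparisons, rounding, and other test types) are more detailed than this simplified version.

```r
# Hypothetical helper: recompute the two-tailed p-value of a t statistic.
recompute_p_t <- function(t, df) {
  2 * pt(abs(t), df = df, lower.tail = FALSE)
}

# Hypothetical helper: compare a reported "p = x" (given to `digits`
# decimals) with the recomputed p-value. Simplified sketch, not statcheck's code.
check_reported_p <- function(t, df, reported_p, digits, alpha = 0.05) {
  computed <- recompute_p_t(t, df)
  # A p-value "matches" if, rounded like the reported one, it equals it
  matches <- function(p) abs(round(p, digits) - reported_p) < 1e-9
  error <- !matches(computed)
  # Decision error: significance at `alpha` flips between reported and recomputed
  decision_error <- (reported_p < alpha) != (computed < alpha)
  # Possible one-sided correction: the reported value matches half the two-sided p
  one_tail <- error && matches(computed / 2)
  list(computed = computed, error = error,
       decision_error = decision_error, one_tail = one_tail)
}

# Usage: t(30) = 1.20 reported as p = .01, although the recomputed p is above .05
res <- check_reported_p(t = 1.20, df = 30, reported_p = 0.01, digits = 2)
# error and decision_error are both TRUE here
```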
See statcheck for more details. This function is a wrapper around both checkPDFdir for PDF files and checkHTMLdir for HTML files.
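To illustrate what the subdir argument controls, the sketch below splits a directory's files into PDF and HTML sets with base R's list.files(), descending into subfolders only when subdir = TRUE. The helper list_article_files() and the demo folder are hypothetical and are not statcheck's own code; in statcheck the actual checking is done by checkPDFdir and checkHTMLdir.

```r
# Hypothetical helper: collect PDF and HTML files from `dir`,
# searching subfolders only when `subdir` is TRUE.
list_article_files <- function(dir, subdir = TRUE) {
  find <- function(pattern) {
    list.files(dir, pattern = pattern, recursive = subdir,
               full.names = TRUE, ignore.case = TRUE)
  }
  # "\\.html?$" matches both .htm and .html extensions
  list(pdf = find("\\.pdf$"), html = find("\\.html?$"))
}

# Demo on a throwaway folder with one PDF placed in a subfolder
demo_dir <- file.path(tempdir(), "articles_demo")
dir.create(file.path(demo_dir, "sub"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, "a.pdf"),
                      file.path(demo_dir, "b.html"),
                      file.path(demo_dir, "sub", "c.PDF")))

all_files <- list_article_files(demo_dir, subdir = TRUE)   # includes sub/c.PDF
top_only  <- list_article_files(demo_dir, subdir = FALSE)  # top level only
```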
Depending on the PDF file, the comparison operators (=, <, >) are sometimes not converted correctly, so they may be missing from the output. Using HTML versions of articles is recommended for more stable results.
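One way to see whether operators survived conversion is to run pdftotext on a single file and look for p-values that still carry a comparison sign. The sketch assumes pdftotext is on the PATH; pdf_to_lines() and has_p_comparison() are hypothetical helpers, the file name "paper.pdf" is a placeholder, and the regular expression is a rough heuristic rather than statcheck's own pattern.

```r
# Hypothetical helper: convert one PDF with the external pdftotext program
# (pdftotext <PDF-file> <text-file>) and return the text as lines.
pdf_to_lines <- function(pdf) {
  txt <- tempfile(fileext = ".txt")
  status <- system2("pdftotext", c(shQuote(pdf), shQuote(txt)))
  if (status != 0) stop("pdftotext failed on ", pdf)
  readLines(txt, warn = FALSE)
}

# Hypothetical heuristic: does a line contain "p" followed by =, < or >?
has_p_comparison <- function(lines) {
  grepl("\\bp\\s*[<>=]", lines, perl = TRUE)
}

# Usage (requires pdftotext and a real file; "paper.pdf" is a placeholder):
# lines <- pdf_to_lines("C:/mydocuments/articles/paper.pdf")
# Lines with "p .04"-style values whose operator seems to have been lost:
# lines[grepl("\\bp\\s*\\.\\d", lines, perl = TRUE) & !has_p_comparison(lines)]
```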
Note that converting files to plain text and extracting statistics can introduce errors. Some statistical values may be missed, especially if their notation is unconventional. It is recommended to check some of the results manually.
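A simple way to follow this advice is to draw a reproducible random sample of rows and compare each one with the original article. The sketch below uses base R only; spot_check() and the dummy data frame are hypothetical stand-ins for the output of checkdir(). Note that set.seed() resets the global random number generator.

```r
# Hypothetical helper: draw a reproducible random sample of rows from a
# results data frame so they can be compared by hand with the articles.
spot_check <- function(res, n = 10, seed = 1) {
  set.seed(seed)
  k <- min(n, nrow(res))
  res[sample.int(nrow(res), k), , drop = FALSE]
}

# Demo with a dummy data frame standing in for checkdir() output
demo <- data.frame(raw = sprintf("t(%d) = 2.1, p = .04", 20:39))
checked <- spot_check(demo, n = 5)
```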
library(statcheck)

# With this command, a window pops up from which you can select the directory with articles:
# checkdir()

# You can also specify the directory beforehand, for instance:
# DIR <- "C:/mydocuments/articles"
# checkdir(DIR)