checkHTMLdir: Extract test statistics from all HTML files in a folder.

Description

Extracts statistical references from all HTML versions of articles in a directory. By default, a GUI window (via tcltk) opens so that you can choose the directory.

Usage

checkHTMLdir(dir, subdir = TRUE, extension=TRUE, ...)

Value

A data frame containing for each extracted statistic:

Source: Name of the file from which the statistic is extracted
Statistic: Character indicating the statistic that is extracted
df1: First degree of freedom
df2: Second degree of freedom (if applicable)
Test.Comparison: Reported comparison of the test statistic. When importing from PDF, this will often not be converted properly
Value: Reported value of the statistic
Reported.Comparison: Reported comparison. When importing from PDF, this might not be converted properly
Reported.P.Value: The reported p-value, or NA if the result was reported as NS (not significant)
Computed: The recomputed p-value
Raw: Raw string of the statistical reference that is extracted
Error: The computed p-value is not congruent with the reported p-value
DecisionError: The reported result is significant whereas the recomputed result is not, or vice versa.
OneTail: Logical. Is it likely that the reported p-value resulted from a correction for one-sided testing?
OneTailedInTxt: Logical. Does the text contain the string "sided", "tailed", and/or "directional"?
CopyPaste: Logical. Does the exact string of the extracted raw results occur anywhere else in the article?
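
As a rough illustration of how the returned data frame might be inspected, the sketch below builds a small stand-in with a few of the columns listed above and filters it in base R. The object `res` and all of its values are invented for illustration; they are not output from checkHTMLdir.

```r
# Hypothetical stand-in for the data frame returned by checkHTMLdir();
# all values are invented for illustration only.
res <- data.frame(
  Source = c("article1.html", "article1.html", "article2.html"),
  Statistic = c("t", "F", "r"),
  Reported.P.Value = c(0.03, 0.20, 0.04),
  Computed = c(0.031, 0.012, 0.09),
  Error = c(FALSE, TRUE, TRUE),
  DecisionError = c(FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Rows where the reported and recomputed p-values are not congruent
inconsistent <- res[res$Error, ]

# Number of decision errors per source file
decision_errors <- tapply(res$DecisionError, res$Source, sum)
```

With real output, the same subsetting applies directly to the object returned by checkHTMLdir.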

Arguments

dir: String indicating the directory to be used.
subdir: Logical indicating whether you also want to check subfolders. Defaults to TRUE
extension: Logical, indicating whether the HTML extension should be checked. Defaults to TRUE
...: Arguments sent to statcheck

Author

Sacha Epskamp <mail@sachaepskamp.com> & Michele B. Nuijten <m.b.nuijten@uvt.nl>

Details

See statcheck for more details. Use checkHTML to import individual HTML files.

Note that the conversion to plain text and extraction of statistics can result in errors. Some statistical values can be missed, especially if the notation is unconventional. It is recommended to manually check some of the results.
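
One possible way to follow the recommendation above is to draw a random subset of extracted results and compare each Raw string against the original article by hand. The sketch below assumes a data frame `res` with the Source and Raw columns described under Value. The rows shown are invented placeholders, and `n_check` is a hypothetical name for the sample size.

```r
# Hypothetical stand-in for checkHTMLdir() output; rows are invented.
res <- data.frame(
  Source = c("a.html", "a.html", "b.html", "b.html", "c.html"),
  Raw = c("t(28) = 2.20, p = .03",
          "F(1, 30) = 4.50, p = .04",
          "r(48) = .30, p = .03",
          "t(15) = 1.10, p = .29",
          "F(2, 60) = 3.20, p = .05"),
  stringsAsFactors = FALSE
)

set.seed(1)                        # make the selection reproducible
n_check <- min(3, nrow(res))       # never ask for more rows than exist
to_check <- res[sample(nrow(res), n_check), c("Source", "Raw")]

# Print the selected rows so they can be looked up in the source files
print(to_check)
```

Sampling without replacement (the default of `sample`) ensures that no result is selected twice.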

Examples

# With this command a menu will pop up from which you can select the
# directory with HTML articles:
# checkHTMLdir()

# You could also specify the directory beforehand, for instance:
# DIR <- "C:/mydocuments/articles"
# checkHTMLdir(DIR)
