nops_scan: Read Scanned NOPS Exams

Description

Read scanned NOPS exams produced with exams2nops.

Usage

nops_scan(
    images = dir(pattern = "\\.PNG$|\\.png$|\\.PDF|\\.pdf$",
      path = dir, full.names = TRUE),
    file = NULL, dir = ".",
    verbose = TRUE, rotate = FALSE, cores = NULL, n = NULL,
    density = 300,
    size = 0.029, threshold = c(0.04, 0.42), trim = 0.3, minrot = 0.002,
    string = FALSE)

Arguments

images

character. Names of the PDF/PNG images containing the scanned exams. By default all PDF/PNG images in the current working directory are used.

file

character or logical. Optional file name for the output ZIP archive containing the PNG images and the scan results. If file = FALSE no ZIP archive is created. By default a suitable name using the current time/date is used.

dir

character. Directory in which the ZIP file should be created. By default the current working directory.

verbose

logical. Should progress information be displayed?

rotate

logical. Should the input PDF/PNG images be rotated by 180 degrees first?

cores

numeric. If set to an integer, mclapply is called internally using the desired number of cores to read the scanned exams in parallel.

n

numeric. The number of answer fields to read (in multiples of 5), i.e., 5, 10, …, 45. By default taken from the type field.

density

numeric. Resolution used in the conversion of PDF images to PNG. This requires ImageMagick's convert to be available on the system.

size

numeric. Size of the boxes containing the check marks relative to the image height. This can be tweaked somewhat but should typically be between 0.023 and 0.031.

threshold

numeric. Vector of two thresholds for the gray levels in the check mark boxes. If the average gray level is between the two thresholds, the box is considered checked. If it is above the second threshold, some heuristic is employed for judging whether the box contains a cross or not.

trim

numeric. Amount of trimming to shave the borders of the boxes before determining the gray level within the check boxes. Should usually be at least 0.25 (the default up to version 2.3-1); the current default is 0.3.

minrot

numeric. Minimum angle for rotating images, i.e., images with a lower angle are considered to be ok.

string

logical. Are the files to be scanned manually marked string exercises (rather than single/multiple choice exercises)?
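
The rule described for the threshold argument can be illustrated with a small sketch. This is not the code used inside nops_scan: the helper classify_box is hypothetical, and treating levels below the first threshold as unchecked, as well as the handling of values exactly on a threshold, are assumptions made for illustration.

```r
## Hypothetical helper (not part of the exams package): classify boxes
## by their average gray level using the two thresholds.
classify_box <- function(gray, threshold = c(0.04, 0.42)) {
  stopifnot(is.numeric(gray), is.numeric(threshold), length(threshold) == 2L)
  ifelse(gray < threshold[1L], "unchecked",          # assumption: too light to be a mark
    ifelse(gray <= threshold[2L], "checked",         # between the two thresholds
      "needs heuristic"))                            # very dark: cross or filled-in box?
}

classify_box(c(0.01, 0.20, 0.80))
## [1] "unchecked"       "checked"         "needs heuristic"
```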

Value

A character vector with one element per scanned file (returned invisibly if written to an output ZIP archive). The output contains the following space-separated information: file name, sheet ID (11 digits), scrambling (2 digits), type of sheet (3 digits, number of questions rounded up to steps of 5), 0/1 indicator whether the replacement sheet was used, registration number (7 digits), 45 multiple choice answers of length 5 (all 00000 if unused).
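
As a minimal sketch of how one such result line could be taken apart, the following splits it into the fields listed above. The helper parse_nops_line and the example line are hypothetical and made up for illustration; the sketch also assumes that the file name contains no spaces.

```r
## Hypothetical helper (not part of the exams package): split one result
## line into its 6 leading fields plus 45 answer strings (51 fields in total).
parse_nops_line <- function(line) {
  fields <- strsplit(line, " ", fixed = TRUE)[[1L]]
  stopifnot(length(fields) == 51L)
  list(
    file         = fields[1L],
    id           = fields[2L],         # sheet ID, 11 digits
    scrambling   = fields[3L],         # 2 digits
    type         = fields[4L],         # 3 digits
    replacement  = fields[5L] == "1",  # replacement sheet used?
    registration = fields[6L],         # 7 digits
    answers      = fields[7:51]        # 45 strings of length 5
  )
}

## Made-up line for illustration only (not real scan output)
example_line <- paste(c("scan1.png", "12345678901", "00", "005", "0", "1234567",
  "10000", "01000", "00100", "00010", "00001", rep("00000", 40)), collapse = " ")

x <- parse_nops_line(example_line)
x$registration
x$answers[1:5]
```

The IDs and the registration number are kept as character strings so that leading zeros are preserved.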

Details

nops_scan is a companion function for exams2nops. Exams generated with exams2nops can be printed and the filled out answer page can be scanned. Then, nops_scan can be employed to read the information in the scanned PDF/PNG images. The results are one text line per image containing the information in a very simple space-separated format.

If images only contains PNG files, then the R function readPNG is sufficient for reading the images into R. If images contains PDF files, these need to be converted to PNG first, which requires PDFTk, GhostScript, and ImageMagick's convert to be available on the system. On Linux(-esque) systems these are typically easy to install via the packages pdftk and imagemagick. The download links for Windows are: http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/pdftk_free-2.02-win-setup.exe, http://www.imagemagick.org/script/download.php#windows, http://www.ghostscript.com/download/gsdnld.html.
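
Before scanning PDF files it can be useful to check whether the external tools are found on the search path. The following is a rough sketch using base R's Sys.which; the executable names are assumptions and may differ between systems (for example, Ghostscript on Windows is often installed as gswin64c or gswin32c).

```r
## Executable names are assumptions and may need adapting to the system.
tools <- c(pdftk = "pdftk", ghostscript = "gs", imagemagick = "convert")

## Sys.which() returns "" for commands that are not found on the search path.
found <- Sys.which(tools) != ""
names(found) <- names(tools)
print(found)

if (!all(found)) {
  message("Not found on the search path: ",
    paste(names(found)[!found], collapse = ", "))
}
```

Note that on Windows a command named convert may also resolve to an unrelated system utility, so a positive result there does not guarantee that ImageMagick is installed.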

Practical recommendations:

The scanned images produced by scanners or copying machines typically become smaller in size if the images are read in just black/white (or grayscale). This may sometimes even improve the reliability of reading the images afterwards.

The printed exams are often stapled in the top left corner which has to be unhinged somehow by the exam participants. Although this may damage the exam sheet, this is usually no problem for scanning it. However, the copying machine's sheet feeder may work better if the sheets are turned upside down (so that the damaged corner is not fed first into the machine). This often improves the scanning results considerably and can be accommodated by setting rotate = TRUE in nops_scan.
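
A call for sheets that were fed upside down might look like the following sketch. The directory name "scans" is hypothetical; the call is guarded so that nothing is scanned if no images are found or the exams package is not installed.

```r
## "scans" is a hypothetical directory holding the upside-down scans.
scan_dir <- "scans"
img <- dir(scan_dir, pattern = "\\.(png|pdf)$", ignore.case = TRUE,
  full.names = TRUE)

if (length(img) > 0L && requireNamespace("exams", quietly = TRUE)) {
  ## rotate = TRUE turns every image by 180 degrees before reading it
  res <- exams::nops_scan(img, file = FALSE, rotate = TRUE)
  writeLines(res)
} else {
  message("No images found in '", scan_dir, "' or package 'exams' not installed.")
}
```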

Examples

# NOT RUN {
## scanned example images stored in exams package
img <- dir(system.file("nops", package = "exams"), pattern = "nops_scan",
  full.names = TRUE)

## read content
res <- nops_scan(img, file = FALSE)
writeLines(res)
# }