LexisNexisTools (version 0.1.2)

lnt_checkFiles: Check LexisNexis TXT files

Description

Read a LexisNexis TXT file and check consistency.

Usage

lnt_checkFiles(x, encoding = "UTF-8",
  start_keyword = "\\d+ of \\d+ DOCUMENTS$| Dokument \\d+ von \\d+$",
  end_keyword = "^LANGUAGE: |^SPRACHE: ",
  length_keyword = "^LENGTH: |^LÄNGE: ", verbose = TRUE)

Arguments

x

Name or names of the LexisNexis TXT file(s) to be checked.

encoding

Encoding to be assumed for input files. Defaults to UTF-8 (the LexisNexis standard value).

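start_keyword, end_keyword, length_keyword

Regular expressions that mark the beginning of an article, the end of an article, and the line stating its length.

verbose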

A logical flag indicating whether information should be printed to the screen.
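To make the role of the keyword arguments concrete, the following minimal sketch applies the default start_keyword and end_keyword patterns from the Usage section to a few lines of text using base R only. The sample lines are invented for illustration and are not taken from a real LexisNexis export; the sketch does not call the package itself.

```r
# Default patterns copied from the Usage section
start_keyword <- "\\d+ of \\d+ DOCUMENTS$| Dokument \\d+ von \\d+$"
end_keyword   <- "^LANGUAGE: |^SPRACHE: "

# Hypothetical lines, invented for illustration only
lines <- c(
  "1 of 2 DOCUMENTS",
  "LENGTH: 350 words",
  "Some article text mentioning DOCUMENTS in passing.",
  "LANGUAGE: ENGLISH",
  " Dokument 2 von 2",
  "SPRACHE: DEUTSCH"
)

# Flag the lines that each pattern treats as an article boundary
is_start <- grepl(start_keyword, lines)
is_end   <- grepl(end_keyword, lines)

lines[is_start]
lines[is_end]
```

Because the start pattern is anchored at the end of the line and the end pattern at the beginning, ordinary article text that merely contains one of these words is not counted as a boundary.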

Details

The output contains three tests:
- test1: Indicates whether the number of beginnings matches the number of ends in a file. It is critical that this is TRUE; otherwise lnt_read will not be able to separate individual articles from each other.
- test2: Indicates whether the number of beginnings matches the number of lengths. Because 'LENGTH' is used to separate metadata from the actual article text, it is critical that this is TRUE; otherwise lnt_read will fail with an error when trying to read the file.
- test3: Indicates whether the number of beginnings equals the number of articles LexisNexis delivered. A value of FALSE is most likely not a problem, because some articles from Nexis are empty and are therefore deleted. So far, this has only happened when an article contained a photo and nothing else.

lnt_checkFiles checks the consistency of LexisNexis TXT files. lnt_read needs at least a beginning, an end, and a length in each article to work.
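The logic of the three tests can be sketched in base R by counting keyword matches in a vector of lines. This is an illustration of the idea only, not the package's implementation: check_lines is a hypothetical helper, the patterns are shortened to their English alternatives, and reading the delivered article count from the "N of M DOCUMENTS" line is an assumption, since the page does not say how lnt_checkFiles obtains that number.

```r
# Hypothetical helper illustrating the three tests; not part of LexisNexisTools
check_lines <- function(lines,
                        start_keyword = "\\d+ of \\d+ DOCUMENTS$",
                        end_keyword = "^LANGUAGE: ",
                        length_keyword = "^LENGTH: ") {
  starts   <- grep(start_keyword, lines, value = TRUE)
  n_start  <- length(starts)
  n_end    <- sum(grepl(end_keyword, lines))
  n_length <- sum(grepl(length_keyword, lines))

  # Assumption: the delivered count is the M in "N of M DOCUMENTS"
  delivered <- if (n_start > 0) {
    as.integer(sub(".*of (\\d+) DOCUMENTS$", "\\1", starts[1]))
  } else {
    NA_integer_
  }

  data.frame(
    test1 = n_start == n_end,              # beginnings vs. ends
    test2 = n_start == n_length,           # beginnings vs. lengths
    test3 = isTRUE(n_start == delivered)   # beginnings vs. delivered articles
  )
}

# Invented sample: the header announces three documents, but only two are present
lines <- c(
  "1 of 3 DOCUMENTS", "LENGTH: 120 words",
  "Text of the first article.", "LANGUAGE: ENGLISH",
  "2 of 3 DOCUMENTS", "LENGTH: 80 words",
  "Text of the second article.", "LANGUAGE: ENGLISH"
)

result <- check_lines(lines)
result
```

In this sample the two critical tests pass while test3 fails, which mirrors the harmless case described above where an empty article was dropped from the file.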

Examples

# NOT RUN {
# Copy sample file to current wd
lnt_sample()

# Search for txt files in working directory
# (pattern is a regular expression, so the dot is escaped and the end anchored)
my_files <- list.files(pattern = "\\.txt$",
                       full.names = TRUE,
                       recursive = TRUE,
                       ignore.case = TRUE)
# Test consistency of files
checks.df <- lnt_checkFiles(my_files)
checks.df
# }
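
Once the checks are available, one plausible next step is to pass on only those files that satisfy the two critical tests. The sketch below uses a mock table in place of real output: the column names file, test1, test2 and test3 are assumptions made for illustration, so inspect the actual result (for example with str()) before adapting it.

```r
# Mock result table; column names and values are assumptions, not real output
checks <- data.frame(
  file  = c("a.TXT", "b.TXT", "c.TXT"),
  test1 = c(TRUE, FALSE, TRUE),
  test2 = c(TRUE, TRUE, TRUE),
  test3 = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# test1 and test2 are described as critical; test3 is informational
critical_ok  <- checks$test1 & checks$test2
readable     <- checks$file[critical_ok]
needs_fixing <- checks$file[!critical_ok]

readable
needs_fixing
```

Files listed in needs_fixing would have to be inspected by hand, or checked again with adjusted keyword patterns, before reading them with lnt_read.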
