lnt_read

Name or names of LexisNexis TXT file to be converted.

Encoding to be assumed for input files. Defaults to UTF-8
(the LexisNexis standard value).

encoding

A logical flag indicating if the returned object
will include a third data frame with paragraphs.

extract_paragraphs

A logical flag indicating whether to try converting the date of
each article into Date format. This fails for non-standard dates
provided by LexisNexis, so it might be safer to convert dates afterwards.

convert_date

If convert_date is set to TRUE, all dates are converted using
the same pattern. See strptime.

date_format

Is used to indicate the beginning of an article. All
articles need to have the same number of beginnings, ends and lengths (which
indicate the last line of meta-data).

start_keyword

Is used to indicate the end of an article.

end_keyword

Is used to indicate the end of the meta-data.

length_keyword

A logical flag indicating whether information should be
printed to the screen.

verbose
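
The caveat on convert_date and date_format can be illustrated with base R alone. The following is a minimal sketch, not the package's own code: the sample strings and the format "%B %d, %Y" are assumptions chosen for illustration. It shows that a single pattern applied to all dates yields NA wherever a date does not follow that pattern.

```r
# Parse English month names regardless of the system locale.
old_locale <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "C")

# Hypothetical date strings; the last one does not follow the pattern.
raw_dates <- c("January 5, 2010 Tuesday", "March 12, 2011", "5 janvier 2010")

# One pattern for all dates (assumed format, see strptime for the codes).
# Trailing text such as the weekday is ignored by as.Date().
parsed <- as.Date(raw_dates, format = "%B %d, %Y")

# Restore the previous locale setting.
Sys.setlocale("LC_TIME", old_locale)

# Dates that could not be converted come back as NA and can be
# inspected and fixed by hand afterwards.
failed <- raw_dates[is.na(parsed)]
```

Converting afterwards in this way makes it easy to spot the entries that need a different pattern, which is why leaving convert_date off and handling dates separately can be the safer route.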

Read a LexisNexis TXT file and convert it to a data frame.

LexisNexis

My PhD supervisor once told me that everyone doing newspaper
analysis starts by writing code to read in files from the 'LexisNexis'
newspaper archive (retrieved e.g., from <http://www.nexis.com/> or any of the
partner sites). However, while this is a nice exercise I do recommend, not
everyone has the time. This package reads TXT files downloaded from the
newspaper archive of 'LexisNexis'. Since these TXT files are unstructured,
in the sense that the beginning and end of an article are not clearly
indicated, the main function lnt_read() relies on certain keywords that
signal to R where an article begins, where it ends and where meta-data is stored.
lnt_checkFiles() thus tests if all keywords are in place. Every article in every
TXT file should start with "X of X DOCUMENTS" and end with "LANGUAGE:". The
end of the meta-data is usually indicated by "LENGTH:". Some measures were
taken to eliminate problems, but where these keywords appear inside an article
or headline, test1 or test2 from lnt_checkFiles() will return FALSE and
lnt_read() will not be able to do its job. In these cases, it is recommended to
slightly alter the TXT files, e.g. by changing a headline to "language: never
stop learning new ones" instead of "LANGUAGE: never stop learning new
ones"---as "language:" without capital letters is not picked up by the
functions.
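
The keyword logic described above can be sketched in base R. This is only an illustration under stated assumptions, not the implementation used by lnt_read() or lnt_checkFiles(): the sample lines are made up, and the regular expressions are guesses modelled on the keywords named in the text.

```r
# Hypothetical sample text; not taken from a real LexisNexis export.
lines <- c(
  "1 of 2 DOCUMENTS",
  "The Example Times",
  "LENGTH: 120 words",
  "language: never stop learning new ones",
  "Body of the first article.",
  "LANGUAGE: ENGLISH",
  "2 of 2 DOCUMENTS",
  "The Example Post",
  "LENGTH: 95 words",
  "Body of the second article.",
  "LANGUAGE: ENGLISH"
)

# Assumed patterns for the three keywords (grep is case-sensitive,
# so the lowercase "language:" headline is not picked up).
starts  <- grep("^[0-9]+ of [0-9]+ DOCUMENTS$", lines)
lengths <- grep("^LENGTH:", lines)
ends    <- grep("^LANGUAGE:", lines)

# Every article needs exactly one beginning, one length and one end.
keyword_counts <- c(starts = length(starts),
                    lengths = length(lengths),
                    ends = length(ends))
counts_match <- length(unique(keyword_counts)) == 1

# Split the lines into one character vector per article.
articles <- Map(function(s, e) lines[s:e], starts, ends)
```

If the headline in the sample were written as "LANGUAGE: never stop learning new ones", the number of ends would no longer equal the number of beginnings, which is the kind of mismatch the checks are meant to reveal.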

Johannes Gruber

LexisNexisTools

Working with Files from 'LexisNexis'

lnt_read function

lnt_read: Read in a LexisNexis TXT file