text_read

file: name or path to the file to be read in, or a <a rd-options="base" href="/link/connections?package=stringb&version=0.1.17&to=base" data-mini-rdoc="base::connections">connections</a> object (see <a rd-options="base" href="/link/readLines?package=stringb&version=0.1.17&to=base" data-mini-rdoc="base::readLines">readLines</a>)

tokenize: either NULL, so that no splitting is done; a regular expression used to split the text into parts; or a function that does the splitting (or any other transformation)

encoding: character encoding of the file, passed through to <a rd-options="base" href="/link/readLines?package=stringb&version=0.1.17&to=base" data-mini-rdoc="base::readLines">readLines</a>

...: further arguments passed through to <a rd-options="base" href="/link/readLines?package=stringb&version=0.1.17&to=base" data-mini-rdoc="base::readLines">readLines</a>, such as n, ok, warn and skipNul

A wrapper to readLines() that makes things more ordered and convenient. In
comparison to the wrapped readLines() function, text_read() does some things
differently: (1) if no encoding is given, it always assumes that files are
stored in UTF-8 instead of the system locale; (2) it always converts text to
UTF-8 instead of transforming it to the system locale; (3) in addition to
loading, it can tokenize the text using a regular expression or a function,
or skip tokenization entirely when tokenize is NULL.
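To make the three differences above concrete, here is a minimal base R sketch of how such a wrapper could be written. It is not stringb's actual source: the name `text_read_sketch` is hypothetical, and it assumes that a regular-expression `tokenize` is applied line by line with `strsplit()`, with the pieces flattened into one character vector. Because `readLines()` only marks strings with the declared encoding and does not re-encode them, the sketch converts explicitly with `iconv()`.

```r
# Hypothetical sketch, NOT stringb's implementation of text_read().
# Assumption: a regex `tokenize` splits each line with strsplit() and the
# resulting pieces are flattened into a single character vector.
text_read_sketch <- function(file, tokenize = NULL, encoding = "UTF-8", ...) {
  # Default to UTF-8 rather than the system locale; readLines() only marks
  # the strings with this encoding, it does not convert them.
  # `...` forwards arguments such as n, ok, warn and skipNul.
  text <- readLines(file, encoding = encoding, ...)

  # Re-encode explicitly so the result is always UTF-8.
  text <- iconv(text, from = encoding, to = "UTF-8")

  if (is.null(tokenize)) {
    # No tokenization: one element per line
    text
  } else if (is.function(tokenize)) {
    # Arbitrary user-supplied splitting or transformation
    tokenize(text)
  } else {
    # Treat `tokenize` as a regular expression to split on
    unlist(strsplit(text, tokenize))
  }
}

# Usage with a temporary file
f <- tempfile(fileext = ".txt")
writeLines(c("hello world", "foo bar"), f)
text_read_sketch(f, tokenize = " ")  # returns c("hello", "world", "foo", "bar")
text_read_sketch(f, n = 1)           # returns "hello world"
```

Accepting either a regular expression or a function keeps the common case short while still allowing arbitrary post-processing of the lines.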

Base R already ships with string handling capabilities 'out-of-the-box' but
lacks streamlined function names and workflow. The 'stringi' ('stringr')
package, on the other hand, has well-named functions, extensive Unicode support
and allows for a streamlined workflow. However, it adds dependencies, and the
interpretation of regular expressions might differ between base R functions
and 'stringi' functions. This package aims to provide a solution for the use
case where dependencies are unwanted but streamlined text processing is still
needed. The package's functions are solely based on wrapping base R functions
into 'stringr'/'stringi'-like function names. Along the way, it adds one or two
extra functions and, last but not least, provides all functions as generics,
thereby allowing methods to be added for text structures other than plain
character vectors.

Peter Meissner

stringb

Convenient Base R String Handling



text_read: read in text
