tokenize

Turns input into a character vector. Usually the tokenization is done purely
in C++, and never exposed to R (because that requires a copy). This function
is useful for testing, or when a file doesn't parse correctly and you want
to see the underlying tokens.
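As a minimal sketch of the debugging use case described above, the snippet below tokenizes a small literal CSV whose second row has one field too many. It assumes `tokenize()` returns one element per row, each holding that row's tokens; the variable names `csv_text` and `tokens` are illustrative only.

```r
library(readr)

# A string containing a newline is treated as literal data, not as a path.
# The second row carries an extra field, the kind of problem that is easier
# to diagnose once the raw tokens are visible.
csv_text <- "a,b\n1,2,3\n4,5"

# tokenizer_csv() is spelled out here to make the choice of tokenizer explicit.
tokens <- tokenize(csv_text, tokenizer = tokenizer_csv())

# Count the tokens found in each row to spot the ragged line.
token_counts <- lengths(tokens)
print(token_counts)
```

Comparing the per-row counts is usually enough to locate a malformed row before inspecting its tokens individually.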

internal

The goal of 'readr' is to provide a fast and friendly way to
read rectangular data (like 'csv', 'tsv', and 'fwf').  It is designed
to flexibly parse many types of data found in the wild, while still
cleanly failing when data unexpectedly changes.

Jennifer Bryan

readr

Read Rectangular Text Data

Hadley Wickham

Jim Hester

Romain Francois

Shelby Bearrows

Posit Software, PBC 

https://github.com/mandreyel/ 

Jukka Jylänki

Mikkel Jørgensen

tokenize function

<dl><dt>file</dt>
<dd>Either a path to a file, a connection, or literal data
(either a single string or a raw vector).
Files ending in <code>.gz</code>, <code>.bz2</code>, <code>.xz</code>, or <code>.zip</code> will
be automatically uncompressed. Files starting with <code>http://</code>,
<code>https://</code>, <code>ftp://</code>, or <code>ftps://</code> will be automatically
downloaded. Remote gz files can also be automatically downloaded and
decompressed.
Literal data is most useful for examples and tests. To be recognised as
literal data, the input must be either wrapped with <code>I()</code>, be a string
containing at least one new line, or be a vector containing at least one
string with a new line.
Using a value of <code>clipboard()</code> will read from the system clipboard.</dd>
<dt>tokenizer</dt>
<dd>A tokenizer specification.</dd>
<dt>skip</dt>
<dd>Number of lines to skip before reading data.</dd>
<dt>n_max</dt>
<dd>Optionally, maximum number of rows to tokenize.</dd></dl>
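The arguments above can be combined as in the following sketch, which is an illustration rather than the package's own example. It passes literal tab-separated data, skips a leading line with `skip`, swaps in `tokenizer_tsv()` as the tokenizer specification, and caps the output with `n_max`. The input text and variable names are made up for the example.

```r
library(readr)

# Literal data: the newlines make readr treat this string as data, not a path.
tsv_text <- "# exported by a hypothetical tool\nx\ty\n1\t2\n3\t4"

# Skip the first line, split fields on tabs, and tokenize at most two rows.
tokens <- tokenize(
  tsv_text,
  tokenizer = tokenizer_tsv(),
  skip = 1,
  n_max = 2
)

print(tokens)
```

Limiting `n_max` keeps the output small when only the first few rows of a large file need inspecting.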

Arguments

Tokenize a file/string. — tokenize
