stri_split_lines(str, n_max = -1L, omit_empty = FALSE)
stri_split_lines1(str)
stri_split_lines returns a list of character vectors. If any input string is NA, then the corresponding list element is a single NA string.

stri_split_lines1(str) is like stri_split_lines(str[1])[[1]] (with default parameters), thus it returns a character vector. Moreover, if the input string ends with a newline sequence, the last empty string is omitted from the result. Therefore, this function is convenient for splitting a loaded text file into lines.
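A minimal sketch of the difference between the two return types, assuming the stringi package is installed. The input strings are made up for illustration, and the trailing comments restate the behavior described above rather than captured console output.

```r
library(stringi)

# Hypothetical input: mixed newline styles, a missing value,
# and a string that ends with a newline.
x <- c("alpha\nbeta\r\ngamma", NA, "one\ntwo\n")

# One list element per input string.
pieces <- stri_split_lines(x)
pieces[[1]]   # "alpha" "beta" "gamma"
pieces[[2]]   # a single NA string

# Stand-in for the contents of a text file read into one string.
file_text <- "first line\nsecond line\n"

# A plain character vector; the trailing empty string is dropped.
lines <- stri_split_lines1(file_text)
lines         # "first line" "second line"
```

Because stri_split_lines1 only looks at the first element of str, it fits the case where a whole file has been read into a single string.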
Vectorized over str, n_max, and omit_empty.

If n_max is negative (default), then all pieces are extracted.

omit_empty is applied during splitting: if set to TRUE, then empty strings will never appear in the resulting vector.
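The effect of omit_empty can be sketched as follows, assuming stringi is installed. The sample text is hypothetical, and n_max is left at its default so that all pieces are extracted.

```r
library(stringi)

# Hypothetical text with a blank line in the middle.
txt <- "a\n\nb"

# Default: the empty line is kept as an empty string.
keep_empty <- stri_split_lines(txt)[[1]]                     # "a" "" "b"

# omit_empty = TRUE: empty strings are dropped while splitting.
drop_empty <- stri_split_lines(txt, omit_empty = TRUE)[[1]]  # "a" "b"
```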
Different platforms represent newlines in different ways, e.g., by carriage return (CR, 0x0D), line feed (LF, 0x0A), CRLF, or next line (NEL, 0x85). Moreover, the Unicode Standard defines two unambiguous separator characters, Paragraph Separator (PS, 0x2029) and Line Separator (LS, 0x2028). Vertical tab (VT, 0x0B) and form feed (FF, 0x0C) are sometimes used as well.
This function follows UTS#18 rules, where a newline sequence corresponds to the following regular expression:
(?:\u{D A}|(?!\u{D A})[\u{A}-\u{D}\u{85}\u{2028}\u{2029}])
Each match is used to split a text line. For efficiency reasons, however, the search itself is not performed with a regex engine.
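As a rough cross-check of the rule above, the same pattern can be passed to stri_split_regex and compared with stri_split_lines. This is only a sketch: rewriting the pattern with \uhhhh escapes for the ICU regex engine is an assumption of this example, and the sample string is hypothetical.

```r
library(stringi)

# The newline-sequence pattern, rewritten with \uhhhh escapes
# (an assumption of this sketch; the text above uses \u{...} notation).
newline_rx <- "(?:\\u000D\\u000A|(?!\\u000D\\u000A)[\\u000A-\\u000D\\u0085\\u2028\\u2029])"

# Hypothetical sample using each separator once:
# CRLF, LF, CR, NEL, LS, PS, VT, FF.
x <- "a\r\nb\nc\rd\u0085e\u2028f\u2029g\vh\fi"

by_lines <- stri_split_lines(x)[[1]]
by_regex <- stri_split_regex(x, newline_rx)[[1]]

# Both approaches are expected to yield the same nine pieces.
identical(by_lines, by_regex)
```

The regex route is shown only to make the splitting rule concrete; stri_split_lines remains the direct way to split on newline sequences.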
Unicode Regular Expressions -- Unicode Technical Standard #18
stri_split_charclass, stri_split_fixed, stri_split_regex, stri_split, stringi-search