stri_split_lines(str, n_max = -1L, omit_empty = FALSE)
stri_split_lines1(str)
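str: character vector (stri_split_lines) or a single string (stri_split_lines1)
n_max: integer vector; maximal number of pieces to extract [stri_split_lines only]
omit_empty: logical vector; whether empty strings should be omitted from the result [stri_split_lines only]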
stri_split_lines returns a list of character vectors. If any input string is NA, then the corresponding list element is a single NA string.

stri_split_lines1(str) is equivalent to stri_split_lines(str[1])[[1]] (with default parameters), thus it returns a character vector. Moreover, if the input string ends with a newline sequence, the last empty string is omitted from the result. Therefore, this function is convenient for splitting a loaded text file into text lines.
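
The return values described above can be sketched as follows. This is a minimal illustration that assumes the stringi package is installed; the sample strings are made up for the example.

```r
library(stringi)

# Two sample strings plus a missing value; the first ends with a newline.
x <- c("alpha\nbeta\n", "gamma\r\ndelta", NA)

# stri_split_lines returns a list with one character vector per input string.
# The trailing newline in x[1] yields a final empty string; NA stays a single NA.
res <- stri_split_lines(x)

# stri_split_lines1 takes a single string and returns a character vector;
# the empty string after the trailing newline is dropped.
first <- stri_split_lines1(x[1])

print(res)
print(first)
```

With these inputs, res[[1]] should keep the final empty string while first should not, which is why stri_split_lines1 is the more convenient choice for the contents of a text file read into a single string.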
Vectorized over str, n_max, and omit_empty.

If n_max is negative (default), then all pieces are extracted.

omit_empty is applied during splitting: if set to TRUE, then empty strings will never appear in the resulting vector.
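
A short sketch of the effect of omit_empty, assuming the stringi package is installed. The sample string is hypothetical, and n_max is left at its default.

```r
library(stringi)

# A blank line in the middle and a newline at the end.
x <- "one\n\ntwo\n"

# Default: empty strings are retained.
kept <- stri_split_lines(x)[[1]]

# omit_empty = TRUE: empty strings are removed while splitting.
dropped <- stri_split_lines(x, omit_empty = TRUE)[[1]]

print(kept)
print(dropped)
```

Here kept should contain two empty strings (one for the blank line, one after the final newline), and dropped should contain only "one" and "two".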
Different platforms represent newlines with, e.g., carriage return (CR, 0x0D), line feed (LF, 0x0A), CRLF, or next line (NEL, 0x85). Moreover, the Unicode Standard defines two unambiguous separator characters, Paragraph Separator (PS, 0x2029) and Line Separator (LS, 0x2028). Sometimes vertical tab (VT, 0x0B) and form feed (FF, 0x0C) are used as well. These functions follow the UTS #18 rules, where a newline sequence corresponds to the following regular expression:
(?:\u{D A}|(?!\u{D A})[\u{A}-\u{D}\u{85}\u{2028}\u{2029}])
Each match is treated as a line break at which the text is split. For efficiency reasons, however, the search here is not performed via regexes.
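
To illustrate the newline rule, the sketch below splits one string that contains each of the sequences listed above. It assumes the stringi package is installed. The pattern pat is an approximation of the rule written in ICU regex syntax for comparison only; it is an assumption made for this example, not the code path the functions actually use.

```r
library(stringi)

# One string containing CR, LF, CRLF, NEL, LS, PS, VT, and FF in turn.
x <- "a\rb\nc\r\nd\u0085e\u2028f\u2029g\vh\fi"

# CRLF is treated as a single newline sequence, so no empty piece
# appears between "c" and "d".
lines <- stri_split_lines1(x)

# Hypothetical ICU-syntax approximation of the rule: CRLF is tried first,
# then any single character from the newline set.
pat <- "\\u000D\\u000A|[\\u000A-\\u000D\\u0085\\u2028\\u2029]"
via_regex <- stri_split_regex(x, pat)[[1]]

print(lines)
print(identical(lines, via_regex))
```

Both approaches should yield the nine single-letter pieces "a" through "i" for this input.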
Unicode Regular Expressions -- Unicode Technical Standard #18
stri_split_boundaries; stri_split_charclass; stri_split_coll; stri_split_fixed; stri_split_regex; stri_split; stringi-search