substr_ctl is a drop-in replacement for substr. Performance is slightly slower than substr. ANSI CSI SGR sequences will be included in the substrings to reflect the format of the substring when it was embedded in the source string. Additionally, other Control Sequences specified in ctl are treated as zero-width.
substr_ctl(x, start, stop, warn = getOption("fansi.warn"),
term.cap = getOption("fansi.term.cap"), ctl = "all")
substr2_ctl(x, start, stop, type = "chars", round = "start",
tabs.as.spaces = getOption("fansi.tabs.as.spaces"),
tab.stops = getOption("fansi.tab.stops"),
warn = getOption("fansi.warn"),
term.cap = getOption("fansi.term.cap"), ctl = "all")
substr_sgr(x, start, stop, warn = getOption("fansi.warn"),
term.cap = getOption("fansi.term.cap"))
substr2_sgr(x, start, stop, type = "chars", round = "start",
tabs.as.spaces = getOption("fansi.tabs.as.spaces"),
tab.stops = getOption("fansi.tab.stops"),
warn = getOption("fansi.warn"),
term.cap = getOption("fansi.term.cap"))
x: a character vector or object that can be coerced to character.
start: integer. The first element to be replaced.
stop: integer. The last element to be replaced.
warn: TRUE (default) or FALSE, whether to warn when potentially problematic Control Sequences are encountered. These could cause the assumptions fansi makes about how strings are rendered on your display to be incorrect, for example by moving the cursor (see fansi).
term.cap: character, a vector of the capabilities of the terminal. Can be any combination of "bright" (SGR codes 90-97, 100-107), "256" (SGR codes starting with "38;5" or "48;5"), and "truecolor" (SGR codes starting with "38;2" or "48;2"). Changing this parameter changes how fansi interprets escape sequences, so you should ensure that it matches your terminal capabilities. See term_cap_test for details.
ctl: character, which Control Sequences should be treated specially. See the "_ctl vs. _sgr" section for details.
"nl": newlines.
"c0": all other "C0" control characters (i.e. 0x01-0x1f, 0x7F), except for newlines and the actual ESC (0x1B) character.
"sgr": ANSI CSI SGR sequences.
"csi": all non-SGR ANSI CSI sequences.
"esc": all other escape sequences.
"all": all of the above, except when used in combination with any of the above, in which case it means "all but".
type: character(1L), partial matching c("chars", "width"), although type="width" only works correctly with R >= 3.2.2.
round: character(1L), partial matching c("start", "stop", "both", "neither"). Controls how to resolve ambiguities when a start or stop value in "width" type mode falls within a multi-byte character or a wide display character. See details.
tabs.as.spaces: FALSE (default) or TRUE, whether to convert tabs to spaces. This can only be set to TRUE if strip.spaces is FALSE.
tab.stops: integer(1:n) indicating position of tab stops to use when converting tabs to spaces. If there are more tabs in a line than defined tab stops the last tab stop is re-used. For the purposes of applying tab stops, each input line is considered a line and the character count begins from the beginning of the input line.
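The "all but" rule for the ctl values listed above can be sketched in base R. Note that resolve_ctl is a hypothetical helper written for this illustration and is not part of fansi. It only shows how a ctl vector might reduce to the set of Control Sequence types that receive special treatment.

```r
# Hypothetical helper, NOT a fansi function: resolve a `ctl` vector to the
# Control Sequence types that would be treated specially.
resolve_ctl <- function(ctl) {
  types <- c("nl", "c0", "sgr", "csi", "esc")
  stopifnot(is.character(ctl), all(ctl %in% c(types, "all")))
  if ("all" %in% ctl) {
    # "all" alone selects every type; combined with others it means "all but"
    setdiff(types, ctl)
  } else {
    intersect(types, ctl)
  }
}

resolve_ctl("all")            # every type
resolve_ctl(c("all", "nl"))   # every type except newlines
resolve_ctl("sgr")            # SGR only
```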
The *_ctl versions of the functions treat all Control Sequences specially by default. Special treatment is context dependent, and may include detecting them and/or computing their display/character width as zero. For the SGR subset of the ANSI CSI sequences, fansi will also parse, interpret, and reapply the text styles they encode if needed. You can modify whether a Control Sequence is treated specially with the ctl parameter. You can exclude a type of Control Sequence from special treatment by combining "all" with that type of sequence (e.g. ctl=c("all", "nl") for special treatment of all Control Sequences but newlines). The *_sgr versions only treat ANSI CSI SGR sequences specially, and are equivalent to the *_ctl versions with the ctl parameter set to "sgr".
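The stated equivalence between the two families can be checked directly. The snippet below is a small check rather than a specification: it assumes the fansi package is installed and still exports substr_sgr, and it is guarded so it does nothing otherwise.

```r
# Assumes fansi is installed; skipped silently if it is not.
if (requireNamespace("fansi", quietly = TRUE)) {
  x <- "\033[31mhello world\033[m"
  a <- fansi::substr_sgr(x, 1, 6)
  b <- fansi::substr_ctl(x, 1, 6, ctl = "sgr")
  # Per the description above, both calls should yield the same substring
  print(identical(a, b))
}
```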
substr2_ctl and substr2_sgr add the ability to retrieve substrings based on display width, and byte width in addition to the normal character width. substr2_ctl also provides the option to convert tabs to spaces with tabs_as_spaces prior to taking substrings.
Because exact substrings on anything other than character width cannot be guaranteed (e.g. as a result of multi-byte encodings, or double display-width characters) substr2_ctl must make assumptions on how to resolve provided start/stop values that are infeasible and does so via the round parameter.
If we use "start" as the round value, then any time the start value corresponds to the middle of a multi-byte or a wide character, that character is included in the substring, while any character only partially covered by the stop value is left out. The converse is true if we use "stop" as the round value. "neither" would cause all partial characters to be dropped irrespective of whether they correspond to start or stop, and "both" could cause all of them to be included.
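The rounding rules described above can be illustrated with a small base R sketch. Note that select_by_width is a hypothetical helper, not fansi's implementation: it takes the characters and their display widths as explicit inputs (an assumption made to keep the example independent of locale) and applies the four round modes as the text describes them.

```r
# Hypothetical illustration of the `round` rules, NOT fansi's implementation.
# `chars` holds one character per element, `widths` their display widths.
select_by_width <- function(chars, widths, start, stop,
                            round = c("start", "stop", "both", "neither")) {
  round <- match.arg(round)
  last  <- cumsum(widths)        # last display column each character occupies
  first <- last - widths + 1L    # first display column each character occupies
  keep <- switch(round,
    start   = last  >= start & last  <= stop,  # keep partial at start only
    stop    = first >= start & first <= stop,  # keep partial at stop only
    both    = last  >= start & first <= stop,  # keep every partial character
    neither = first >= start & last  <= stop   # keep fully covered only
  )
  paste(chars[keep], collapse = "")
}

# Three double-width ideograms occupy columns 1-2, 3-4 and 5-6, so
# start = 2 and stop = 3 both land in the middle of a character.
cn <- c("\u4E00", "\u4E01", "\u4E03")
w  <- c(2L, 2L, 2L)
select_by_width(cn, w, 2, 3, round = "start")    # first ideogram only
select_by_width(cn, w, 2, 3, round = "stop")     # second ideogram only
select_by_width(cn, w, 2, 3, round = "both")     # first and second
select_by_width(cn, w, 2, 3, round = "neither")  # empty string
```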
These functions map string lengths accounting for ANSI CSI SGR sequence semantics to the naive length calculations, and then use the mapping in conjunction with base::substr() to extract the string. This concept is borrowed directly from Gábor Csárdi's crayon package, although the implementation of the calculation is different.
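The mapping idea can be sketched in a few lines of base R. This is a deliberately simplified illustration and not the calculation fansi or crayon actually perform: naive_substr_sgr is a hypothetical helper that handles a single string, recognizes only SGR sequences of the form ESC [ ... m, and does not reapply styles that were active before the start position.

```r
# Hypothetical helper, NOT part of fansi: map visible-character positions to
# raw string positions, then delegate the extraction to base::substr().
naive_substr_sgr <- function(x, start, stop) {
  stopifnot(is.character(x), length(x) == 1L)
  sgr_pat <- "\033\\[[0-9;]*m"            # ESC [ params m
  visible <- rep(TRUE, nchar(x))
  m <- gregexpr(sgr_pat, x, perl = TRUE)[[1]]
  if (m[1] != -1L) {
    lens <- attr(m, "match.length")
    for (i in seq_along(m)) {
      # Characters belonging to an SGR sequence have zero width
      visible[seq(m[i], length.out = lens[i])] <- FALSE
    }
  }
  raw_pos <- which(visible)               # raw index of each visible character
  start <- max(start, 1L)
  stop  <- min(stop, length(raw_pos))
  if (start > stop) return("")
  substr(x, raw_pos[start], raw_pos[stop])
}

# Visible characters 3 to 9 of "hello world"; the reset sequence embedded
# between them is carried along, the opening style is not reapplied.
naive_substr_sgr("\033[42mhello\033[m world", 3, 9)
```

Unlike this sketch, substr_ctl also re-emits the styles in effect at the start of the substring, which is why the real functions have to interpret the SGR sequences and cannot simply skip them.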
See fansi for details on how Control Sequences are interpreted, particularly if you are getting unexpected results.
# NOT RUN {
substr_ctl("\033[42mhello\033[m world", 1, 9)
substr_ctl("\033[42mhello\033[m world", 3, 9)
## Width 2 and 3 are in the middle of an ideogram as
## start and stop positions respectively, so we control
## what we get with `round`
cn.string <- paste0("\033[42m", "\u4E00\u4E01\u4E03", "\033[m")
substr2_ctl(cn.string, 2, 3, type='width')
substr2_ctl(cn.string, 2, 3, type='width', round='both')
substr2_ctl(cn.string, 2, 3, type='width', round='start')
substr2_ctl(cn.string, 2, 3, type='width', round='stop')
## the _sgr variety only treat as special CSI SGR,
## compare the following:
substr_sgr("\033[31mhello\tworld", 1, 6)
substr_ctl("\033[31mhello\tworld", 1, 6)
substr_ctl("\033[31mhello\tworld", 1, 6, ctl=c('all', 'c0'))
# }