stri_wrap: Word Wrap Text to Format Paragraphs

Description

This function breaks text paragraphs into lines, each consisting of at most width code points.

Usage

stri_wrap(str, width = floor(0.9 * getOption("width")), cost_exponent = 2,
  simplify = TRUE, normalize = FALSE, locale = NULL)

Arguments

str

character vector of strings to reformat

width

single positive integer giving the desired maximal number of code points per line

cost_exponent

single numeric value; values not greater than zero select a greedy word-wrapping algorithm; otherwise, this value is the exponent in the cost function of a (more aesthetic) dynamic programming-based algorithm (values in [2, 3] are recommended)

simplify

single logical value, see Value

normalize

single logical value, see Details

locale

NULL or "" for text boundary analysis following the conventions of the default locale, or a single string with locale identifier, see stringi-locale

Value

If simplify is TRUE, then a character vector is returned. Otherwise, you will get a list of length(str) character vectors.
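
As a small illustration of the two return shapes (the input strings and the width of 5 are arbitrary choices for this sketch):

```r
library(stringi)

x <- c("one two three", "four five")

# simplify = TRUE (the default): one flat character vector with all lines
flat <- stri_wrap(x, 5)
str(flat)

# simplify = FALSE: a list with one character vector per input string,
# so it stays clear which lines came from which element of x
nested <- stri_wrap(x, 5, simplify = FALSE)
str(nested)
```

The list form is likely the more convenient one whenever the wrapped lines have to be mapped back to the strings they came from.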

Details

Vectorized over str.

ICU's line-BreakIterator is used to determine text boundaries at which a line break is possible. This is a locale-dependent operation. Note that Unicode code points may have various widths when printed on screen, but this function treats every code point as if it were of width 1. For that reason, it is best used with text in Latin script.
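
The following sketch shows why counting code points can differ from what is seen on screen. The sample strings are assumptions chosen for illustration; stri_length counts code points, which is the unit that width refers to.

```r
library(stringi)

# "e" followed by a combining acute accent: rendered as one glyph,
# but made up of two code points
combined <- "e\u0301"
n_combined <- stri_length(combined)

# The same letter as a single precomposed code point
precomposed <- "\u00e9"
n_precomposed <- stri_length(precomposed)

c(combined = n_combined, precomposed = n_precomposed)
```

Both strings typically occupy one column when printed, yet the first one uses up two units of width.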

If normalize is FALSE (the default), then multiple white spaces between the word boundaries are preserved within each wrapped line. In such a case, none of the strings can contain \r, \n, or other new line characters, otherwise you will get an error. You should split the input text into lines or, e.g., substitute line breaks with spaces before applying this function.
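
Both ways of preparing the input can be sketched as follows. The sample text is made up, and the regular expression only covers \r and \n, not every new line character Unicode defines, so treat it as a starting point rather than a complete solution.

```r
library(stringi)

# Sample input containing line breaks (hypothetical text)
x <- "Lorem ipsum\ndolor  sit amet,\r\nconsectetur   adipisicing elit."

# Option 1: substitute each run of \r and \n with a single space;
# runs of ordinary spaces are left untouched
flat <- stri_replace_all_regex(x, "[\\r\\n]+", " ")
cat(stri_wrap(flat, 20, normalize = FALSE), sep = "\n")

# Option 2: split the text into lines first, then wrap each line separately
pieces <- stri_split_lines1(x)
cat(stri_wrap(pieces, 20, normalize = FALSE), sep = "\n")
```

Option 1 treats the whole text as one paragraph, whereas Option 2 keeps every original line break as a paragraph boundary.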

On the other hand, if normalize is TRUE, then all consecutive white space sequences are replaced with single spaces before the actual string wrapping, among others by calling stri_trim(stri_replace_all_charclass(str, "\\p{WHITE_SPACE}", " ", merge=TRUE)). Moreover, stri_split_lines and stri_trans_nfc are called on the input character vector.
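
To see what the white space normalization step quoted above does on its own, it can be applied to a small made-up string. This only reproduces the quoted expression; it is not the complete preprocessing performed inside stri_wrap.

```r
library(stringi)

# Hypothetical input with leading/trailing blanks, a tab, and a line break
x <- "  Lorem \t ipsum\n dolor  "

# Replace every run of white space with one space, then trim both ends
normalized <- stri_trim(
   stri_replace_all_charclass(x, "\\p{WHITE_SPACE}", " ", merge = TRUE))

normalized
# [1] "Lorem ipsum dolor"
```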

The greedy algorithm (for cost_exponent being non-positive) provides a very simple way for word wrapping. It always puts as many words in each line as possible. This method, contrary to the dynamic algorithm, does not minimize the number of spaces left at the end of every line. The dynamic algorithm (a.k.a. Knuth's word wrapping algorithm) is more complex, but it returns text wrapped in a more aesthetic way. This method minimizes the squared (by default, see cost_exponent) number of spaces (raggedness) at the end of each line, so the text is arranged more evenly.
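
The difference between the two strategies can be illustrated with a base R sketch. It is not stringi's implementation: greedy_wrap and raggedness are hypothetical helper names, words are split on plain spaces instead of ICU line boundaries, and every line (including the last one) is counted in the cost, which may not match what stri_wrap does internally.

```r
# Illustrative sketch only; not how stringi implements stri_wrap.

# Greedy wrapping: keep adding words to the current line while they fit.
greedy_wrap <- function(text, width) {
   words <- strsplit(text, " +")[[1]]
   words <- words[nzchar(words)]
   lines <- character(0)
   current <- ""
   for (w in words) {
      if (!nzchar(current)) {
         current <- w                    # first word of a new line
      } else if (nchar(current) + 1 + nchar(w) <= width) {
         current <- paste(current, w)    # the word still fits
      } else {
         lines <- c(lines, current)      # line is full, start the next one
         current <- w
      }
   }
   if (nzchar(current)) lines <- c(lines, current)
   lines
}

# Raggedness cost: unused space at the end of each line, raised to the exponent.
raggedness <- function(lines, width, cost_exponent = 2) {
   sum((width - nchar(lines))^cost_exponent)
}

greedy <- greedy_wrap("aaa bb cc ddddd", 6)   # "aaa bb", "cc", "ddddd"
balanced <- c("aaa", "bb cc", "ddddd")        # an alternative, more even split

raggedness(greedy, 6)     # 0^2 + 4^2 + 1^2 = 17
raggedness(balanced, 6)   # 3^2 + 1^2 + 1^2 = 11
```

The greedy result fills the first line completely but leaves the second one nearly empty. A split with a lower total cost exists, and finding such a split is what the dynamic algorithm is designed to do.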

References

D.E. Knuth, M.F. Plass, Breaking paragraphs into lines, Software: Practice and Experience 11(11), 1981, pp. 1119--1184

Other locale_sensitive: %s!==%, %s!=%, %s<=%, %s<%, %s===%, %s==%, %s>=%, %s>%, %stri!==%, %stri!=%, %stri<=%, %stri<%, %stri===%, %stri==%, %stri>=%, %stri>%; stri_cmp, stri_cmp_eq, stri_cmp_equiv, stri_cmp_ge, stri_cmp_gt, stri_cmp_le, stri_cmp_lt, stri_cmp_neq, stri_cmp_nequiv, stri_compare; stri_count_boundaries, stri_count_words; stri_duplicated, stri_duplicated_any; stri_enc_detect2; stri_extract_words; stri_locate_boundaries, stri_locate_words; stri_opts_collator; stri_order, stri_sort; stri_split_boundaries; stri_trans_tolower, stri_trans_totitle, stri_trans_toupper; stri_unique; stringi-locale; stringi-search-boundaries; stringi-search-coll

Other text_boundaries: stri_count_boundaries, stri_count_words; stri_extract_words; stri_locate_boundaries, stri_locate_words; stri_opts_brkiter; stri_split_boundaries; stri_split_lines, stri_split_lines1; stri_trans_tolower, stri_trans_totitle, stri_trans_toupper; stringi-search-boundaries; stringi-search

Examples


library(stringi)

# Build one long paragraph from several pieces
s <- stri_paste(
   "Lorem ipsum dolor sit amet, consectetur adipisicing elit. Proin ",
   "nibh augue, suscipit a, scelerisque sed, lacinia in, mi. Cras vel ",
   "lorem. Etiam pellentesque aliquet tellus.")
cat(stri_wrap(s, 20, 0.0), sep="\n") # greedy algorithm
cat(stri_wrap(s, 20, 2.0), sep="\n") # dynamic algorithm, squared cost
# wrap at the default width, then pad each line on both sides
cat(stri_pad(stri_wrap(s), side='both'), sep="\n")
