stringr v1.4.0

Simple, Consistent Wrappers for Common String Operations

A consistent, simple and easy to use set of wrappers around the fantastic 'stringi' package. All function and argument names (and positions) are consistent, all functions deal with "NA"'s and zero length vectors in the same way, and the output from one function is easy to feed into the input of another.

Readme

stringr

Overview

Strings are not glamorous, high-profile components of R, but they do play a big role in many data cleaning and preparation tasks. The stringr package provides a cohesive set of functions designed to make working with strings as easy as possible. If you’re not familiar with strings, the best place to start is the chapter on strings in R for Data Science.

stringr is built on top of stringi, which uses the ICU C library to provide fast, correct implementations of common string manipulations. stringr focusses on the most important and commonly used string manipulation functions whereas stringi provides a comprehensive set covering almost anything you can imagine. If you find that stringr is missing a function that you need, try looking in stringi. Both packages share similar conventions, so once you’ve mastered stringr, you should find stringi similarly easy to use.

Installation

# Install the released version from CRAN:
install.packages("stringr")

# Install the cutting edge development version from GitHub:
# install.packages("devtools")
devtools::install_github("tidyverse/stringr")

Usage

All functions in stringr start with str_ and take a vector of strings as the first argument.

x <- c("why", "video", "cross", "extra", "deal", "authority")
str_length(x) 
#> [1] 3 5 5 5 4 9
str_c(x, collapse = ", ")
#> [1] "why, video, cross, extra, deal, authority"
str_sub(x, 1, 2)
#> [1] "wh" "vi" "cr" "ex" "de" "au"

Most string functions work with regular expressions, a concise language for describing patterns of text. For example, the regular expression "[aeiou]" matches any single character that is a vowel:

str_subset(x, "[aeiou]")
#> [1] "video"     "cross"     "extra"     "deal"      "authority"
str_count(x, "[aeiou]")
#> [1] 0 3 1 2 2 4

There are eight main verbs that work with patterns:

  • str_detect(x, pattern) tells you if there’s any match to the pattern.

    str_detect(x, "[aeiou]")
    #> [1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
    
  • str_count(x, pattern) counts the number of matches.

    str_count(x, "[aeiou]")
    #> [1] 0 3 1 2 2 4
    
  • str_subset(x, pattern) extracts the matching components.

    str_subset(x, "[aeiou]")
    #> [1] "video"     "cross"     "extra"     "deal"      "authority"
    
  • str_locate(x, pattern) gives the position of the match.

    str_locate(x, "[aeiou]")
    #>      start end
    #> [1,]    NA  NA
    #> [2,]     2   2
    #> [3,]     3   3
    #> [4,]     1   1
    #> [5,]     2   2
    #> [6,]     1   1
    
  • str_extract(x, pattern) extracts the text of the match.

    str_extract(x, "[aeiou]")
    #> [1] NA  "i" "o" "e" "e" "a"
    
  • str_match(x, pattern) extracts parts of the match defined by parentheses.

    # extract the characters on either side of the vowel
    str_match(x, "(.)[aeiou](.)")
    #>      [,1]  [,2] [,3]
    #> [1,] NA    NA   NA  
    #> [2,] "vid" "v"  "d" 
    #> [3,] "ros" "r"  "s" 
    #> [4,] NA    NA   NA  
    #> [5,] "dea" "d"  "a" 
    #> [6,] "aut" "a"  "t"
    
  • str_replace(x, pattern, replacement) replaces the matches with new text.

    str_replace(x, "[aeiou]", "?")
    #> [1] "why"       "v?deo"     "cr?ss"     "?xtra"     "d?al"      "?uthority"
    
  • str_split(x, pattern) splits up a string into multiple pieces.

    str_split(c("a,b", "c,d,e"), ",")
    #> [[1]]
    #> [1] "a" "b"
    #> 
    #> [[2]]
    #> [1] "c" "d" "e"
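
Because each verb takes a character vector and returns an ordinary R vector, matrix, or list, the verbs can be chained. The following is a minimal sketch that reuses the x vector from above; the variable name with_vowel is only illustrative.

```r
library(stringr)

x <- c("why", "video", "cross", "extra", "deal", "authority")

# Keep only the strings that contain a vowel, then extract the first
# vowel from each of the strings that were kept.
with_vowel <- str_subset(x, "[aeiou]")
str_extract(with_vowel, "[aeiou]")
#> [1] "i" "o" "e" "e" "a"

# str_detect() returns a logical vector, so it can index x directly.
x[!str_detect(x, "[aeiou]")]
#> [1] "why"
```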
    

As well as regular expressions (the default), there are three other pattern matching engines:

  • fixed(): match exact bytes
  • coll(): match human letters
  • boundary(): match boundaries
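
As a rough illustration of how these engines differ from the default, the sketch below wraps the pattern in each modifier. The input strings are made up for the example, and the coll() call assumes that ignore_case = TRUE is the behaviour you want.

```r
library(stringr)

# fixed(): the pattern is compared literally, so "." is a plain full stop
# rather than the regex wildcard for "any character".
str_detect(c("a.b", "axb"), fixed("."))
#> [1]  TRUE FALSE

# Default regex engine, for comparison: "." matches any character.
str_detect(c("a.b", "axb"), ".")
#> [1] TRUE TRUE

# coll(): compare using collation rules; here case differences are ignored.
str_detect(c("Apple", "banana"), coll("apple", ignore_case = TRUE))
#> [1]  TRUE FALSE

# boundary(): split at word boundaries instead of at a pattern.
str_split("the quick brown fox", boundary("word"))
#> [[1]]
#> [1] "the"   "quick" "brown" "fox"
```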

RStudio Addin

The RegExplain RStudio addin provides a friendly interface for working with regular expressions and functions from stringr. This addin allows you to interactively build your regexp, check the output of common string matching functions, consult the interactive help pages, or use the included resources to learn regular expressions.

This addin can easily be installed with devtools:

# install.packages("devtools")
devtools::install_github("gadenbuie/regexplain")

Compared to base R

R provides a solid set of string operations, but because they have grown organically over time, they can be inconsistent and a little hard to learn. Additionally, they lag behind the string operations in other programming languages, so that some things that are easy to do in languages like Ruby or Python are rather hard to do in R. By comparison, stringr:

  • Uses consistent function and argument names. The first argument is always the vector of strings to modify, which makes stringr work particularly well in conjunction with the pipe:

    letters %>%
      .[1:10] %>% 
      str_pad(3, "right") %>%
      str_c(letters[2:11])
    #>  [1] "a  b" "b  c" "c  d" "d  e" "e  f" "f  g" "g  h" "h  i" "i  j" "j  k"
    
  • Simplifies string operations by eliminating options that you don’t need 95% of the time.

  • Produces outputs that can easily be used as inputs. This includes ensuring that missing inputs result in missing outputs, and zero length inputs result in zero length outputs.
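
The handling of missing and zero-length inputs can be seen in a short sketch. The base R calls (nchar() and paste0()) are included only for contrast and are not part of stringr.

```r
library(stringr)

# Missing input gives missing output.
str_length(NA)
#> [1] NA
str_c("a", NA)
#> [1] NA

# The base R counterparts treat a bare NA as the text "NA".
nchar(NA)
#> [1] 2
paste0("a", NA)
#> [1] "aNA"

# Zero-length input gives zero-length output.
str_length(character(0))
#> integer(0)
str_to_upper(character(0))
#> character(0)
```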

Functions in stringr

Name             Description
modifiers        Control matching behaviour with modifier functions.
str_pad          Pad a string.
str_count        Count the number of matches in a string.
str_replace_na   Turn NA into "NA"
str_split        Split up a string into pieces.
str_detect       Detect the presence or absence of a pattern in a string.
str_conv         Specify the encoding of a string.
str_c            Join multiple strings into a single string.
str_starts       Detect the presence or absence of a pattern at the beginning or end of a string.
str_remove       Remove matched patterns in a string.
str_sub          Extract and replace substrings from a character vector.
str_locate       Locate the position of patterns in a string.
str_replace      Replace matched patterns in a string.
str_dup          Duplicate and concatenate strings within a character vector.
str_match        Extract matched groups from a string.
str_extract      Extract matching patterns from a string.
case             Convert case of a string.
str_flatten      Flatten a string
str_glue         Format and interpolate a string with glue
str_wrap         Wrap strings into nicely formatted paragraphs.
invert_match     Switch location of matches to location of non-matches.
str_interp       String interpolation.
str_length       The length of a string.
str_trunc        Truncate a character string.
str_subset       Keep strings matching a pattern, or find positions.
str_view         View HTML rendering of regular expression match.
stringr-data     Sample character vectors for practicing string manipulations.
str_trim         Trim whitespace from a string
stringr-package  stringr: Simple, Consistent Wrappers for Common String Operations
word             Extract words from a sentence.
str_order        Order or sort a character vector.
%>%              Pipe operator

Vignettes of stringr

Name
releases/stringr-1.0.0.Rmd
releases/stringr-1.1.0.Rmd
releases/stringr-1.2.0.Rmd
regular-expressions.Rmd
stringr.Rmd

Details

License GPL-2 | file LICENSE
URL http://stringr.tidyverse.org, https://github.com/tidyverse/stringr
BugReports https://github.com/tidyverse/stringr/issues
VignetteBuilder knitr
Encoding UTF-8
LazyData true
RoxygenNote 6.1.1
NeedsCompilation no
Packaged 2019-02-09 16:03:19 UTC; hadley
Repository CRAN
Date/Publication 2019-02-10 03:40:03 UTC
