match

0th

Percentile

Value Matching

match returns a vector of the positions of (first) matches of its first argument in its second.

%in% is a more intuitive interface as a binary operator, which returns a logical vector indicating if there is a match or not for its left operand.

Keywords
manip, logic
Usage
match(x, table, nomatch = NA_integer_, incomparables = NULL)

x %in% table

Arguments
x

vector or NULL: the values to be matched. Long vectors are supported.

table

vector or NULL: the values to be matched against. Long vectors are not supported.

nomatch

the value to be returned in the case when no match is found. Note that it is coerced to integer.

incomparables

a vector of values that cannot be matched. Any value in x matching a value in this vector is assigned the nomatch value. For historical reasons, FALSE is equivalent to NULL.

Details

%in% is currently defined as "%in%" <- function(x, table) match(x, table, nomatch = 0) > 0

Factors, raw vectors and lists are converted to character vectors, and then x and table are coerced to a common type (the later of the two types in R's ordering, logical < integer < numeric < complex < character) before matching. If incomparables has positive length it is coerced to the common type.

Matching for lists is potentially very slow and best avoided except in simple cases.

Exactly what matches what is to some extent a matter of definition. For all types, NA matches NA and no other value. For real and complex values, NaN values are regarded as matching any other NaN value, but not matching NA, where for complex x, real and imaginary parts must match both (unless containing at least one NA).

Character strings will be compared as byte sequences if any input is marked as "bytes", and otherwise are regarded as equal if they are in different encodings but would agree when translated to UTF-8 (see Encoding).

That %in% never returns NA makes it particularly useful in if conditions.

Value

A vector of the same length as x.

match: An integer vector giving the position in table of the first match if there is a match, otherwise nomatch.

If x[i] is found to equal table[j] then the value returned in the i-th position of the return value is j, for the smallest possible j. If no match is found, the value is nomatch.

%in%: A logical vector, indicating if a match was located for each element of x: thus the values are TRUE or FALSE and never NA.

References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

See Also

pmatch and charmatch for (partial) string matching, match.arg, etc for function argument matching. findInterval similarly returns a vector of positions, but finds numbers within intervals, rather than exact matches.

is.element for an S-compatible equivalent of %in%.

unique (and duplicated) are using the same definitions of “match” or “equality” as match(), and these are less strict than ==, e.g., for NA and NaN in numeric or complex vectors, or for strings with different encodings, see also above.

Aliases
  • match
  • %in%
Examples
library(base) # NOT RUN { ## The intersection of two sets can be defined via match(): ## Simple version: ## intersect <- function(x, y) y[match(x, y, nomatch = 0)] intersect # the R function in base is slightly more careful intersect(1:10, 7:20) 1:10 %in% c(1,3,5,9) sstr <- c("c","ab","B","bba","c",NA,"@","bla","a","Ba","%") sstr[sstr %in% c(letters, LETTERS)] "%w/o%" <- function(x, y) x[!x %in% y] #-- x without y (1:10) %w/o% c(3,7,12) ## Note that setdiff() is very similar and typically makes more sense: c(1:6,7:2) %w/o% c(3,7,12) # -> keeps duplicates setdiff(c(1:6,7:2), c(3,7,12)) # -> unique values ## Illuminating example about NA matching r <- c(1, NA, NaN) zN <- c(complex(real = NA , imaginary = r ), complex(real = r , imaginary = NA ), complex(real = r , imaginary = NaN), complex(real = NaN, imaginary = r )) zM <- cbind(Re=Re(zN), Im=Im(zN), match = match(zN, zN)) rownames(zM) <- format(zN) zM ##--> many "NA's" (= 1) and the four non-NA's (3 different ones, at 7,9,10) length(zN) # 12 unique(zN) # the "NA" and the 3 different non-NA NaN's stopifnot(identical(unique(zN), zN[c(1, 7,9,10)])) ## very strict equality would have 4 duplicates (of 12): symnum(outer(zN, zN, Vectorize(identical,c("x","y")), FALSE,FALSE,FALSE,FALSE)) ## removing "(very strictly) duplicates", i <- c(5,8,11,12) # we get 8 pairwise non-identicals : Ixy <- outer(zN[-i], zN[-i], Vectorize(identical,c("x","y")), FALSE,FALSE,FALSE,FALSE) stopifnot(identical(Ixy, diag(8) == 1)) # }
Documentation reproduced from package base, version 3.5.1, License: Part of R 3.5.1

Community examples

richie@datacamp.com at Jan 17, 2017 base v3.3.2

`%in%` stops you having to do lots of `==` comparisons. Instead of writing something clunky like: ```{r} x <- sample(letters, 1) if(x == "a" || x == "e" || x == "i" || x == "o" || x == "u") { message("x is a vowel") } else { message("x is a consonant") } ``` …you can do: ```{r} if(x %in% c("a", "e", "i", "o", "u")) { message("x is a vowel") } else { message("x is a consonant") } ``` `%in%` accepts vectors for both the left- and right-hand sides. The output has the same length as the left-hand side. ```{r} letters %in% c("a", "e", "i", "o", "u") c("a", "e", "i", "o", "u") %in% letters ``` `%in%` also works with numbers. ```{r} seq.int(0.25, 5, 0.25) %in% 1:5 ``` `NA` is just another value to be matched. ```{r} NA %in% 1 1 %in% NA NA %in% NA ``` `match()` gives you the positions of the matches rather than just `TRUE` or `FALSE`. (Think [`grep()`]()https://www.rdocumentation.org/packages/base/topics/grep) vs. `grepl()`.) ```{r} match(c("a", "e", "i", "o", "u"), letters) match(letters, c("a", "e", "i", "o", "u")) ``` You can change the value that is returned when there isn't a match. Zero is a common choice. ```{r} match(letters, c("a", "e", "i", "o", "u"), nomatch = 0) ``` You can force some values to always fail to match by specifying them as incomparable. (It is usually easier change the table argument though.) ```{r} match(letters, c("a", "e", "i", "o", "u"), incomparables = "a") ``` The set manipulation functions [`intersect()`](https://www.rdocumentation.org/packages/base/topics/sets) and `setdiff()` are based upon match. Here are some simplified implementations. ```{r} intersect1 <- function(x, y) { y[match(x, y, nomatch = 0)] } setdiff1 <- function(x, y) { x[match(x, y, nomatch = 0L) == 0L] } odds <- c(1, 3, 5, 7, 9) primes <- c(2, 3, 5, 7, 11) intersect1(odds, primes) setdiff1(odds, primes) setdiff1(primes, odds) ```