tidyselect (version 1.2.1)

where: Select variables with a function

Description

This selection helper selects the variables for which a function returns TRUE.
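Conceptually, where() asks the predicate about each column in turn and keeps the columns for which it answers TRUE. The base R sketch below mimics that idea for illustration only. It is not how tidyselect implements where(), and the helper name `where_names()` is hypothetical.

```r
# Hypothetical helper (not part of tidyselect): apply a predicate to every
# column and return the names of the columns for which it returns TRUE.
where_names <- function(data, fn) {
  keep <- vapply(data, fn, logical(1))  # one TRUE/FALSE per column
  names(data)[keep]
}

where_names(iris, is.numeric)
# "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
```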

Usage

where(fn)

Arguments

fn

A function that returns TRUE or FALSE (technically, a predicate function). Can also be a purrr-like formula.
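As an illustration of the "returns TRUE or FALSE" requirement, here is a hypothetical custom predicate, `few_values()`, that flags columns with fewer than five distinct values. Applying it column by column in base R shows the single logical answer that where() would receive for each variable.

```r
# Hypothetical predicate: TRUE for columns with fewer than 5 distinct values.
few_values <- function(x) length(unique(x)) < 5

# A valid predicate yields exactly one TRUE or FALSE per column.
# For iris, only Species (3 distinct values) is TRUE.
print(vapply(iris, few_values, logical(1)))
```

With the tidyverse attached as in the Examples, `iris %>% select(where(few_values))` should keep only Species. A function such as `function(x) x > 5` returns one value per row rather than one per column, so it is not a valid predicate here; wrapping its body in `any()` or `all()` turns it into one.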

Examples

Selection helpers can be used in functions like dplyr::select() or tidyr::pivot_longer(). Let's first attach the tidyverse:

library(tidyverse)

# For better printing
iris <- as_tibble(iris)

where() takes a function and returns all variables for which the function returns TRUE:

is.factor(iris[[4]])
#> [1] FALSE

is.factor(iris[[5]])
#> [1] TRUE

iris %>% select(where(is.factor))
#> # A tibble: 150 x 1
#>   Species
#>   <fct>
#> 1 setosa
#> 2 setosa
#> 3 setosa
#> 4 setosa
#> # i 146 more rows

is.numeric(iris[[4]])
#> [1] TRUE

is.numeric(iris[[5]])
#> [1] FALSE

iris %>% select(where(is.numeric))
#> # A tibble: 150 x 4
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width
#>          <dbl>       <dbl>        <dbl>       <dbl>
#> 1          5.1         3.5          1.4         0.2
#> 2          4.9         3            1.4         0.2
#> 3          4.7         3.2          1.3         0.2
#> 4          4.6         3.1          1.5         0.2
#> # i 146 more rows

The formula shorthand

You can use purrr-like formulas as a shortcut for creating a function on the spot. These expressions are equivalent:

iris %>% select(where(is.numeric))
#> # A tibble: 150 x 4
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width
#>          <dbl>       <dbl>        <dbl>       <dbl>
#> 1          5.1         3.5          1.4         0.2
#> 2          4.9         3            1.4         0.2
#> 3          4.7         3.2          1.3         0.2
#> 4          4.6         3.1          1.5         0.2
#> # i 146 more rows

iris %>% select(where(function(x) is.numeric(x)))
#> # A tibble: 150 x 4
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width
#>          <dbl>       <dbl>        <dbl>       <dbl>
#> 1          5.1         3.5          1.4         0.2
#> 2          4.9         3            1.4         0.2
#> 3          4.7         3.2          1.3         0.2
#> 4          4.6         3.1          1.5         0.2
#> # i 146 more rows

iris %>% select(where(~ is.numeric(.x)))
#> # A tibble: 150 x 4
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width
#>          <dbl>       <dbl>        <dbl>       <dbl>
#> 1          5.1         3.5          1.4         0.2
#> 2          4.9         3            1.4         0.2
#> 3          4.7         3.2          1.3         0.2
#> 4          4.6         3.1          1.5         0.2
#> # i 146 more rows

The shorthand is useful for adding logic inline. Here we select all numeric variables whose mean is greater than 3.5:

iris %>% select(where(~ is.numeric(.x) && mean(.x) > 3.5))
#> # A tibble: 150 x 2
#>   Sepal.Length Petal.Length
#>          <dbl>        <dbl>
#> 1          5.1          1.4
#> 2          4.9          1.4
#> 3          4.7          1.3
#> 4          4.6          1.5
#> # i 146 more rows
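To see why the mean > 3.5 example keeps exactly two columns, the sketch below computes the column means in base R and evaluates the same predicate column by column. It also shows why the predicate uses `&&`: it short-circuits, so `mean()` is never called on the Species factor (calling `mean()` on a factor returns NA with a warning). The object names here are illustrative only.

```r
# Means of the numeric iris columns (base R, for illustration only)
num_cols <- Filter(is.numeric, iris)
means <- vapply(num_cols, mean, numeric(1))
print(round(means, 3))

# Columns whose mean exceeds 3.5: Sepal.Length and Petal.Length
print(names(means)[means > 3.5])

# The same predicate as in the example, written as a regular function.
# `&&` short-circuits: for Species, is.numeric() is FALSE, so mean() is skipped.
pred <- function(.x) is.numeric(.x) && mean(.x) > 3.5
print(vapply(iris, pred, logical(1)))
```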