There's a newer version (1.0.10) of this package.

dplyr

Overview

dplyr is a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges:

  • mutate() adds new variables that are functions of existing variables.
  • select() picks variables based on their names.
  • filter() picks cases based on their values.
  • summarise() reduces multiple values down to a single summary.
  • arrange() changes the ordering of the rows.

These all combine naturally with group_by(), which allows you to perform any operation “by group”. You can learn more about them in vignette("dplyr"). As well as these single-table verbs, dplyr also provides a variety of two-table verbs, which you can learn about in vignette("two-table").
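
As a minimal sketch of both ideas, the snippet below first uses group_by() with mutate() to centre mpg within each cyl group of the built-in mtcars data, then applies left_join(), one of the two-table verbs, to the band_members and band_instruments example tables that ship with dplyr. The choice of data sets and the column name mpg_centered are illustrative assumptions, not taken from the vignettes.

```r
library(dplyr)

# Grouped operation: subtract each cyl group's mean mpg from its members.
# mpg_centered is a hypothetical column name chosen for this sketch.
centered <- mtcars %>%
  group_by(cyl) %>%
  mutate(mpg_centered = mpg - mean(mpg)) %>%
  ungroup()

# Two-table verb: keep every band member and add an instrument where known.
joined <- band_members %>%
  left_join(band_instruments, by = "name")
```

Members without a match in band_instruments keep their row and get NA in the plays column.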

dplyr is designed to abstract over how the data is stored. That means that, as well as working with local data frames, you can also work with remote database tables using exactly the same R code. To do so, install the dbplyr package, then read vignette("databases", package = "dbplyr").
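
The following is a rough sketch of what "exactly the same R code" can look like. It assumes that the DBI, RSQLite and dbplyr packages are installed, and it uses an in-memory SQLite database purely as a stand-in for a real remote database. The helper mean_mpg() is a hypothetical name introduced for this example.

```r
library(dplyr)

# Assumption: DBI, RSQLite and dbplyr are installed. The in-memory SQLite
# database only stands in for a real remote database.
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars, "mtcars")

# Hypothetical helper: one pipeline, written once, used for both sources.
mean_mpg <- function(data) {
  data %>%
    group_by(cyl) %>%
    summarise(mpg = mean(mpg, na.rm = TRUE)) %>%
    arrange(cyl)
}

local_result  <- mean_mpg(mtcars)                          # local data frame
remote_result <- mean_mpg(tbl(con, "mtcars")) %>% collect() # database table

DBI::dbDisconnect(con)
```

For the database table the pipeline is translated to SQL and only runs when collect() pulls the result back into R, which is why collect() is called before the connection is closed.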

If you are new to dplyr, the best place to start is the data transformation chapter in R for Data Science.

Installation

# The easiest way to get dplyr is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just dplyr:
install.packages("dplyr")

# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("tidyverse/dplyr")

If you encounter a clear bug, please file a minimal reproducible example on GitHub. For questions and other discussion, please use the manipulatr mailing list.

Usage

library(dplyr)

starwars %>% 
  filter(species == "Droid")
#> # A tibble: 5 x 13
#>   name  height  mass hair_color skin_color  eye_color birth_year gender
#>   <chr>  <int> <dbl> <chr>      <chr>       <chr>          <dbl> <chr> 
#> 1 C-3PO    167   75. <NA>       gold        yellow          112. <NA>  
#> 2 R2-D2     96   32. <NA>       white, blue red              33. <NA>  
#> 3 R5-D4     97   32. <NA>       white, red  red              NA  <NA>  
#> 4 IG-88    200  140. none       metal       red              15. none  
#> 5 BB8       NA   NA  none       none        black            NA  none  
#> # ... with 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> #   vehicles <list>, starships <list>

starwars %>% 
  select(name, ends_with("color"))
#> # A tibble: 87 x 4
#>   name           hair_color skin_color  eye_color
#>   <chr>          <chr>      <chr>       <chr>    
#> 1 Luke Skywalker blond      fair        blue     
#> 2 C-3PO          <NA>       gold        yellow   
#> 3 R2-D2          <NA>       white, blue red      
#> 4 Darth Vader    none       white       yellow   
#> 5 Leia Organa    brown      light       brown    
#> # ... with 82 more rows

starwars %>% 
  mutate(bmi = mass / ((height / 100)^2)) %>%
  select(name:mass, bmi)
#> # A tibble: 87 x 4
#>   name           height  mass   bmi
#>   <chr>           <int> <dbl> <dbl>
#> 1 Luke Skywalker    172   77.  26.0
#> 2 C-3PO             167   75.  26.9
#> 3 R2-D2              96   32.  34.7
#> 4 Darth Vader       202  136.  33.3
#> 5 Leia Organa       150   49.  21.8
#> # ... with 82 more rows

starwars %>% 
  arrange(desc(mass))
#> # A tibble: 87 x 13
#>   name    height  mass hair_color skin_color  eye_color  birth_year gender
#>   <chr>    <int> <dbl> <chr>      <chr>       <chr>           <dbl> <chr> 
#> 1 Jabba …    175 1358. <NA>       green-tan,… orange          600.  herma…
#> 2 Grievo…    216  159. none       brown, whi… green, ye…       NA   male  
#> 3 IG-88      200  140. none       metal       red              15.0 none  
#> 4 Darth …    202  136. none       white       yellow           41.9 male  
#> 5 Tarfful    234  136. brown      brown       blue             NA   male  
#> # ... with 82 more rows, and 5 more variables: homeworld <chr>,
#> #   species <chr>, films <list>, vehicles <list>, starships <list>

starwars %>%
  group_by(species) %>%
  summarise(
    n = n(),
    mass = mean(mass, na.rm = TRUE)
  ) %>%
  filter(n > 1)
#> # A tibble: 9 x 3
#>   species      n  mass
#>   <chr>    <int> <dbl>
#> 1 Droid        5  69.8
#> 2 Gungan       3  74.0
#> 3 Human       35  82.8
#> 4 Kaminoan     2  88.0
#> 5 Mirialan     2  53.1
#> # ... with 4 more rows

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

Install

install.packages('dplyr')

Monthly Downloads

1,541,718

Version

0.7.8

License

MIT + file LICENSE

Last Published

November 10th, 2018

Functions in dplyr (0.7.8)

all_vars

Apply predicate to all variables
compute

Force computation of a database query
distinct

Select distinct/unique rows
as.tbl_cube

Coerce an existing data structure into a tbl_cube
arrange

Arrange rows by variables
cumall

Cumulative versions of any, all, and mean
copy_to

Copy a local data frame to a remote src
auto_copy

Copy tables to same source, if necessary
filter

Return rows with matching conditions
filter_all

Filter within a selection of variables
do

Do anything
group_by_all

Group by a selection of variables
check_dbplyr

dbplyr compatibility functions
coalesce

Find first non-missing element
backend_dbplyr

Database and SQL generics.
explain

Explain details of a tbl
bind

Efficiently bind multiple data frames by row and column
all_equal

Flexible equality comparison for data frames
failwith

Fail with specified value.
add_rownames

Convert row names to an explicit variable.
case_when

A general vectorised if
group_by_prepare

Prepare for grouping.
group_indices

Group id.
join

Join two tbls together
ident

Flag a character vector as SQL identifiers
n

The number of observations in the current group.
location

Print the location in memory of a data frame
lead-lag

Lead and lag.
desc

Descending order
n_distinct

Efficiently count the number of unique values in a set of vectors
dim_desc

Describing dimensions
id

Compute a unique numeric id for each unique row in a data frame.
join.tbl_df

Join data frame tbls
order_by

A helper function for ordering window function output
na_if

Convert values to NA
band_members

Band membership
funs

Create a list of function calls.
bench_compare

Evaluate, compare, benchmark operations of a set of srcs.
recode

Recode values
reexports

Objects exported from other packages
group_size

Calculate group sizes.
progress_estimated

Progress bar with estimated time.
between

Do values in a numeric vector fall in specified range?
nasa

NASA spatio-temporal data
group_by

Group by one or more variables
tally_

Deprecated SE versions of main verbs.
select_all

Select and rename a selection of variables
select

Select/rename variables by name
sample

Sample n rows from a table
near

Compare two numeric vectors
if_else

Vectorised if
nth

Extract the first, last or nth value from a vector
init_logging

Enable internal logging
src_dbi

Source for database backends
rowwise

Group input by rows
src_local

A local source.
scoped

Operate on a selection of variables
dplyr-package

dplyr: a grammar of data manipulation
summarise_all

Summarise and mutate multiple columns.
summarise_each

Summarise and mutate multiple columns.
same_src

Figure out if two sources are the same (or two tbls have the same source)
dr_dplyr

Dr Dplyr checks your installation for common problems.
top_n

Select top (or bottom) n rows (by value)
tbl_vars

List variables provided by a tbl.
select_vars

Select variables
grouped_df

A grouped data frame.
storms

Storm tracks data
tidyeval

Tidy eval helpers
src_tbls

List all tbls provided by a source.
vars

Select variables
starwars

Starwars characters
summarise

Reduces multiple values down to a single value
with_order

Run a function with one order, translating result back to original order
tally

Count/tally observations by group
tbl

Create a table from a data source
tbl_cube

A data cube tbl
tbl_df

Create a data frame tbl.
groups

Return grouping variables
make_tbl

Create a "tbl" object
mutate

Add new variables
pull

Pull out a single variable
ranking

Windowed rank functions.
setops

Set operations
slice

Select rows by position
sql

SQL escaping.
src

Create a "src" object
common_by

Extract out common by variables
arrange_all

Arrange rows by a selection of variables
as.table.tbl_cube

Coerce a tbl_cube to other data structures