dplyr

Overview

dplyr is a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges:

  • mutate() adds new variables that are functions of existing variables.
  • select() picks variables based on their names.
  • filter() picks cases based on their values.
  • summarise() reduces multiple values down to a single summary.
  • arrange() changes the ordering of the rows.

These all combine naturally with group_by(), which allows you to perform any operation "by group". You can learn more about them in vignette("dplyr"). In addition to these single-table verbs, dplyr provides a variety of two-table verbs, which you can learn about in vignette("two-table").
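As a minimal sketch of both ideas, the first pipeline below pairs group_by() with mutate() rather than summarise(), so every row is kept while the mean is computed per species. The second part uses the band_members and band_instruments data sets that ship with dplyr to show one mutating join and one filtering join. The object and column names (mass_vs_species, species_mean, mass_diff, joined, unmatched) are illustrative only.

```r
library(dplyr)

# "By group" with mutate(): compare each character's mass with the mean
# mass of its species. Rows are kept; only the computation is grouped.
mass_vs_species <- starwars %>%
  filter(!is.na(mass)) %>%
  group_by(species) %>%
  mutate(
    species_mean = mean(mass),
    mass_diff = mass - species_mean
  ) %>%
  ungroup() %>%
  select(name, species, mass, species_mean, mass_diff)

# Two-table verbs.
# left_join() is a mutating join: it keeps every row of band_members and
# adds the matching columns from band_instruments (NA where no match).
joined <- band_members %>%
  left_join(band_instruments, by = "name")

# anti_join() is a filtering join: it keeps only the rows of band_members
# that have no match in band_instruments.
unmatched <- band_members %>%
  anti_join(band_instruments, by = "name")
```

Calling ungroup() after the grouped step is a design choice here: it stops later verbs from silently operating per species.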

dplyr is designed to abstract over how the data is stored. That means that, as well as working with local data frames, you can work with remote database tables using exactly the same R code. To do so, install the dbplyr package, then read vignette("databases", package = "dbplyr").
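The following is a sketch of what "the same R code" can look like against a database. It assumes the DBI, RSQLite, and dbplyr packages are installed, and it uses an in-memory SQLite database as a stand-in for a remote one. The table name "starwars" and the object names are illustrative.

```r
library(dplyr)

# Assumption: DBI, RSQLite, and dbplyr are installed. An in-memory SQLite
# database stands in for a remote database here.
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Copy a local data frame into the database and keep a reference to it.
# Only atomic columns are copied; list columns such as films cannot be
# stored in SQLite.
starwars_db <- copy_to(
  con,
  select(starwars, name, species, mass),
  name = "starwars"
)

# The same verbs as for a local data frame; dbplyr translates them to SQL.
droids_query <- starwars_db %>%
  filter(species == "Droid") %>%
  select(name, mass)

# Print the SQL that would be sent to the database.
show_query(droids_query)

# Nothing is fetched until collect() pulls the result into a local tibble.
droids <- collect(droids_query)

DBI::dbDisconnect(con)
```

Until collect() is called, droids_query is only a lazy description of the query, which is what lets the database do the filtering.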

If you are new to dplyr, the best place to start is the data transformation chapter in R for Data Science.

Installation

# The easiest way to get dplyr is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just dplyr:
install.packages("dplyr")

# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("tidyverse/dplyr")

If you encounter a clear bug, please file a minimal reproducible example on GitHub. For questions and other discussion, please use the manipulatr mailing list.

Usage

library(dplyr)

starwars %>% 
  filter(species == "Droid")
#> # A tibble: 5 x 13
#>    name height  mass hair_color  skin_color eye_color birth_year gender
#>   <chr>  <int> <dbl>      <chr>       <chr>     <chr>      <dbl>  <chr>
#> 1 C-3PO    167    75       <NA>        gold    yellow        112   <NA>
#> 2 R2-D2     96    32       <NA> white, blue       red         33   <NA>
#> 3 R5-D4     97    32       <NA>  white, red       red         NA   <NA>
#> 4 IG-88    200   140       none       metal       red         15   none
#> 5   BB8     NA    NA       none        none     black         NA   none
#> # ... with 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> #   vehicles <list>, starships <list>

starwars %>% 
  select(name, ends_with("color"))
#> # A tibble: 87 x 4
#>             name hair_color  skin_color eye_color
#>            <chr>      <chr>       <chr>     <chr>
#> 1 Luke Skywalker      blond        fair      blue
#> 2          C-3PO       <NA>        gold    yellow
#> 3          R2-D2       <NA> white, blue       red
#> 4    Darth Vader       none       white    yellow
#> 5    Leia Organa      brown       light     brown
#> # ... with 82 more rows

starwars %>% 
  mutate(bmi = mass / ((height / 100) ^ 2)) %>%
  select(name:mass, bmi)
#> # A tibble: 87 x 4
#>             name height  mass      bmi
#>            <chr>  <int> <dbl>    <dbl>
#> 1 Luke Skywalker    172    77 26.02758
#> 2          C-3PO    167    75 26.89232
#> 3          R2-D2     96    32 34.72222
#> 4    Darth Vader    202   136 33.33007
#> 5    Leia Organa    150    49 21.77778
#> # ... with 82 more rows

starwars %>% 
  arrange(desc(mass))
#> # A tibble: 87 x 13
#>                    name height  mass hair_color       skin_color
#>                   <chr>  <int> <dbl>      <chr>            <chr>
#> 1 Jabba Desilijic Tiure    175  1358       <NA> green-tan, brown
#> 2              Grievous    216   159       none     brown, white
#> 3                 IG-88    200   140       none            metal
#> 4           Darth Vader    202   136       none            white
#> 5               Tarfful    234   136      brown            brown
#> # ... with 82 more rows, and 8 more variables: eye_color <chr>,
#> #   birth_year <dbl>, gender <chr>, homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>

starwars %>%
  group_by(species) %>%
  summarise(
    n = n(),
    mass = mean(mass, na.rm = TRUE)
  ) %>%
  filter(n > 1)
#> # A tibble: 9 x 3
#>    species     n     mass
#>      <chr> <int>    <dbl>
#> 1    Droid     5 69.75000
#> 2   Gungan     3 74.00000
#> 3    Human    35 82.78182
#> 4 Kaminoan     2 88.00000
#> 5 Mirialan     2 53.10000
#> # ... with 4 more rows

Version: 0.7.1
License: MIT + file LICENSE
Monthly downloads: 1,541,718
Last published: June 22nd, 2017

Functions in dplyr (0.7.1)

  • backend_dbplyr: Database and SQL generics.
  • band_members: Band membership
  • all_vars: Apply predicate to all variables
  • arrange_all: Arrange rows by a selection of variables
  • as.tbl_cube: Coerce an existing data structure into a tbl_cube
  • auto_copy: Copy tables to same source, if necessary
  • arrange: Arrange rows by variables
  • as.table.tbl_cube: Coerce a tbl_cube to other data structures
  • add_rownames: Convert row names to an explicit variable.
  • all_equal: Flexible equality comparison for data frames
  • common_by: Extract out common by variables
  • bench_compare: Evaluate, compare, benchmark operations of a set of srcs.
  • check_dbplyr: dbplyr compatibility functions
  • compute: Force computation of a database query
  • bind: Efficiently bind multiple data frames by row and column
  • case_when: A general vectorised if
  • copy_to: Copy a local data frame to a remote src
  • cumall: Cumulative versions of any, all, and mean
  • failwith: Fail with specified value.
  • filter_all: Filter within a selection of variables
  • filter: Return rows with matching conditions
  • funs: Create a list of function calls.
  • group_by: Group by one or more variables
  • group_indices: Group id.
  • coalesce: Find first non-missing element
  • distinct: Select distinct/unique rows
  • do: Do anything
  • location: Print the location in memory of a data frame
  • make_tbl: Create a "tbl" object
  • reexports: Objects exported from other packages
  • rowwise: Group input by rows
  • group_size: Calculate group sizes.
  • grouped_df: A grouped data frame.
  • join.tbl_df: Join data frame tbls
  • lead-lag: Lead and lag.
  • between: Do values in a numeric vector fall in specified range?
  • desc: Descending order
  • dim_desc: Describing dimensions
  • groups: Return grouping variables
  • nth: Extract the first, last or nth value from a vector
  • order_by: A helper function for ordering window function output
  • scoped: Operate on a selection of variables
  • tally_: Deprecated SE versions of main verbs.
  • id: Compute a unique numeric id for each unique row in a data frame.
  • n: The number of observations in the current group.
  • na_if: Convert values to NA
  • progress_estimated: Progress bar with estimated time.
  • select: Select/rename variables by name
  • setops: Set operations
  • summarise_all: Summarise and mutate multiple columns.
  • slice: Select rows by position
  • sql: SQL escaping.
  • tbl_vars: List variables provided by a tbl.
  • tbl: Create a table from a data source
  • group_by_all: Group by a selection of variables
  • group_by_prepare: Prepare for grouping.
  • mutate: Add new variables
  • n_distinct: Efficiently count the number of unique values in a set of vectors
  • dplyr-package: dplyr: a grammar of data manipulation
  • explain: Explain details of a tbl
  • ident: Flag a character vector as SQL identifiers
  • ranking: Windowed rank functions.
  • recode: Recode values
  • select_all: Select and rename a selection of variables
  • pull: Pull out a single variable
  • same_src: Figure out if two sources are the same (or two tbl have the same source)
  • sample: Sample n rows from a table
  • src_dbi: Source for database backends
  • select_helpers: Select helpers
  • starwars: Starwars characters
  • storms: Storm tracks data
  • with_order: Run a function with one order, translating result back to original order
  • if_else: Vectorised if
  • init_logging: Enable internal logging
  • join: Join two tbls together
  • nasa: NASA spatio-temporal data
  • near: Compare two numeric vectors
  • select_var: Select variable
  • select_vars: Select variables.
  • src_tbls: List all tbls provided by a source.
  • src: Create a "src" object
  • summarise_each: Summarise and mutate multiple columns.
  • tbl_cube: A data cube tbl
  • tbl_df: Create a data frame tbl.
  • summarise: Reduces multiple values down to a single value
  • tally: Count/tally observations by group
  • src_local: A local source.
  • top_n: Select top (or bottom) n rows (by value)
  • vars: Select variables
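
Several of the entries above are small vectorised helpers whose one-line descriptions are terse. The sketch below shows a few of them (na_if, coalesce, between, if_else, case_when, near) on a made-up numeric vector. The vector x, the sentinel value -99, and the size labels are illustrative assumptions, not part of the package documentation.

```r
library(dplyr)

# Made-up data: -99 is used as a sentinel for "missing".
x <- c(1, NA, 5, 12, -99)

# na_if(): convert a sentinel value to NA.
cleaned <- na_if(x, -99)            # 1 NA 5 12 NA

# coalesce(): fill missing values with the first non-missing alternative.
filled <- coalesce(cleaned, 0)      # 1 0 5 12 0

# between(): inclusive range test, here filled >= 1 & filled <= 10.
in_range <- between(filled, 1, 10)  # TRUE FALSE TRUE FALSE FALSE

# if_else(): a stricter ifelse() that requires both branches to share a type.
two_sizes <- if_else(filled > 4, "large", "small")

# case_when(): a sequence of condition ~ value pairs; the first match wins.
three_sizes <- case_when(
  filled >= 10 ~ "large",
  filled >= 5  ~ "medium",
  TRUE         ~ "small"
)

# near(): compare floating-point numbers with a tolerance.
is_two <- near(sqrt(2)^2, 2)        # TRUE, whereas sqrt(2)^2 == 2 is FALSE
```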