dplyr v0.7.5

A Grammar of Data Manipulation

A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

Readme

dplyr

Overview

dplyr is a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges:

  • mutate() adds new variables that are functions of existing variables.
  • select() picks variables based on their names.
  • filter() picks cases based on their values.
  • summarise() reduces multiple values down to a single summary.
  • arrange() changes the ordering of the rows.

These all combine naturally with group_by(), which allows you to perform any operation “by group”. You can learn more about them in vignette("dplyr"). As well as these single-table verbs, dplyr also provides a variety of two-table verbs, which you can learn about in vignette("two-table").
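As a small illustration of the two-table verbs, the sketch below joins two of the example datasets that ship with dplyr, band_members and band_instruments. The result names (joined, all_members) are only illustrative.

```r
library(dplyr)

# band_members and band_instruments are small example datasets shipped with dplyr.
# inner_join() keeps only the rows whose `name` appears in both tables.
joined <- band_members %>%
  inner_join(band_instruments, by = "name")

# left_join() keeps every row of band_members and fills the columns of
# unmatched rows with NA.
all_members <- band_members %>%
  left_join(band_instruments, by = "name")

print(joined)
print(all_members)
```

Passing `by` explicitly avoids the message dplyr prints when it has to guess the join columns.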

dplyr is designed to abstract over how the data is stored. That means as well as working with local data frames, you can also work with remote database tables, using exactly the same R code. Install the dbplyr package then read vignette("databases", package = "dbplyr").
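A minimal sketch of the database workflow is shown below. It assumes the DBI, RSQLite and dbplyr packages are installed, and it uses an in-memory SQLite database purely as a stand-in for a real remote database. The table name mtcars_remote is made up for the example.

```r
library(dplyr)

# Assumption: DBI, RSQLite and dbplyr are installed. The in-memory SQLite
# database stands in for a real remote database.
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Copy a local data frame into the database and keep a lazy reference to it.
mtcars_db <- copy_to(con, mtcars, name = "mtcars_remote")

# The same verbs as for local data frames; the query is not run yet.
by_cyl <- mtcars_db %>%
  group_by(cyl) %>%
  summarise(n = n(), avg_mpg = mean(mpg, na.rm = TRUE)) %>%
  arrange(cyl)

# collect() runs the query in the database and returns a local tibble.
result <- collect(by_cyl)
print(result)

DBI::dbDisconnect(con)
```

Until collect() is called, by_cyl only describes a query, so large tables can be filtered and summarised in the database before any rows are pulled into R.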

If you are new to dplyr, the best place to start is the data transformation chapter in R for Data Science.

Installation

# The easiest way to get dplyr is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just dplyr:
install.packages("dplyr")

# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("tidyverse/dplyr")

If you encounter a clear bug, please file a minimal reproducible example on GitHub. For questions and other discussion, please use the manipulatr mailing list.

Usage

library(dplyr)

starwars %>% 
  filter(species == "Droid")
#> # A tibble: 5 x 13
#>   name  height  mass hair_color skin_color  eye_color birth_year gender
#>   <chr>  <int> <dbl> <chr>      <chr>       <chr>          <dbl> <chr> 
#> 1 C-3PO    167   75. <NA>       gold        yellow          112. <NA>  
#> 2 R2-D2     96   32. <NA>       white, blue red              33. <NA>  
#> 3 R5-D4     97   32. <NA>       white, red  red              NA  <NA>  
#> 4 IG-88    200  140. none       metal       red              15. none  
#> 5 BB8       NA   NA  none       none        black            NA  none  
#> # ... with 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> #   vehicles <list>, starships <list>

starwars %>% 
  select(name, ends_with("color"))
#> # A tibble: 87 x 4
#>   name           hair_color skin_color  eye_color
#>   <chr>          <chr>      <chr>       <chr>    
#> 1 Luke Skywalker blond      fair        blue     
#> 2 C-3PO          <NA>       gold        yellow   
#> 3 R2-D2          <NA>       white, blue red      
#> 4 Darth Vader    none       white       yellow   
#> 5 Leia Organa    brown      light       brown    
#> # ... with 82 more rows

starwars %>% 
  mutate(bmi = mass / (height / 100)^2) %>%
  select(name:mass, bmi)
#> # A tibble: 87 x 4
#>   name           height  mass   bmi
#>   <chr>           <int> <dbl> <dbl>
#> 1 Luke Skywalker    172   77.  26.0
#> 2 C-3PO             167   75.  26.9
#> 3 R2-D2              96   32.  34.7
#> 4 Darth Vader       202  136.  33.3
#> 5 Leia Organa       150   49.  21.8
#> # ... with 82 more rows

starwars %>% 
  arrange(desc(mass))
#> # A tibble: 87 x 13
#>   name    height  mass hair_color skin_color  eye_color  birth_year gender
#>   <chr>    <int> <dbl> <chr>      <chr>       <chr>           <dbl> <chr> 
#> 1 Jabba …    175 1358. <NA>       green-tan,… orange          600.  herma…
#> 2 Grievo…    216  159. none       brown, whi… green, ye…       NA   male  
#> 3 IG-88      200  140. none       metal       red              15.0 none  
#> 4 Darth …    202  136. none       white       yellow           41.9 male  
#> 5 Tarfful    234  136. brown      brown       blue             NA   male  
#> # ... with 82 more rows, and 5 more variables: homeworld <chr>,
#> #   species <chr>, films <list>, vehicles <list>, starships <list>

starwars %>%
  group_by(species) %>%
  summarise(
    n = n(),
    mass = mean(mass, na.rm = TRUE)
  ) %>%
  filter(n > 1)
#> # A tibble: 9 x 3
#>   species      n  mass
#>   <chr>    <int> <dbl>
#> 1 Droid        5  69.8
#> 2 Gungan       3  74.0
#> 3 Human       35  82.8
#> 4 Kaminoan     2  88.0
#> 5 Mirialan     2  53.1
#> # ... with 4 more rows

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

Functions in dplyr

Name Description
desc Descending order
dplyr-package dplyr: a grammar of data manipulation
between Do values in a numeric vector fall in specified range?
group_by_all Group by a selection of variables
explain Explain details of a tbl
join.tbl_df Join data frame tbls
grouped_df A grouped data frame.
na_if Convert values to NA
group_by_prepare Prepare for grouping.
ident Flag a character vector as SQL identifiers
failwith Fail with specified value.
init_logging Enable internal logging
id Compute a unique numeric id for each unique row in a data frame.
dim_desc Describing dimensions
join Join two tbls together
setops Set operations
nasa NASA spatio-temporal data
if_else Vectorised if
slice Select rows by position
src_tbls List all tbls provided by a source.
select_vars Select variables
select_all Select and rename a selection of variables
groups Return grouping variables
n The number of observations in the current group.
do Do anything
distinct Select distinct/unique rows
sql SQL escaping.
group_indices Group id.
src Create a "src" object
n_distinct Efficiently count the number of unique values in a set of vectors
order_by A helper function for ordering window function output
dr_dplyr Dr Dplyr checks your installation for common problems.
group_size Calculate group sizes.
recode Recode values
top_n Select top (or bottom) n rows (by value)
progress_estimated Progress bar with estimated time.
rowwise Group input by rows
starwars Starwars characters
lead-lag Lead and lag.
vars Select variables
near Compare two numeric vectors
reexports Objects exported from other packages
filter Return rows with matching conditions
location Print the location in memory of a data frame
filter_all Filter within a selection of variables
nth Extract the first, last or nth value from a vector
tally_ Deprecated SE versions of main verbs.
sample Sample n rows from a table
funs Create a list of function calls.
mutate Add new variables
make_tbl Create a "tbl" object
group_by Group by one or more variables
scoped Operate on a selection of variables
summarise_all Summarise and mutate multiple columns.
tidyeval Tidy eval helpers
tbl_vars List variables provided by a tbl.
same_src Figure out if two sources are the same (or two tbls have the same source)
select Select/rename variables by name
tbl_cube A data cube tbl
summarise_each Summarise and mutate multiple columns.
summarise Reduces multiple values down to a single value
storms Storm tracks data
pull Pull out a single variable
ranking Windowed rank functions.
tbl_df Create a data frame tbl.
with_order Run a function with one order, translating result back to original order
src_dbi Source for database backends
tally Count/tally observations by group
tbl Create a table from a data source
src_local A local source.
as.tbl_cube Coerce an existing data structure into a tbl_cube
bind Efficiently bind multiple data frames by row and column
all_vars Apply predicate to all variables
auto_copy Copy tables to same source, if necessary
case_when A general vectorised if
arrange Arrange rows by variables
bench_compare Evaluate, compare, benchmark operations of a set of srcs.
backend_dbplyr Database and SQL generics.
arrange_all Arrange rows by a selection of variables
band_members Band membership
add_rownames Convert row names to an explicit variable.
as.table.tbl_cube Coerce a tbl_cube to other data structures
all_equal Flexible equality comparison for data frames
check_dbplyr dbplyr compatibility functions
common_by Extract out common by variables
coalesce Find first non-missing element
copy_to Copy a local data frame to a remote src
cumall Cumulative versions of any, all, and mean
compute Force computation of a database query
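Several of the functions listed above are vectorised helpers rather than table verbs. The sketch below applies a few of them to a small made-up vector x to show what they return. The variable names are illustrative only.

```r
library(dplyr)

x <- c(1, 5, NA, 10)  # made-up input with one missing value

# between(): is each value inside the closed range [1, 5]? NA stays NA.
in_range <- between(x, 1, 5)

# if_else(): type-strict vectorised if, with an explicit value for NA.
size <- if_else(x > 4, "big", "small", missing = "unknown")

# case_when(): a general vectorised if; the first matching condition wins.
label <- case_when(
  is.na(x) ~ "missing",
  x < 5    ~ "low",
  TRUE     ~ "high"
)

# coalesce() replaces NA with the first non-missing alternative;
# na_if() does the opposite and turns a chosen value into NA.
filled  <- coalesce(x, 0)
blanked <- na_if(x, 10)

# near() compares doubles with a tolerance instead of exact equality.
is_close <- near(sqrt(2)^2, 2)

# n_distinct() counts unique values.
unique_count <- n_distinct(c("a", "b", "a"))

print(data.frame(x, in_range, size, label, filled, blanked))
```

All of these work on plain vectors, so they can be used inside mutate(), filter() and summarise() as well as on their own.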

Vignettes of dplyr

Name
internals/hybrid-evaluation.Rmd
compatibility.Rmd
data_frames.html
databases.html
dplyr.Rmd
hybrid-evaluation.html
introduction.html
new-sql-backend.html
nse.html
programming.Rmd
two-table.Rmd
window-functions.Rmd
