dplyr v0.7.4

A Grammar of Data Manipulation

A fast, consistent tool for working with data-frame-like objects, both in memory and out of memory.

dplyr

Overview

dplyr is a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges:

  • mutate() adds new variables that are functions of existing variables.
  • select() picks variables based on their names.
  • filter() picks cases based on their values.
  • summarise() reduces multiple values down to a single summary.
  • arrange() changes the ordering of the rows.

These all combine naturally with group_by() which allows you to perform any operation "by group". You can learn more about them in vignette("dplyr"). As well as these single-table verbs, dplyr also provides a variety of two-table verbs, which you can learn about in vignette("two-table").
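Below is a short sketch of both ideas. It uses the built-in mtcars data for a grouped filter() and the small band_members / band_instruments datasets bundled with dplyr for two-table verbs. The variable names (best_mpg, joined_left, joined_inner) are illustrative only. Inside a grouped filter(), max(mpg) is computed separately for each cyl group, not over the whole table.

```r
library(dplyr)

# group_by() + filter(): keep the most fuel-efficient car within each cyl group.
# max(mpg) is evaluated per group, so one row survives per cylinder count.
best_mpg <- mtcars %>%
  group_by(cyl) %>%
  filter(mpg == max(mpg)) %>%
  ungroup()
best_mpg

# Two-table verbs: both small example tables ship with dplyr.
band_members
band_instruments

# left_join() keeps every row of the first table; unmatched rows get NA in `plays`.
joined_left <- left_join(band_members, band_instruments, by = "name")
joined_left

# inner_join() keeps only the rows whose `name` appears in both tables.
joined_inner <- inner_join(band_members, band_instruments, by = "name")
joined_inner
```

Joining on an explicit by = "name" avoids relying on dplyr guessing the common columns.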

dplyr is designed to abstract over how the data is stored. That means as well as working with local data frames, you can also work with remote database tables, using exactly the same R code. Install the dbplyr package, then read vignette("databases", package = "dbplyr").
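Below is a minimal sketch of the remote workflow. It assumes the dbplyr and RSQLite packages are installed. RSQLite is just one possible backend, chosen here because an in-memory database needs no server. The verbs only build a query: show_query() prints the SQL that dbplyr generates, and collect() runs the query and returns a local tibble.

```r
# Assumes dbplyr and RSQLite are installed:
# install.packages(c("dbplyr", "RSQLite"))
library(dplyr)

# Open a throwaway in-memory SQLite database (DBI is a dependency of dbplyr).
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Copy a local data frame into the database; the return value is a remote tbl.
mtcars_db <- copy_to(con, mtcars, "mtcars")

# The same verbs as for a local data frame; no SQL is executed yet.
four_cyl <- mtcars_db %>%
  filter(cyl == 4) %>%
  summarise(n = n(), mean_mpg = mean(mpg, na.rm = TRUE))

# Print the SQL that was generated for this pipeline.
show_query(four_cyl)

# Execute the query and pull the result back into R as a tibble.
result <- collect(four_cyl)
result

DBI::dbDisconnect(con)
```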

If you are new to dplyr, the best place to start is the data transformation chapter in R for Data Science.

Installation

# The easiest way to get dplyr is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just dplyr:
install.packages("dplyr")

# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("tidyverse/dplyr")

If you encounter a clear bug, please file a minimal reproducible example on GitHub. For questions and other discussion, please use the manipulatr mailing list.

Usage

library(dplyr)

starwars %>% 
  filter(species == "Droid")
#> # A tibble: 5 x 13
#>   name  height  mass hair… skin… eye_… birt… gend… home… spec… films vehi…
#>   <chr>  <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <lis> <lis>
#> 1 C-3PO    167  75.0 <NA>  gold  yell… 112   <NA>  Tato… Droid <chr… <chr…
#> 2 R2-D2     96  32.0 <NA>  "whi… red    33.0 <NA>  Naboo Droid <chr… <chr…
#> 3 R5-D4     97  32.0 <NA>  "whi… red    NA   <NA>  Tato… Droid <chr… <chr…
#> 4 IG-88    200 140   none  metal red    15.0 none  <NA>  Droid <chr… <chr…
#> 5 BB8       NA  NA   none  none  black  NA   none  <NA>  Droid <chr… <chr…
#> # ... with 1 more variable: starships <list>

starwars %>% 
  select(name, ends_with("color"))
#> # A tibble: 87 x 4
#>   name             hair_color skin_color    eye_color
#>   <chr>            <chr>      <chr>         <chr>    
#> 1 "Luke Skywalker" blond      fair          blue     
#> 2 C-3PO            <NA>       gold          yellow   
#> 3 R2-D2            <NA>       "white, blue" red      
#> 4 "Darth Vader"    none       white         yellow   
#> 5 "Leia Organa"    brown      light         brown    
#> # ... with 82 more rows

starwars %>% 
  mutate(bmi = mass / ((height / 100)^2)) %>%
  select(name:mass, bmi)
#> # A tibble: 87 x 4
#>   name             height  mass   bmi
#>   <chr>             <int> <dbl> <dbl>
#> 1 "Luke Skywalker"    172  77.0  26.0
#> 2 C-3PO               167  75.0  26.9
#> 3 R2-D2                96  32.0  34.7
#> 4 "Darth Vader"       202 136    33.3
#> 5 "Leia Organa"       150  49.0  21.8
#> # ... with 82 more rows

starwars %>% 
  arrange(desc(mass))
#> # A tibble: 87 x 13
#>   name   heig…  mass hair… skin… eye_… birt… gend… home… spec… films vehi…
#>   <chr>  <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <lis> <lis>
#> 1 "Jabb…   175  1358 <NA>  "gre… oran… 600   herm… "Nal… Hutt  <chr… <chr…
#> 2 Griev…   216   159 none  "bro… "gre…  NA   male  Kalee Kale… <chr… <chr…
#> 3 IG-88    200   140 none  metal red    15.0 none  <NA>  Droid <chr… <chr…
#> 4 "Dart…   202   136 none  white yell…  41.9 male  Tato… Human <chr… <chr…
#> 5 Tarff…   234   136 brown brown blue   NA   male  Kash… Wook… <chr… <chr…
#> # ... with 82 more rows, and 1 more variable: starships <list>

starwars %>%
  group_by(species) %>%
  summarise(
    n = n(),
    mass = mean(mass, na.rm = TRUE)
  ) %>%
  filter(n > 1)
#> # A tibble: 9 x 3
#>   species      n  mass
#>   <chr>    <int> <dbl>
#> 1 Droid        5  69.8
#> 2 Gungan       3  74.0
#> 3 Human       35  82.8
#> 4 Kaminoan     2  88.0
#> 5 Mirialan     2  53.1
#> # ... with 4 more rows

Functions in dplyr

Name Description
add_rownames Convert row names to an explicit variable.
check_dbplyr dbplyr compatibility functions
auto_copy Copy tables to same source, if necessary
compute Force computation of a database query
desc Descending order
filter Return rows with matching conditions
dplyr-package dplyr: a grammar of data manipulation
all_vars Apply predicate to all variables
dr_dplyr Dr Dplyr checks your installation for common problems.
coalesce Find first non-missing element
arrange Arrange rows by variables
filter_all Filter within a selection of variables
bind Efficiently bind multiple data frames by row and column
arrange_all Arrange rows by a selection of variables
group_indices Group id.
dim_desc Describing dimensions
group_size Calculate group sizes.
case_when A general vectorised if
funs Create a list of function calls.
distinct Select distinct/unique rows
do Do anything
as.table.tbl_cube Coerce a tbl_cube to other data structures
group_by Group by one or more variables
group_by_all Group by a selection of variables
make_tbl Create a "tbl" object
mutate Add new variables
between Do values in a numeric vector fall in specified range?
cumall Cumulative versions of any, all, and mean
group_by_prepare Prepare for grouping.
copy_to Copy a local data frame to a remote src
bench_compare Evaluate, compare, benchmark operations of a set of srcs.
n The number of observations in the current group.
tally_ Deprecated SE versions of main verbs.
select Select/rename variables by name
recode Recode values
lead-lag Lead and lag.
location Print the location in memory of a data frame
id Compute a unique numeric id for each unique row in a data frame.
tally Count/tally observations by group
na_if Convert values to NA
explain Explain details of a tbl
nasa NASA spatio-temporal data
tbl Create a table from a data source
failwith Fail with specified value.
pull Pull out a single variable
ident Flag a character vector as SQL identifiers
ranking Windowed rank functions.
select_all Select and rename a selection of variables
n_distinct Efficiently count the number of unique values in a set of vectors
select_var Select variable
grouped_df A grouped data frame.
join Join two tbls together
reexports Objects exported from other packages
select_helpers Select helpers
rowwise Group input by rows
join.tbl_df Join data frame tbls
order_by A helper function for ordering window function output
setops Set operations
groups Return grouping variables
select_vars Select variables.
progress_estimated Progress bar with estimated time.
same_src Figure out if two sources are the same (or two tbl have the same source)
sample Sample n rows from a table
if_else Vectorised if
slice Select rows by position
near Compare two numeric vectors
vars Select variables
init_logging Enable internal logging
src_dbi Source for database backends
src_local A local source.
nth Extract the first, last or nth value from a vector
tbl_cube A data cube tbl
storms Storm tracks data
scoped Operate on a selection of variables
tbl_df Create a data frame tbl.
sql SQL escaping.
src_tbls List all tbls provided by a source.
src Create a "src" object
with_order Run a function with one order, translating result back to original order
starwars Starwars characters
summarise Reduces multiple values down to a single value
tbl_vars List variables provided by a tbl.
summarise_all Summarise and mutate multiple columns.
summarise_each Summarise and mutate multiple columns.
top_n Select top (or bottom) n rows (by value)
as.tbl_cube Coerce an existing data structure into a tbl_cube
all_equal Flexible equality comparison for data frames
backend_dbplyr Database and SQL generics.
band_members Band membership
common_by Extract out common by variables
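Several entries above (scoped, summarise_all, select_all, vars) belong to the scoped verb family. These verbs apply one function across many columns at once. Below is a small sketch using the built-in iris data. The choice of mean() and round() and the column selection are illustrative, and the variable names are hypothetical.

```r
library(dplyr)

# summarise_if(): apply mean() to every numeric column, separately for each Species.
# The grouping column itself is not summarised, so the result has 1 + 4 columns.
iris_means <- iris %>%
  group_by(Species) %>%
  summarise_if(is.numeric, mean)
iris_means

# mutate_at() with vars(): apply round() only to the listed columns;
# all other columns are left unchanged.
iris_rounded <- iris %>%
  mutate_at(vars(Sepal.Length, Sepal.Width), round)
head(iris_rounded)
```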

Vignettes of dplyr

Name
internals/hybrid-evaluation.Rmd
compatibility.Rmd
data_frames.html
databases.html
dplyr.Rmd
hybrid-evaluation.html
introduction.html
new-sql-backend.html
nse.html
programming.Rmd
two-table.Rmd
window-functions.Rmd

Details

Type Package
URL http://dplyr.tidyverse.org, https://github.com/tidyverse/dplyr
BugReports https://github.com/tidyverse/dplyr/issues
Encoding UTF-8
VignetteBuilder knitr
LinkingTo Rcpp (>= 0.12.0), BH (>= 1.58.0-1), bindrcpp, plogr
LazyData yes
License MIT + file LICENSE
RoxygenNote 6.0.1
NeedsCompilation yes
Packaged 2017-09-16 15:25:52 UTC; muelleki
Repository CRAN
Date/Publication 2017-09-28 20:43:29 UTC