dtplyr (version 0.0.1)

join.tbl_dt: Join data table tbls.

Description

See join for a description of the general purpose of the functions.

Usage

inner_join(x, y, by = NULL, copy = FALSE, ...)
left_join(x, y, by = NULL, copy = FALSE, ...)
right_join(x, y, by = NULL, copy = FALSE, ...)
semi_join(x, y, by = NULL, copy = FALSE, ...)
anti_join(x, y, by = NULL, copy = FALSE, ...)
full_join(x, y, by = NULL, copy = FALSE, ...)

Arguments

x, y
tbls to join
by
A character vector of variables to join by. If NULL (the default), the join is a natural join that uses all variables with common names across the two tables. A message lists the variables so that you can check they're right; to suppress the message, list the join variables explicitly.

To join by different variables on x and y use a named vector. For example, by = c("a" = "b") will match x.a to y.b.

copy
If x and y are not from the same data source, and copy is TRUE, then y will be copied into the same src as x. This allows you to join tables across srcs, but it is a potentially expensive operation so you must opt into it.
...
Included for compatibility with generic; otherwise ignored.
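
The three forms of by described above can be sketched as follows. This is an illustrative sketch, not taken from the package: the tables players, teams, and scores are hypothetical, and plain data frames are used so that the code runs with dplyr alone. Wrapping each table in tbl_dt() is expected to produce the same matches, assuming a dtplyr release that exports tbl_dt().

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy tables (not from the package documentation).
players <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bob", "Cy"),
                      stringsAsFactors = FALSE)
teams <- data.frame(id = c(1, 2), team = c("red", "blue"),
                    stringsAsFactors = FALSE)
scores <- data.frame(player = c(1, 1, 3), pts = c(10, 7, 4))

# by = NULL: natural join on the only shared column, "id".
# A message reports which variables were used.
natural <- inner_join(players, teams)

# Listing the variable explicitly selects the same column, without the message.
explicit <- inner_join(players, teams, by = "id")

# Named vector: match players$id to scores$player.
renamed <- inner_join(players, scores, by = c("id" = "player"))
```

In the named-vector case the key column keeps its name from x, so renamed has an id column and no player column.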

Examples

library(dplyr, warn.conflicts = FALSE)
library(dtplyr)  # provides tbl_dt()

if (require("Lahman")) {
batting_dt <- tbl_dt(Batting)
person_dt <- tbl_dt(Master)

# Inner join: match batting and person data
inner_join(batting_dt, person_dt)

# Left join: keep batting data even if person missing
left_join(batting_dt, person_dt)

# Semi-join: find batting data for top 4 teams, 2010:2012
grid <- expand.grid(
  teamID = c("WAS", "ATL", "PHI", "NYA"),
  yearID = 2010:2012)
top4 <- semi_join(batting_dt, grid, copy = TRUE)

# Anti-join: find batting data without player data
anti_join(batting_dt, person_dt)
}
