
Last chance! 50% off unlimited learning
Sale ends in
data.table
s based on common key columns (by default). This merge
method for data.table
is meant to act very similarly to the
merge
method for data.frame
, with the major exception being that
the default columns used to merge two data.table
inputs are the shared key columns
rather than the shared columns with the same names.
For a more data.table
-centric (and faster) way of merging two data.table
s,
see [.data.table
; e.g., x[y, ...]
. In recent
versions, however, merge()
is much closer to the speed of x[y, ...]
.
See FAQ 1.12 for a detailed comparison of merge
and x[y, ...]
.
Note that merge
is a generic function in base R. It dispatches to either the
merge.data.frame
method or merge.data.table
method depending on the class of its first argument.
Typing ?merge
at the prompt should present a choice of two links:
the help pages for each of these merge
methods. You don't need to use the full name of the
method although you may if you wish; i.e., merge(DT1,DT2)
is idiomatic R but you can bypass
method dispatch by going direct if you wish: merge.data.table(DT1,DT2)
.
## S3 method for class 'data.table':
merge(x, y, by = NULL, all = FALSE, all.x = all, all.y = all, suffixes = c(".x", ".y"),
allow.cartesian=getOption("datatable.allow.cartesian"), # default FALSE
...)
data table
s. y
is coerced to a data.table
if
it isn't one already.x
and y
to merge on.
This defaults to the shared key columns between the two tables.
If y
has no key columns, this defaults to the key of x
.all = TRUE
is shorthand to save setting both all.x = TRUE
and
all.y = TRUE
.TRUE
, then extra rows will be added to the
output, one for each row in x
that has no matching row in
y
. These rows will have 'NA's in those columns that are
usually filled with values from all.x
above.character(2)
specifying the suffixes to be used for making
non-by
column names unique. The suffix behavior works in a similar
fashion as the merge.data.frame
meallow.cartesian
in [.data.table
.data.table
based on the merged data table
s, sorted by the
columns set (or inferred for) the by
argument.by
is not the key (or
head of the key) of x
or y
, then a copy is first rekeyed
prior to performing the merge. This might make this function perform
slower than you are expecting. When secondary keys are implemented in
future we expect performance in this case to improve.data.table
, [.data.table
,
merge.data.frame
(dt1 <- data.table(A = letters[1:10], X = 1:10, key = "A"))
(dt2 <- data.table(A = letters[5:14], Y = 1:10, key = "A"))
merge(dt1, dt2)
merge(dt1, dt2, all = TRUE)
(dt1 <- data.table(A = letters[rep(1:3, 2)], X = 1:6, key = "A"))
(dt2 <- data.table(A = letters[rep(2:4, 2)], Y = 6:1, key = "A"))
merge(dt1, dt2, allow.cartesian=TRUE)
(dt1 <- data.table(A = c(rep(1L, 5), 2L), B = letters[rep(1:3, 2)], X = 1:6, key = "A,B"))
(dt2 <- data.table(A = c(rep(1L, 5), 2L), B = letters[rep(2:4, 2)], Y = 6:1, key = "A,B"))
merge(dt1, dt2)
merge(dt1, dt2, by="B", allow.cartesian=TRUE)
# test it more:
d1 <- data.table(a=rep(1:2,each=3), b=1:6, key="a,b")
d2 <- data.table(a=0:1, bb=10:11, key="a")
d3 <- data.table(a=0:1, key="a")
d4 <- data.table(a=0:1, b=0:1, key="a,b")
merge(d1, d2)
merge(d2, d1)
merge(d1, d2, all=TRUE)
merge(d2, d1, all=TRUE)
merge(d3, d1)
merge(d1, d3)
merge(d1, d3, all=TRUE)
merge(d3, d1, all=TRUE)
merge(d1, d4)
merge(d1, d4, by="a", suffixes=c(".d1", ".d4"))
merge(d4, d1)
merge(d1, d4, all=TRUE)
merge(d4, d1, all=TRUE)
Run the code above in your browser using DataLab