data.table (version 1.8.0)

merge: Merge Two Data Tables


Relatively quick merge of data tables based on common keys (by default).

This function is meant to act very similarly to the function, with the major exception being that the default columns used to merge two data.tables are the shared key columns, and not the shared columns with the same names. For a more data.table-centric (and faster) way of merging two data.tables, take a look at [.data.table; e.g., x[y, ...].


## S3 method for class 'data.table':
merge(x, y, by = NULL, all = FALSE, all.x = all, all.y = all, suffixes = c(".x", ".y"), ...)


x, y
data tables. y is coerced to a data.table if it isn't one already
A vector of shared column names in x and y to merge on. This defaults to the shared key columns between the two tables. If y has no key columns, this defaults to the keys set for x. Note that
logical; all = L is shorthand for all.x = L and all.y = L.
logical; if TRUE, then extra rows will be added to the output, one for each row in x that has no matching row in y. These rows will have 'NA's in those columns that are usually filled with values from
logical; analogous to all.x above.
A character(2) specifying the suffixes to be used for making non-by column names unique. The suffix behavior works in a similar fashion as does.
Not used at this time.


  • A new data.table based on the merged data tables, sorted by the columns set (or inferred for) the by argument.


Keys for each data.table are reshuffled to ensure that the columns identified in the by parameter are prefixes of the keys set for data.tables x and y -- this may cause the function to run slower than expected.

See Also

data.table, [.data.table,


Run this code
(dt1 <- data.table(A = letters[1:10], X = 1:10, key = "A"))
    (dt2 <- data.table(A = letters[5:14], Y = 1:10, key = "A"))
    merge(dt1, dt2)
    merge(dt1, dt2, all = TRUE)

    (dt1 <- data.table(A = letters[rep(1:3, 2)], X = 1:6, key = "A"))
    (dt2 <- data.table(A = letters[rep(2:4, 2)], Y = 6:1, key = "A"))
    merge(dt1, dt2)

    (dt1 <- data.table(A = c(rep(1L, 5), 2L), B = letters[rep(1:3, 2)], X = 1:6, key = "A,B"))
    (dt2 <- data.table(A = c(rep(1L, 5), 2L), B = letters[rep(2:4, 2)], Y = 6:1, key = "A,B"))
    merge(dt1, dt2)
    merge(dt1, dt2, by="B")

    # test it more:
    d1 <- data.table(a=rep(1:2,each=3), b=1:6, key="a,b")
    d2 <- J(a=0:1, bb=10:11, key="a")
    d3 <- J(a=0:1, key="a")
    d4 <- J(a=0:1, b=0:1, key="a,b")

    merge(d1, d2)
    merge(d2, d1)
    merge(d1, d2, all=TRUE)
    merge(d2, d1, all=TRUE)

    merge(d3, d1)
    merge(d1, d3)
    merge(d1, d3, all=TRUE)
    merge(d3, d1, all=TRUE)

    merge(d1, d4)
    merge(d1, d4, by="a", suffixes=c(".d1", ".d4"))
    merge(d4, d1)
    merge(d1, d4, all=TRUE)
    merge(d4, d1, all=TRUE)

Run the code above in your browser using DataCamp Workspace