merge: Merge Two Data Tables

Description

Relatively quick merge of data tables based on common keys (by default).

This function is meant to act very similarly to the merge.data.frame function, with the major exception being that the default columns used to merge two data.tables are the shared key columns, and not the shared columns with the same names. For a more data.table-centric way of merging two data.tables, take a look at [.data.table, eg. x[y, ...].

Usage

## S3 method for class 'data.table':
merge(x, y, by = NULL, all = FALSE, all.x = all, all.y = all, suffixes = c(".x", ".y"), ...)

Arguments

x, y

data tables. y is coerced to a data.table if it isn't one already

A vector of shared column names in x and y to merge on. This defaults to the shared key columns between the two tables. If y has no key columns, this defaults to the keys set for x. Note that

all

logical; all = L is shorthand for all.x = L and all.y = L.

all.x

logical; if TRUE, then extra rows will be added to the output, one for each row in x that has no matching row in y. These rows will have 'NA's in those columns that are usually filled with values from

all.y

logical; analogous to all.x above.

suffixes

A character(2) specifying the suffixes to be used for making non-by column names unique. The suffix behavior works in a similar fashion as merge.data.frame does.

...

Not used at this time.

Value

A new data.table based on the merged data tables, sorted by the columns set (or inferred for) the by argument.

Details

Keys for each data.table are reshuffled to ensure that the columns identified in the by parameter are prefixes of the keys set for data.tables x and y -- this may cause the function to run slower than expected.

Examples

Run this code

(dt1 <- data.table(A = letters[1:10], X = 1:10, key = "A"))
    (dt2 <- data.table(A = letters[5:14], Y = 1:10, key = "A"))
    merge(dt1, dt2)
    merge(dt1, dt2, all = TRUE)

    (dt1 <- data.table(A = letters[rep(1:3, 2)], X = 1:6, key = "A"))
    (dt2 <- data.table(A = letters[rep(2:4, 2)], Y = 6:1, key = "A"))
    merge(dt1, dt2)

    (dt1 <- data.table(A = c(rep(1L, 5), 2L), B = letters[rep(1:3, 2)], X = 1:6, key = "A,B"))
    (dt2 <- data.table(A = c(rep(1L, 5), 2L), B = letters[rep(2:4, 2)], Y = 6:1, key = "A,B"))
    merge(dt1, dt2)
    merge(dt1, dt2, by="B")

    # test it more:
    d1 <- data.table(a=rep(1:2,each=3), b=1:6, key="a,b")
    d2 <- J(a=0:1, bb=10:11, key="a")
    d3 <- J(a=0:1, key="a")
    d4 <- J(a=0:1, b=0:1, key="a,b")

    merge(d1, d2)
    merge(d2, d1)
    merge(d1, d2, all=TRUE)
    merge(d2, d1, all=TRUE)

    merge(d3, d1)
    merge(d1, d3)
    merge(d1, d3, all=TRUE)
    merge(d3, d1, all=TRUE)

    merge(d1, d4)
    merge(d1, d4, by="a", suffixes=c(".d1", ".d4"))
    merge(d4, d1)
    merge(d1, d4, all=TRUE)
    merge(d4, d1, all=TRUE)

Run the code above in your browser using DataLab