merge(x, y, ...)

## Default S3 method:
merge(x, y, ...)

## S3 method for class 'data.frame'
merge(x, y, by = intersect(names(x), names(y)),
      by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
      sort = TRUE, suffixes = c(".x",".y"),
      incomparables = NULL, ...)
all = L is shorthand for all.x = L and all.y = L, where L is either TRUE or FALSE.
If all.x is TRUE, then extra rows will be added to the output, one for each row in x that has no matching row in y. These rows will have NAs in those columns that are usually filled with values from y. The default is FALSE, so that only rows with data from both x and y are included in the output.
incomparables gives values which cannot be matched; see match. This is intended to be used for merging on one column, so these are incomparable values of that column.
A data frame. The rows are by default lexicographically sorted on the common columns, but for sort = FALSE are in an unspecified order. The columns are the common columns followed by the remaining columns in x and then those in y. If the matching involved row names, an extra character column called Row.names is added at the left, and in all cases the result has automatic row names.
merge is a generic function whose principal method is for data frames: the default method coerces its arguments to data frames and calls the "data.frame" method.
By default the data frames are merged on the columns with names they
both have, but separate specifications of the columns can be given by
by.x and by.y. The rows in the two data frames that
match on the specified columns are extracted, and joined together. If
there is more than one match, all possible matches contribute one row
each. For the precise meaning of 'match', see match.
Columns to merge on can be specified by name, number or by a logical
vector: the name "row.names" or the number 0 specifies
the row names. If specified by name it must correspond uniquely to a
named column in the input.
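As an illustration of merging on row names, here is a minimal sketch. The data frames a and b and their contents are hypothetical and chosen only for this example.

```r
# Hypothetical toy data: the two data frames share row names but no columns
a <- data.frame(height = c(1.6, 1.8), row.names = c("ann", "bob"))
b <- data.frame(weight = c(55, 80, 70), row.names = c("ann", "bob", "cat"))

# by = "row.names" asks merge() to match on the row names;
# by = 0 is the numeric way of saying the same thing
r <- merge(a, b, by = "row.names")
r0 <- merge(a, b, by = 0)

# The key appears as an extra column called Row.names at the left
names(r)
```

Only "ann" and "bob" appear in the result, because "cat" has no matching row name in a.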
If by or both by.x and by.y are of length 0 (a
length zero vector or NULL), the result, r, is the
Cartesian product of x and y, i.e.,
dim(r) = c(nrow(x)*nrow(y), ncol(x) + ncol(y)).
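The Cartesian-product case can be checked with a small sketch. The data frames below are hypothetical and deliberately share no column names, so no suffixes come into play.

```r
# Hypothetical inputs with 3 and 2 rows, 2 columns each
x <- data.frame(id = 1:3, a = letters[1:3])
y <- data.frame(key = 1:2, b = c(TRUE, FALSE))

# A length-zero 'by' (here NULL) pairs every row of x with every row of y
r <- merge(x, y, by = NULL)

dim(r)  # 6 4, i.e. c(nrow(x) * nrow(y), ncol(x) + ncol(y))
```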
If all.x is true, all the non matching cases of x are
appended to the result as well, with NA filled in the
corresponding columns of y; analogously for all.y.
If the columns in the data frames not used in merging have any common
names, these have suffixes (".x" and ".y" by
default) appended to try to make the names of the result unique. If
this is not possible, an error is thrown.
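A short sketch of the suffix behaviour follows. The data frames and the custom suffixes ".left" and ".right" are hypothetical choices for illustration.

```r
# Both inputs have a non-key column called "value"
x <- data.frame(id = 1:3, value = c(10, 20, 30))
y <- data.frame(id = c(1L, 3L), value = c(0.1, 0.3))

# Default suffixes disambiguate the clashing column names
default_names <- names(merge(x, y, by = "id"))
default_names  # "id" "value.x" "value.y"

# The suffixes argument replaces the defaults
custom_names <- names(merge(x, y, by = "id", suffixes = c(".left", ".right")))
custom_names   # "id" "value.left" "value.right"
```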
The complexity of the algorithm used is proportional to the length of the answer.
In SQL database terminology, the default value of all = FALSE
gives a natural join, a special case of an inner join. Specifying
all.x = TRUE gives a left (outer) join,
all.y = TRUE a right (outer) join, and both
(all = TRUE) a (full) outer join. DBMSes do not match
NULL records, equivalent to incomparables = NA in R.
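The correspondence with SQL joins can be sketched on two small data frames. The names x, y, k, a and b and their values are hypothetical, picked so that each input has one key the other lacks.

```r
# Keys 2 and 3 are shared; key 1 is only in x, key 4 only in y
x <- data.frame(k = c(1, 2, 3), a = c("x1", "x2", "x3"))
y <- data.frame(k = c(2, 3, 4), b = c("y2", "y3", "y4"))

inner <- merge(x, y)                # inner (natural) join: keys 2, 3
left  <- merge(x, y, all.x = TRUE)  # left outer join: keys 1, 2, 3
right <- merge(x, y, all.y = TRUE)  # right outer join: keys 2, 3, 4
full  <- merge(x, y, all = TRUE)    # full outer join: keys 1, 2, 3, 4

# Rows without a partner get NA in the columns from the other side
left[left$k == 1, ]
```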
dendrogram for a class which has a merge method.
## use character columns of names to get sensible sort order
authors <- data.frame(
    surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
    nationality = c("US", "Australia", "US", "UK", "Australia"),
    deceased = c("yes", rep("no", 4)))
books <- data.frame(
    name = I(c("Tukey", "Venables", "Tierney",
               "Ripley", "Ripley", "McNeil", "R Core")),
    title = c("Exploratory Data Analysis",
              "Modern Applied Statistics ...",
              "LISP-STAT",
              "Spatial Statistics", "Stochastic Simulation",
              "Interactive Data Analysis",
              "An Introduction to R"),
    other.author = c(NA, "Ripley", NA, NA, NA, NA,
                     "Venables & Smith"))

(m1 <- merge(authors, books, by.x = "surname", by.y = "name"))
(m2 <- merge(books, authors, by.x = "name", by.y = "surname"))
stopifnot(as.character(m1[, 1]) == as.character(m2[, 1]),
          all.equal(m1[, -1], m2[, -1][ names(m1)[-1] ]),
          dim(merge(m1, m2, by = integer(0))) == c(36, 10))

## "R Core" is missing from authors and appears only here :
merge(authors, books, by.x = "surname", by.y = "name", all = TRUE)

## example of using 'incomparables'
x <- data.frame(k1 = c(NA,NA,3,4,5), k2 = c(1,NA,NA,4,5), data = 1:5)
y <- data.frame(k1 = c(NA,2,NA,4,5), k2 = c(NA,NA,3,4,5), data = 1:5)
merge(x, y, by = c("k1","k2")) # NA's match
merge(x, y, by = "k1") # NA's match, so 6 rows
merge(x, y, by = "k2", incomparables = NA) # 2 rows