As base::merge() does for data.frames, this function takes two datasets,
matches rows based on a specified key variable, and adds columns from one to
the other.
joinDatasets(x, y, by = intersect(names(x), names(y)), by.x = by,
  by.y = by, all = FALSE, all.x = TRUE, all.y = FALSE,
  copy = TRUE)

extendDataset(x, y, by = intersect(names(x), names(y)), by.x = by,
  by.y = by, all = FALSE, all.x = TRUE, all.y = FALSE, ...)

# S3 method for CrunchDataset
merge(x, y, by = intersect(names(x), names(y)),
  by.x = by, by.y = by, all = FALSE, all.x = TRUE, all.y = FALSE,
  ...)
x: CrunchDataset to add data to
y: CrunchDataset to copy data from. May be filtered by rows and/or columns.
by: character, optional shortcut for specifying by.x and by.y by alias if the
key variables have the same alias in both datasets.
by.x: CrunchVariable in x on which to join, or the alias (following the
crunch.namekey.dataset option) of a variable. Must be type numeric or text and
have all unique, non-missing values.
by.y: CrunchVariable in y on which to join, or the alias (following the
crunch.namekey.dataset option) of a variable. Must be type numeric or text and
have all unique, non-missing values.
all: logical: should all rows in x and y be kept, i.e. a "full outer" join?
Only FALSE is currently supported.
all.x: logical: should all rows in x be kept, i.e. a "left outer" join? Only
TRUE is currently supported.
all.y: logical: should all rows in y be kept, i.e. a "right outer" join? Only
FALSE is currently supported.
copy: logical: whether to make a materialized (TRUE) or virtual (FALSE) join.
The default, TRUE, is the only valid value because virtual joins are not
currently implemented.
...: additional arguments, ignored
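
The key requirements above (numeric or text, all values unique and non-missing) can be checked before attempting a join. The following is a minimal sketch in base R that works on an ordinary vector, for example key values already pulled into the local session. The helper name `is_valid_join_key` is hypothetical and is not part of crunch.

```r
# Hypothetical helper, not part of crunch: checks a plain R vector against the
# key requirements listed above (numeric or text, unique, no missing values).
is_valid_join_key <- function(key) {
  (is.numeric(key) || is.character(key)) &&
    !anyNA(key) &&
    anyDuplicated(key) == 0
}

is_valid_join_key(c(1, 2, 3))            # TRUE
is_valid_join_key(c("a", "b", "a"))      # FALSE: duplicated value
is_valid_join_key(c(1, NA, 3))           # FALSE: missing value
is_valid_join_key(factor(c("a", "b")))   # FALSE: neither numeric nor text
```

Running a check like this on both keys first may make a failed join easier to diagnose.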
x extended by the columns of y, matched on the "by" variables.
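
The shape of that result can be illustrated with base::merge() on ordinary data frames, which this function is modeled on. This is only an analogy under the supported settings (all.x = TRUE, all.y = FALSE). The data frames and column names below are made up, and a real call would use CrunchDatasets instead.

```r
# Two small data frames standing in for x and y (made-up data).
x <- data.frame(id = c(1, 2, 3), age = c(34, 51, 27))
y <- data.frame(id = c(2, 3, 4), region = c("North", "South", "East"))

# all.x = TRUE keeps every row of x; all.y = FALSE drops rows found only in y.
joined <- merge(x, y, by = "id", all.x = TRUE, all.y = FALSE)

# The result has the same rows as x, plus the extra column from y.
# The row with id 1 has no match in y, so its region is NA;
# the row with id 4 exists only in y and is not included.
print(joined)
```

The same left-join behavior is what the "Only TRUE is currently supported" and "Only FALSE is currently supported" notes on all.x and all.y describe.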
Since joining two datasets can sometimes produce unexpected results if the
keys differ between the two datasets, you may want to follow the
fork-edit-merge workflow for this operation. To do this, fork the dataset
with forkDataset(), join the new data to the fork, ensure that
the resulting dataset is correct, and merge it back to the original dataset
with mergeFork(). For more, see
vignette("fork-and-merge", package = "crunch").
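
The fork-edit-merge steps described above might be wrapped up roughly as follows. This is a sketch, not a tested recipe: the wrapper name `join_via_fork` and its `check` argument are hypothetical, `ds` and `new_data` are assumed to be CrunchDatasets already loaded in an authenticated session, and `key` is assumed to be the alias of a key variable present in both. The block only defines the function, since calling it requires a live Crunch connection.

```r
# Hypothetical wrapper around the fork-edit-merge workflow.
# ds, new_data: CrunchDatasets (assumed to be loaded already)
# key:          alias of the key variable shared by both datasets (assumed)
# check:        caller-supplied function that returns TRUE if the joined
#               fork looks correct (hypothetical argument)
join_via_fork <- function(ds, new_data, key, check) {
  # Work on a fork so the original dataset is untouched if the join goes wrong.
  fork <- crunch::forkDataset(ds)

  # Join the new data to the fork rather than to the original.
  fork <- crunch::joinDatasets(fork, new_data, by = key)

  # Only merge back once the caller's check passes.
  if (!isTRUE(check(fork))) {
    stop("Joined fork failed the check; the original dataset was not changed.")
  }
  crunch::mergeFork(ds, fork)
}
```

What `check` should verify depends on the data. In interactive use, inspecting the fork by hand before calling mergeFork() serves the same purpose.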