joinDatasets: Add columns from one dataset to another, joining on a key

Description

As base::merge() does for data.frames, this function takes two datasets, matches rows based on a specified key variable, and adds columns from one to the other.

Usage

joinDatasets(
  x,
  y,
  by = intersect(names(x), names(y)),
  by.x = by,
  by.y = by,
  all = FALSE,
  all.x = TRUE,
  all.y = FALSE,
  copy = TRUE
)

extendDataset(
  x,
  y,
  by = intersect(names(x), names(y)),
  by.x = by,
  by.y = by,
  all = FALSE,
  all.x = TRUE,
  all.y = FALSE,
  ...
)

# S3 method for CrunchDataset
merge(
  x,
  y,
  by = intersect(names(x), names(y)),
  by.x = by,
  by.y = by,
  all = FALSE,
  all.x = TRUE,
  all.y = FALSE,
  ...
)

Arguments

x

CrunchDataset to add data to

y

CrunchDataset to copy data from. May be filtered by rows and/or columns.

by

character, optional shortcut for specifying by.x and by.y by alias if the key variables have the same alias in both datasets.

by.x

CrunchVariable in x on which to join, or the alias (following crunch.namekey.dataset) of a variable. Must be type numeric or text and have all unique, non-missing values.

by.y

CrunchVariable in y on which to join, or the alias (following crunch.namekey.dataset) of a variable. Must be type numeric or text and have all unique, non-missing values.

all

logical: should all rows in x and y be kept, i.e. a "full outer" join? Only FALSE is currently supported.

all.x

logical: should all rows in x be kept, i.e. a "left outer" join? Only TRUE is currently supported.

all.y

logical: should all rows in y be kept, i.e. a "right outer" join? Only FALSE is currently supported.

copy

logical: should the join be materialized (TRUE) or virtual (FALSE)? Default is TRUE. Virtual joins are not currently implemented, so the default is the only valid value.

...

additional arguments, ignored
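
The requirement that by.x and by.y be numeric or text with all unique, non-missing values can be checked locally before attempting a join. The following is a minimal base R sketch, not part of crunch: is_valid_key() is a hypothetical helper, and it assumes the key values have already been pulled into an ordinary R vector.

```r
# Hypothetical helper (not a crunch function): TRUE if `key` could serve
# as a join key under the rules above, i.e. it is numeric or character,
# has no missing values, and has no duplicated values.
is_valid_key <- function(key) {
    (is.numeric(key) || is.character(key)) &&
        !anyNA(key) &&
        anyDuplicated(key) == 0
}

is_valid_key(c(101, 102, 103))    # unique numeric key
is_valid_key(c("a", "b", "b"))    # duplicated value
is_valid_key(c(1, NA, 3))         # missing value
```

Running a check like this on both keys first makes it easier to see why a join would be rejected.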

Value

x extended by the columns of y, matched on the "by" variables.
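
As an illustration of the only join type currently supported (all.x = TRUE, all.y = FALSE), here is the equivalent left outer join on plain data.frames with base::merge(). This is an analogy sketched with made-up data, not the Crunch implementation; the column names id, age, and region are assumptions for the example.

```r
# Made-up example data: `x` is the dataset being extended, `y` supplies
# the new column. Row id = 1 has no match in y; row id = 4 exists only in y.
x <- data.frame(id = c(1, 2, 3), age = c(34, 51, 27))
y <- data.frame(id = c(2, 3, 4), region = c("North", "South", "East"),
                stringsAsFactors = FALSE)

# Left outer join: keep every row of x, drop rows found only in y.
joined <- merge(x, y, by = "id", all.x = TRUE, all.y = FALSE)

# `joined` has the three rows of x; `region` is NA where y had no match,
# and the y-only row (id = 4) does not appear.
joined
```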

Details

Because a join can produce unexpected results when the keys differ between the two datasets, you may want to follow the fork-edit-merge workflow for this operation. To do this, fork the dataset with forkDataset(), join the new data to the fork, ensure that the resulting dataset is correct, and merge it back to the original dataset with mergeFork(). For more, see vignette("fork-and-merge", package = "crunch").
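
The fork-edit-merge workflow described above might be wrapped up roughly as follows. This is a sketch under stated assumptions, not a documented recipe: join_via_fork() and its check argument are hypothetical, the dataset and alias names in the usage comment are invented, and calling it requires the crunch package and an authenticated session.

```r
# Hypothetical wrapper (not part of crunch): join `new_data` onto a fork of
# `ds`, and merge the fork back only if a user-supplied check passes.
join_via_fork <- function(ds, new_data, key, fork_name,
                          check = function(fork) TRUE) {
    # 1. Fork the original dataset so that it stays untouched.
    fork <- crunch::forkDataset(ds, name = fork_name)
    # 2. Join the new data onto the fork.
    fork <- crunch::joinDatasets(fork, new_data, by = key)
    # 3. Verify the result; on failure, hand back the fork for inspection.
    if (!isTRUE(check(fork))) {
        warning("Join check failed; returning the fork for inspection.")
        return(fork)
    }
    # 4. Merge the verified fork back into the original dataset.
    crunch::mergeFork(ds, fork)
}

# Usage (not run; dataset names and the "respondent_id" alias are invented):
# ds       <- crunch::loadDataset("Survey wave 1")
# new_data <- crunch::loadDataset("Panel demographics")
# ds <- join_via_fork(ds, new_data, key = "respondent_id",
#                     fork_name = "Survey wave 1 (join check)",
#                     check = function(fork) "region" %in% names(fork))
```

Keeping the check as a separate function leaves the decision about what counts as a correct join to the caller.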