fuzzyjoin (version 0.1.5)

distance_join: Join two tables based on a distance metric of one or more columns

Description

This differs from difference_join in that it considers all of the join columns together when computing distance. This allows it to use metrics such as Euclidean or Manhattan distance, which depend on multiple columns. Note that if you are working with longitude and latitude, you probably want to use geo_join instead.
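The difference between the two metrics, and between a combined distance and a column-by-column comparison, can be sketched in base R. This is only an illustration of the arithmetic, not the package's internal code; the two rows and the threshold below are hypothetical values chosen for the example.

```r
# Hypothetical pair of rows, one from each table, compared on two columns
a <- c(Sepal.Length = 5.1, Sepal.Width = 3.5)
b <- c(Sepal.Length = 5.0, Sepal.Width = 3.0)
max_dist <- 0.55  # assumed threshold, for illustration only

euclidean  <- sqrt(sum((a - b)^2))  # straight-line distance over both columns
manhattan  <- sum(abs(a - b))       # sum of the absolute per-column differences
per_column <- abs(a - b)            # what a column-by-column comparison sees

euclidean <= max_dist        # is the pair within range under Euclidean?
manhattan <= max_dist        # is the pair within range under Manhattan?
all(per_column <= max_dist)  # is every column within range on its own?
```

The same pair of rows can fall inside the threshold under one metric and outside it under another, so method and max_dist should be chosen together.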

Usage

distance_join(x, y, by = NULL, max_dist = 1, method = c("euclidean",
  "manhattan"), mode = "inner", distance_col = NULL)

distance_inner_join(x, y, by = NULL, method = "euclidean", max_dist = 1, distance_col = NULL)

distance_left_join(x, y, by = NULL, method = "euclidean", max_dist = 1, distance_col = NULL)

distance_right_join(x, y, by = NULL, method = "euclidean", max_dist = 1, distance_col = NULL)

distance_full_join(x, y, by = NULL, method = "euclidean", max_dist = 1, distance_col = NULL)

distance_semi_join(x, y, by = NULL, method = "euclidean", max_dist = 1, distance_col = NULL)

distance_anti_join(x, y, by = NULL, method = "euclidean", max_dist = 1, distance_col = NULL)

Arguments

x

A tbl

y

A tbl

by

Columns by which to join the two tables

max_dist

Maximum distance to use for joining

method

Method to use for computing distance, either "euclidean" (default) or "manhattan"

mode

One of "inner", "left", "right", "full", "semi", or "anti"

distance_col

If given, a column with this name is added containing the distance between the two matched rows

Examples

# NOT RUN {
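library(dplyr)
library(fuzzyjoin)

head(iris)
# Reference points to match against; both columns are shared with iris
sepal_lengths <- tibble(Sepal.Length = c(5, 6, 7),
                        Sepal.Width = 1:3)

# by = NULL joins on the shared columns; rows match when their
# Euclidean distance is at most 2
iris %>%
  distance_inner_join(sepal_lengths, max_dist = 2)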

# }
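
A further sketch showing the method and distance_col arguments from the Usage section. The centers table, the threshold of 0.5, and the column name "dist" are hypothetical choices made for this illustration; only the argument names come from the documentation above.

```r
library(fuzzyjoin)

# Hypothetical reference points; the column names match iris, so by = NULL
# joins on the shared columns Sepal.Length and Sepal.Width
centers <- data.frame(Sepal.Length = c(5, 6, 7),
                      Sepal.Width = c(3, 3, 3))

# Manhattan distance, keeping the computed distance in a column named "dist"
matched <- distance_inner_join(iris, centers,
                               method = "manhattan",
                               max_dist = 0.5,
                               distance_col = "dist")

head(matched)
range(matched$dist)  # every retained distance should be at most max_dist
```

Keeping the distance column makes it possible to check how close each match is, or to filter down to the nearest match per row afterwards.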
