gower_topn: Find the top-n matches

Description

Find the top-n matches in y for each record in x.

Usage

gower_topn(
  x,
  y,
  pair_x = NULL,
  pair_y = NULL,
  n = 5,
  eps = 1e-08,
  weights = NULL,
  ignore_case = FALSE,
  nthread = getOption("gd_num_thread")
)

Value

A list with two matrix elements, index and distance. Both have n rows and nrow(x) columns: column i holds the row indices in y of the top-n best matches for row i of x, together with the corresponding Gower distances. When there are no columns to compare, a message is printed, both distance and index are empty matrices, and the list is returned invisibly.
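To make the layout of the return value concrete, here is a minimal base-R sketch (not the package's implementation) that turns a full distance matrix, with one row per record of y and one column per record of x, into index and distance matrices with n rows. The helper top_n_from_dist is hypothetical. Capping n at nrow(y) and the tie order produced by order() are choices of this sketch, not documented behaviour of gower_topn.

```r
# Hypothetical helper (not part of the gower package): given a distance matrix
# D with one row per record of y and one column per record of x, keep the n
# smallest distances in each column, best match first.
top_n_from_dist <- function(D, n) {
  n <- min(n, nrow(D))                       # sketch choice: cap n at nrow(y)
  index <- matrix(0L, nrow = n, ncol = ncol(D))
  distance <- matrix(0, nrow = n, ncol = ncol(D))
  for (i in seq_len(ncol(D))) {
    o <- order(D[, i])[seq_len(n)]           # rows of y sorted by distance
    index[, i] <- o
    distance[, i] <- D[o, i]
  }
  list(index = index, distance = distance)
}

D <- matrix(c(0.3, 0.1, 0.2,                 # distances from x[1, ] to y[1:3, ]
              0.5, 0.4, 0.0), nrow = 3)      # distances from x[2, ] to y[1:3, ]
res <- top_n_from_dist(D, n = 2)
res$index     # column 1: 2, 3; column 2: 3, 2
res$distance  # column 1: 0.1, 0.2; column 2: 0.0, 0.4
```

Each column of res$index is read independently: res$index[, i] lists the rows of y closest to row i of x.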

Arguments

x: [data.frame] Records for which matches are sought.
y: [data.frame] Records searched for matches.
pair_x: [numeric|character] (optional) Columns in x used for comparison. See Details below.
pair_y: [numeric|character] (optional) Columns in y used for comparison. See Details below.
n: [integer] Number of best matches (indices and distances) to return for each row of x.
eps: [numeric] (optional) Computed numbers (variable ranges) smaller than eps are treated as zero.
weights: [numeric] (optional) A vector of weights of length ncol(x) that defines the weight applied to each component of the gower distance.
ignore_case: [logical] Ignore case when matching column names of x and y. Only used when neither pair_x nor pair_y is user-defined.
nthread: Number of threads to use for parallelization. By default, 2 threads are used on a dual-core machine; on any other machine, all cores but one are used so that the machine stays responsive during a large computation. The maximum number of threads is determined by omp_get_max_threads() at the C level.
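The weights and eps arguments enter the distance itself. As a sketch, assuming the textbook Gower definition (numeric columns contribute |a - b| divided by the column range, other columns contribute 0 when equal and 1 otherwise, and the result is the weighted mean of the components), a single pair of records could be scored as below. The function gower_pair_sketch is hypothetical; dropping a numeric column whose range is below eps and skipping missing values are assumptions of this sketch rather than documented behaviour of the package.

```r
# Hypothetical helper: weighted Gower distance between two single-row data
# frames xr and yr whose columns are in the same order. Assumptions: numeric
# columns use |a - b| / range, numeric columns with range < eps are dropped,
# other columns score 0 if equal and 1 otherwise, pairs with NA are skipped.
gower_pair_sketch <- function(xr, yr, ranges, weights = NULL, eps = 1e-8) {
  p <- length(xr)
  if (is.null(weights)) weights <- rep(1, p)
  num <- 0   # weighted sum of component distances
  den <- 0   # sum of the weights of the components actually used
  for (k in seq_len(p)) {
    a <- xr[[k]]
    b <- yr[[k]]
    if (is.na(a) || is.na(b)) next
    if (is.numeric(a)) {
      if (ranges[[k]] < eps) next            # range treated as zero: column dropped
      d <- abs(a - b) / ranges[[k]]
    } else {
      d <- as.numeric(as.character(a) != as.character(b))
    }
    num <- num + weights[[k]] * d
    den <- den + weights[[k]]
  }
  if (den == 0) NA_real_ else num / den
}

x <- data.frame(len = 1.0, col = "red")
y <- data.frame(len = 3.0, col = "blue")
ranges <- c(len = 4, col = NA)  # range of `len` in the data; unused for `col`
gower_pair_sketch(x[1, ], y[1, ], ranges)                     # (0.5 + 1) / 2 = 0.75
gower_pair_sketch(x[1, ], y[1, ], ranges, weights = c(3, 1))  # (3 * 0.5 + 1) / 4 = 0.625
```

Raising the weight of a column pulls the distance towards that column's component, which is why the weighted call lands closer to the numeric component (0.5) than the unweighted one.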

Examples


# For each of the first 3 rows of iris, find the top 4 matches among the first 10 rows of iris.
x <- iris[1:3,]
lookup <- iris[1:10,]
gower_topn(x=x,y=lookup,n=4)
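For a self-contained cross-check of the example that needs only base R, the sketch below builds the full Gower distance matrix between x and lookup and keeps the four smallest distances per column. It assumes columns are paired by name, numeric ranges are taken over x and lookup combined, and all weights are equal; the package may differ in these details. Because each row of x also appears in lookup, its best match is itself at distance 0.

```r
# Base-R sketch of the top-4 search in the example above.
# Assumptions (not taken from the gower package source): columns are paired
# by name, numeric ranges are computed over x and lookup combined, all weights
# are equal, near-constant numeric columns are skipped, and there are no NAs.
x <- iris[1:3, ]
lookup <- iris[1:10, ]
cols <- intersect(names(x), names(lookup))           # pair columns by name

D <- matrix(0, nrow = nrow(lookup), ncol = nrow(x))  # rows: lookup, columns: x
used <- 0
for (k in cols) {
  a <- x[[k]]
  b <- lookup[[k]]
  if (is.numeric(a)) {
    r <- diff(range(c(a, b)))
    if (r < 1e-8) next                               # skip near-constant columns
    comp <- abs(outer(b, a, "-")) / r                # range-normalised difference
  } else {
    comp <- 1 * outer(as.character(b), as.character(a), "!=")  # 0 if equal, else 1
  }
  D <- D + comp
  used <- used + 1
}
D <- D / used                                        # mean over compared columns

# Keep the 4 closest rows of lookup for each row of x, best match first.
top4_index <- apply(D, 2, order)[1:4, ]
top4_distance <- apply(D, 2, sort)[1:4, ]
```

top4_index has the same n X nrow(x) shape as the index element returned by gower_topn.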
