tlsh (version 0.1.0)

Transitive Locality-Sensitive Hashing (LSH) for Record Linkage

Description

An implementation of the blocking algorithm transitive locality-sensitive hashing (TLSH) from Steorts, Ventura, Sadinle, and Fienberg (2014), which is a k-means variant of locality-sensitive hashing. The method is illustrated with examples and a vignette.

Install

install.packages('tlsh')

Monthly Downloads

2

Version

0.1.0

License

GPL-3

Maintainer

Rebecca Steorts

Last Published

November 6th, 2020

Functions in tlsh (0.1.0)

rhash_funcs

Function to generate a vector of random hash functions (or optionally one vector-valued function)

compare_buckets

Function that creates a similarity graph and divides it into communities (or blocks) for entity resolution

reduction.ratio

Returns the reduction ratio associated with a blocking method
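
The reduction ratio is conventionally the share of all possible record pairs that blocking removes from comparison. The following is a minimal base-R sketch of that definition, assuming each record carries exactly one block label. The name reduction_ratio_sketch is hypothetical and is not the package's reduction.ratio, whose arguments are not shown on this page.

```r
# Hypothetical helper, not part of tlsh: reduction ratio from block labels.
# Assumes block_labels[i] is the single block assigned to record i.
reduction_ratio_sketch <- function(block_labels) {
  n <- length(block_labels)
  total_pairs <- choose(n, 2)                       # all possible comparisons
  block_sizes <- as.vector(table(block_labels))     # number of records per block
  within_pairs <- sum(choose(block_sizes, 2))       # comparisons left after blocking
  1 - within_pairs / total_pairs
}

labels <- c(1, 1, 2, 2, 2, 3)
rr <- reduction_ratio_sketch(labels)
print(rr)
```

A value near 1 means blocking discards almost all pairs; a value near 0 means it saves little work.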

confusion.from.blocking

Perform evaluations (recall) for blocking.
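
Recall for a blocking is usually taken to be the fraction of truly matching record pairs that end up in the same block. The sketch below computes that quantity in base R under the assumption that ground-truth entity identifiers are available. The function blocking_recall_sketch and its arguments are hypothetical and do not reflect the signature of confusion.from.blocking.

```r
# Hypothetical helper, not part of tlsh: pairwise recall of a blocking.
# true_ids[i] is the known entity of record i; block_labels[i] is its block.
# Assumes at least one pair of records shares an entity.
blocking_recall_sketch <- function(block_labels, true_ids) {
  pairs <- utils::combn(length(true_ids), 2)   # 2 x choose(n, 2) matrix of index pairs
  same_entity <- true_ids[pairs[1, ]] == true_ids[pairs[2, ]]
  same_block <- block_labels[pairs[1, ]] == block_labels[pairs[2, ]]
  sum(same_entity & same_block) / sum(same_entity)
}

true_ids <- c("a", "a", "b", "b", "c")
block_labels <- c(1, 1, 1, 2, 2)
recall <- blocking_recall_sketch(block_labels, true_ids)
print(recall)
```

Enumerating every pair is quadratic in the number of records, so this form is only suitable for small illustrative data sets.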

reduction.ratio.from.blocking

Returns the reduction ratio associated with a blocking method

shingles

Function to shingle a string into its components of length k (tokens or k-grams)
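
To illustrate what shingling produces, here is a minimal base-R sketch that splits a string into overlapping character k-grams. It assumes character-level shingles; the package's own shingles function may tokenize differently, and char_shingles_sketch is a hypothetical name used only for this example.

```r
# Hypothetical helper, not part of tlsh: overlapping character k-shingles.
char_shingles_sketch <- function(record, k) {
  n <- nchar(record)
  if (n < k) return(character(0))      # too short to form any shingle
  starts <- seq_len(n - k + 1)         # start position of each shingle
  substring(record, starts, starts + k - 1)
}

sh <- char_shingles_sketch("steorts", 3)
print(sh)
```

A string of n characters yields n - k + 1 shingles, so larger k gives fewer but more specific tokens.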

shingled_record_to_index_vec

Function to convert a shingled record into a vector of the indices that its shingles correspond to

hash_signature

Function to take a signature matrix M composed of b bands and r rows and return a bucket for each band for each record

minhash_v2

Function to create a matrix of minhashed signatures
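
A minhash signature matrix has one column per record and one row per hash function; each entry is the smallest hashed value among that record's shingles. The sketch below is a base-R illustration that uses explicit random permutations of the shingle universe in place of hash functions. This is a substitution made for clarity, not necessarily what minhash_v2 does, and minhash_sketch with its arguments shingle_sets, p, and seed is hypothetical.

```r
# Hypothetical helper, not part of tlsh: minhash signatures via random permutations.
# shingle_sets: list of character vectors, one per record (each non-empty).
# p: number of permutations, i.e. rows of the signature matrix.
minhash_sketch <- function(shingle_sets, p, seed = 1) {
  universe <- unique(unlist(shingle_sets))
  m <- length(universe)
  set.seed(seed)
  # Row i holds the i-th random permutation of the positions 1..m.
  perms <- matrix(unlist(lapply(seq_len(p), function(i) sample.int(m))),
                  nrow = p, byrow = TRUE)
  sig <- matrix(NA_integer_, nrow = p, ncol = length(shingle_sets))
  for (j in seq_along(shingle_sets)) {
    idx <- match(unique(shingle_sets[[j]]), universe)  # shingle -> universe index
    # Signature entry: smallest permuted position among the record's shingles.
    sig[, j] <- apply(perms[, idx, drop = FALSE], 1, min)
  }
  sig
}

sets <- list(c("ab", "bc", "cd"), c("ab", "bc", "cd"), c("xy", "yz"))
sig <- minhash_sketch(sets, p = 4)
print(sig)
```

Records with identical shingle sets receive identical columns, and the fraction of rows on which two columns agree estimates the Jaccard similarity of the two sets.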

block.ids.from.blocking

Returns the block ids associated with a blocking method.

block_setup_v2

Function that divides all records into bins using locality-sensitive hashing and TLSH (based on a community detection technique)

primest

Function to generate all primes larger than an integer n1 (lower limit) and less than another integer n2 (upper limit)
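
One way to produce the primes in such a range is a sieve of Eratosthenes up to the upper limit, followed by a filter on the lower limit. The base-R sketch below reads both bounds as strict, following the wording above. The name primes_between_sketch is hypothetical, and the package's primest may be implemented differently.

```r
# Hypothetical helper, not part of tlsh: primes p with n1 < p < n2.
# Uses a sieve of Eratosthenes; assumes n1 and n2 are whole numbers.
primes_between_sketch <- function(n1, n2) {
  upper <- n2 - 1                        # largest candidate, strictly below n2
  if (upper < 2) return(integer(0))
  is_prime <- rep(TRUE, upper)
  is_prime[1] <- FALSE                   # 1 is not prime
  i <- 2
  while (i * i <= upper) {
    # Cross out multiples of i, starting from i^2.
    if (is_prime[i]) is_prime[seq(i * i, upper, by = i)] <- FALSE
    i <- i + 1
  }
  p <- which(is_prime)
  p[p > n1]
}

pr <- primes_between_sketch(10, 30)
print(pr)
```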

eval.blocksetup

Function to evaluate the blocking step

my_hash

Function that applies a hash function to each column of the band from the signature matrix (imports bit64)

extract_pairs_from_band

Function that extracts pairs of records from a band in the signature matrix M (imports bit64)