klsh

An implementation of the blocking algorithm KLSH (k-means locality sensitive hashing) described in Steorts, Ventura, Sadinle, and Fienberg (2014), https://link.springer.com/chapter/10.1007/978-3-319-11257-2_20. KLSH is a k-means variant of locality sensitive hashing. The method is illustrated with examples and a vignette.
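To make the idea concrete, here is a minimal base R sketch of a KLSH-style pipeline: shingle each record into character k-grams, weight the resulting bag of tokens by inverse document frequency, project onto a few random unit vectors, and cluster the projections with k-means so that each cluster acts as a block. This is an illustration under stated assumptions, not the package's implementation: the toy records, the helper name `shingle`, and the choices of `k`, `p`, and the number of blocks are all hypothetical, and none of the calls below come from the klsh API.

```r
# Illustrative sketch only: base R, hypothetical names, not the klsh API.
set.seed(1)

# Hypothetical toy records.
records <- c("john smith", "jon smith", "john smyth",
             "mary jones", "marie jones", "mary jonas")

# Split a string into overlapping character k-grams (shingles).
shingle <- function(s, k = 2) {
  n <- nchar(s)
  if (n < k) return(s)
  substring(s, 1:(n - k + 1), k:n)
}

bags  <- lapply(records, shingle, k = 2)
vocab <- sort(unique(unlist(bags)))

# Term-count matrix: one row per record, one column per shingle.
counts <- t(vapply(
  bags,
  function(b) tabulate(match(b, vocab), nbins = length(vocab)),
  integer(length(vocab))
))

# Inverse document frequency: rarer shingles get larger weights.
doc_freq <- colSums(counts > 0)
idf      <- log(nrow(counts) / doc_freq)
weighted <- sweep(counts, 2, idf, `*`)

# Project onto p random unit vectors to get a short signature per record.
p    <- 4
proj <- matrix(rnorm(length(vocab) * p), ncol = p)
proj <- sweep(proj, 2, sqrt(colSums(proj^2)), `/`)
signatures <- weighted %*% proj

# k-means on the signatures; each cluster is treated as one block.
blocks <- kmeans(signatures, centers = 2, nstart = 10)$cluster
split(records, blocks)
```

Only records that share a block would then be compared in a later record linkage step. The number of blocks trades comparison cost against the risk of separating true matches.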

Install the development version from GitHub

devtools::install_github("resteorts/blockr")

Install

install.packages('klsh')

Monthly Downloads

215

Version

0.1.0

License

GPL-3

Maintainer

Rebecca Steorts

Last Published

October 22nd, 2020

Functions in klsh (0.1.0)

bag_of_word_ify

Function to convert a record into a bag of tokens with a fieldwise flag

confusion.from.blocking

Perform evaluations (recall) for blocking.

sacks_of_bags_of_words

Function to convert all records into a bag of tokens

bag_signatures

Function that reduces a bag of words into a signature matrix using multiple random projections

klsh

Function that reduces a bag of words into a signature matrix using multiple random projections

block.ids.from.blocking

Returns the block ids associated with a blocking method.

reduction.ratio.from.blocking

Returns the reduction ratio associated with a blocking method

calc_idf

Function to calculate the inverse document frequency given a shingled bag of words

reduction.ratio

Returns the reduction ratio associated with a blocking method

tokenify

Function to tokenize a string into its k components

rproject_bags

Function that generates unit random vectors and takes (weighted) projections onto the random unit vectors given a bag of words
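
The two evaluation quantities named in the list above, reduction ratio and recall, can be sketched in base R as follows. This is a hedged illustration using the usual pairwise definitions, not the package's code: `reduction_ratio_sketch`, `recall_sketch`, and the example labels are hypothetical, and the package's own functions may take different arguments or return different structures.

```r
# Illustrative sketch only: base R, hypothetical names, not the klsh API.

# Reduction ratio: share of all record pairs that blocking removes
# from comparison (1 minus within-block pairs over all pairs).
reduction_ratio_sketch <- function(block_ids) {
  n <- length(block_ids)
  within_pairs <- sum(choose(table(block_ids), 2))
  1 - within_pairs / choose(n, 2)
}

# Pairwise recall: share of true matching pairs that land in the same block.
recall_sketch <- function(block_ids, true_ids) {
  pairs <- combn(length(block_ids), 2)
  same_entity <- true_ids[pairs[1, ]] == true_ids[pairs[2, ]]
  same_block  <- block_ids[pairs[1, ]] == block_ids[pairs[2, ]]
  sum(same_entity & same_block) / sum(same_entity)
}

true_ids  <- c(1, 1, 2, 2, 3, 3)  # hypothetical ground-truth entity labels
block_ids <- c(1, 1, 1, 2, 2, 2)  # hypothetical block assignment

reduction_ratio_sketch(block_ids)     # 0.6
recall_sketch(block_ids, true_ids)    # 0.6666667
```

In this toy case the two blocks of three records leave 6 of the 15 possible pairs to compare, and two of the three true matching pairs stay together. A good blocking keeps both numbers high at once.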