tlsh (version 0.1.0)

compare_buckets: Function that creates a similarity graph and divides it into communities (or blocks) for entity resolution

Description

Builds a similarity graph from the hashed signatures and divides it into communities (or blocks) for entity resolution.

Usage

compare_buckets(hashed_signatures, max_bucket_size = 1000)

Arguments

hashed_signatures

The hashed signatures, for example the output of hash_signature()

max_bucket_size

The largest block size allowed by the user (default 1000)

Value

max_bucket_size The largest bucket size (or block size) that one can handle

Examples

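# NOT RUN {
library(tlsh)
# RLdata500 ships with the RecordLinkage package (assumed to be installed)
data(RLdata500, package = "RecordLinkage")
# Drop columns 2 and 4, then keep the first two records as a small example
head(data <- RLdata500[-c(2,4)])
minidata <- data[1:2,]
# Shingle each record (k = 8), then minhash the shingles (p = 10)
head(all_the_shingles <- apply(minidata,1,shingles,k=8))
head(minhash.minidata <- minhash_v2(all_the_shingles, p=10))
# Hash the minhash signatures (b = 5) and split the result into blocks
hashed_signature <- hash_signature(minhash.minidata, b=5)
compare_buckets(hashed_signature, max_bucket_size=200)
# }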
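
To make the idea of turning shared buckets into blocks more concrete, here is a minimal base-R sketch. It is not the implementation used by compare_buckets: the helper toy_blocks() and the toy input are hypothetical, the input layout (one row per band, one column per record) is an assumption rather than the documented output of hash_signature(), and plain connected components stand in for whatever community detection the package performs. The max_bucket_size limit is not modelled.

```r
# Hypothetical helper, NOT part of tlsh: link any two records that share a
# bucket hash in at least one band, then return the connected groups.
toy_blocks <- function(bucket_hashes) {
  n <- ncol(bucket_hashes)   # assumed layout: one column per record
  block <- seq_len(n)        # start with every record in its own block
  for (band in seq_len(nrow(bucket_hashes))) {
    # Records with the same hash in this band fall into the same bucket
    buckets <- split(seq_len(n), bucket_hashes[band, ])
    for (members in buckets) {
      if (length(members) > 1) {
        # Merge the blocks of all records that share this bucket
        ids <- unique(block[members])
        block[block %in% ids] <- min(ids)
      }
    }
  }
  split(seq_len(n), block)
}

# Toy input: 2 bands (rows) x 4 records (columns); the hash values are made up.
toy <- rbind(c("a", "a", "b", "c"),
             c("x", "y", "y", "z"))
blocks <- toy_blocks(toy)
print(blocks)
```

In this toy input, records 1 and 2 share a bucket in the first band and records 2 and 3 share one in the second, so records 1 to 3 end up in one block while record 4 stays on its own.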
