
Function that creates a similarity graph and divides it into communities (or blocks) for entity resolution
compare_buckets(hashed_signatures, max_bucket_size = 1000)
hashed_signatures The hashed signatures, such as the output of hash_signature() in the example below
max_bucket_size The largest bucket (block) size allowed by the user, i.e. the largest block size that can be handled; defaults to 1000
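How records might be grouped into blocks from banded signatures can be sketched in base R. This is a minimal sketch, not the package's implementation. It assumes the hashed signatures form a matrix with one row per record and one column per band, where records that share a bucket key in any band are linked in the similarity graph (an assumed layout). Connected components, found with union-find, stand in for the community-detection step the description mentions. The helper sketch_blocks() is hypothetical, and it only flags blocks larger than max_bucket_size rather than splitting them.

```r
# Hypothetical sketch, not the package's implementation.
# Assumption: `sigs` is a matrix with one row per record and one column per
# band; entries are bucket keys, and records sharing a key in any band are linked.
sketch_blocks <- function(sigs, max_bucket_size = 1000) {
  n <- nrow(sigs)
  parent <- seq_len(n)  # union-find parent pointers, one per record

  # Follow parent pointers up to the root of a record's component
  find <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }

  # Link every pair of records that land in the same bucket of some band
  for (band in seq_len(ncol(sigs))) {
    buckets <- split(seq_len(n), sigs[, band])
    for (members in buckets) {
      root <- find(members[1])
      for (m in members[-1]) {
        r <- find(m)
        if (r != root) parent[r] <- root
      }
    }
  }

  # Connected components of the similarity graph become the blocks
  roots <- vapply(seq_len(n), find, integer(1))
  blocks <- unname(split(seq_len(n), roots))

  # Flag blocks too large to handle; a fuller version would subdivide them,
  # for example with a community-detection algorithm
  list(blocks = blocks, oversized = lengths(blocks) > max_bucket_size)
}

# Usage: records 1-2 collide in band 1 and records 2-3 in band 2
sigs <- matrix(c("a", "a", "b", "c", "d",
                 "x", "y", "y", "z", "w"), ncol = 2)
res <- sketch_blocks(sigs, max_bucket_size = 2)
res$blocks     # list(1:3, 4L, 5L)
res$oversized  # TRUE FALSE FALSE
```

Because linking is transitive, records 1 and 3 end up in the same block even though they never share a bucket directly; this chaining is one reason a size cap such as max_bucket_size is useful.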
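# NOT RUN {
library(RecordLinkage)  # provides the RLdata500 example data set
# Drop columns 2 and 4 (fname_c2, lname_c2), which are mostly missing
head(data <- RLdata500[-c(2,4)])
minidata <- data[1:2,]
# Shingle each record (k = 8)
head(all_the_shingles <- apply(minidata,1,shingles,k=8))
# Minhash signatures (p = 10)
head(minhash.minidata <- minhash_v2(all_the_shingles, p=10))
# Hash the signatures into b = 5 bands
hashed_signature <- hash_signature(minhash.minidata, b=5)
# Build the blocks, capping the block size at 200
compare_buckets(hashed_signature, max_bucket_size=200)
# }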
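The shingling, minhashing, and banding steps in the example can be sketched in base R to show what kind of input reaches compare_buckets(). This is a minimal sketch under stated assumptions: char_shingles(), minhash_sketch(), and band_keys() are hypothetical helpers, not the package's shingles(), minhash_v2(), or hash_signature(), whose internals are not described here. It assumes character k-gram shingles, minhash via random permutations of the shingle vocabulary, and bands of equal width whose concatenated values serve as bucket keys.

```r
# Hypothetical base-R helpers; not the package's shingles(), minhash_v2(),
# or hash_signature(). Assumed design: character k-gram shingles, minhash via
# random permutations of the shingle vocabulary, and bands of equal width.
char_shingles <- function(x, k = 8) {
  s <- paste(x, collapse = " ")             # one string per record
  n <- nchar(s)
  if (n < k) return(s)
  unique(substring(s, 1:(n - k + 1), k:n))  # all overlapping k-grams
}

minhash_sketch <- function(shingle_sets, p = 10, seed = 1) {
  vocab <- unique(unlist(shingle_sets))
  set.seed(seed)  # note: resets the global RNG state
  # One random permutation of the vocabulary per hash function
  perms <- matrix(replicate(p, sample(length(vocab))), nrow = length(vocab))
  # Signature entry = smallest permuted rank among a record's shingles
  t(vapply(shingle_sets, function(s) {
    apply(perms[match(s, vocab), , drop = FALSE], 2, min)
  }, integer(p)))
}

band_keys <- function(sig, b = 5) {
  stopifnot(ncol(sig) %% b == 0)
  r <- ncol(sig) / b                        # hash values per band
  keys <- vapply(seq_len(b), function(j) {
    cols <- ((j - 1) * r + 1):(j * r)
    # Concatenate a band's values into one string used as its bucket key
    apply(sig[, cols, drop = FALSE], 1, paste, collapse = "-")
  }, character(nrow(sig)))
  matrix(keys, nrow = nrow(sig))            # one row per record, one column per band
}

# Usage on three toy records; the first two are identical
recs <- data.frame(fname = c("CARSTEN", "CARSTEN", "GERD"),
                   lname = c("MEIER", "MEIER", "BAUER"),
                   stringsAsFactors = FALSE)
sh <- lapply(seq_len(nrow(recs)),
             function(i) char_shingles(unlist(recs[i, ]), k = 3))
sig <- minhash_sketch(sh, p = 10)
keys <- band_keys(sig, b = 5)
```

Identical records always produce identical keys in every band, so they share buckets; similar records collide in a band with a probability that grows with the Jaccard similarity of their shingle sets. Building the shingle list with lapply() rather than apply() guarantees a list even when every record yields the same number of shingles.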