jaccard_probability: Find Probability of Match Based on Similarity
Description
This is a port of the lsh_probability function from the textreuse package, with arguments changed to reflect the hyperparameters in this package. It gives the probability that two strings whose Jaccard similarity equals the similarity argument will be matched, given the chosen band width and number of bands.
Examples
# Find the probability that a pair of strings will be matched, given a
# Jaccard similarity of 0.8, a band width of 5, and 50 bands:
jaccard_probability(.8, n_bands = 50, band_width = 5)
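The description does not state the formula behind this probability. As a minimal sketch, assuming the function follows the standard banded locality-sensitive hashing analysis, a pair with Jaccard similarity s agrees on all band_width hashes of one band with probability s^band_width, so it matches in at least one of n_bands bands with probability 1 - (1 - s^band_width)^n_bands. The helper name banded_match_probability below is hypothetical and is not part of this package or of textreuse.

```r
# Hypothetical helper (not part of the package): probability that a pair
# collides in at least one band under the standard banded-LSH model.
banded_match_probability <- function(similarity, n_bands, band_width) {
  stopifnot(all(similarity >= 0), all(similarity <= 1))
  # Probability that all hashes within a single band agree
  p_band <- similarity^band_width
  # Complement of "no band agrees"
  1 - (1 - p_band)^n_bands
}

# Same settings as the example above
banded_match_probability(0.8, n_bands = 50, band_width = 5)

# Trace the S-shaped curve across a range of similarities
banded_match_probability(seq(0, 1, by = 0.1), n_bands = 50, band_width = 5)
```

Under this assumed model, increasing band_width makes matches stricter (the curve shifts right), while increasing n_bands makes them more lenient (the curve shifts left).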