PPRL (version 0.3.5)

CreateBF: Bloom Filter Encoding

Description

Creates a Bloom filter for each element of the input data by splitting the string into q-grams, which are then hashed into a bit vector.
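The hashing step can be pictured with a small base R sketch. It is a toy illustration only and does not reproduce the hashing scheme used inside PPRL: the function name `toy_bloom` and the seed derivation are hypothetical, chosen so that the same password and q-gram always select the same `k` bit positions.

```r
# Toy illustration only: NOT the hashing scheme used by PPRL.
# toy_bloom() and its seed derivation are hypothetical.
toy_bloom <- function(qgrams, password, k = 20, lenBloom = 1000) {
  bits <- integer(lenBloom)               # start with all bits set to 0
  for (g in unique(qgrams)) {
    # Derive a deterministic seed from the password and the q-gram.
    codes <- utf8ToInt(paste0(password, g))
    seed <- sum(codes * seq_along(codes)) %% 2147483647
    set.seed(seed)                        # note: this resets the global RNG
    # Set k distinct positions to 1 for this q-gram.
    bits[sample.int(lenBloom, k)] <- 1L
  }
  bits
}

# Bigrams of "anna" with padding, written out by hand
bf <- toy_bloom(c("_a", "an", "nn", "na", "a_"),
                password = "secret", k = 3, lenBloom = 50)
sum(bf)  # number of bits set; at most 3 * 5 because positions can collide
```

Because the positions depend on the password, two parties must use the same password (and the same `k`, `qgram`, `padding` and `lenBloom`) for their filters to be comparable.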

Usage

CreateBF(ID, data, password, k = 20, padding = 1, qgram = 2, lenBloom = 1000)

Arguments

ID

a character or integer vector containing the IDs of the records in data.

data

a character vector containing the data to be encoded. The input must not be a factor; convert it with as.character() first if necessary.

password

a string used as a password for the random hashing of the q-grams.

k

number of bit positions set to one for each q-gram.

padding

integer (0 or 1) indicating whether the input strings are padded before being split into q-grams.

qgram

integer (1 or 2) indicating whether to split the input strings into bigrams (q = 2) or unigrams (q = 1).

lenBloom

desired length of the Bloom filter in bits.
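To make the effect of `qgram` and `padding` concrete, here is a minimal base R sketch of q-gram splitting. The helper `split_qgrams` and the pad character `"_"` are assumptions for illustration; the exact padding behaviour inside PPRL may differ.

```r
# Hypothetical helper, not part of PPRL: split one string into q-grams.
split_qgrams <- function(x, qgram = 2, padding = 1, pad_char = "_") {
  if (padding == 1 && qgram > 1) {
    # Pad both ends so the first and last characters get their own q-grams.
    pad <- strrep(pad_char, qgram - 1)
    x <- paste0(pad, x, pad)
  }
  n <- nchar(x)
  if (n < qgram) return(character(0))
  starts <- seq_len(n - qgram + 1)
  substring(x, starts, starts + qgram - 1)
}

split_qgrams("anna", qgram = 2, padding = 1)  # "_a" "an" "nn" "na" "a_"
split_qgrams("anna", qgram = 2, padding = 0)  # "an" "nn" "na"
split_qgrams("anna", qgram = 1)               # "a" "n" "n" "a"
```

With padding, a four-letter string yields five bigrams instead of three, so its first and last letters contribute as much to the filter as the inner ones.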

Value

A data.frame containing the IDs and the corresponding bit vectors.

See Also

PPRL, BloomFilterLinkage, CreateCLK, SelectSimilarityFunctionBF, StandardizeString

Examples

# NOT RUN {
# Load the package and the test data shipped with it
library(PPRL)
testFile <- file.path(path.package("PPRL"), "extdata/testdata.csv")
testData <- read.csv(testFile, header = FALSE, sep = "\t",
  colClasses = "character")

# Encode column V7 as Bloom filters, using column V1 as the record IDs
BF <- CreateBF(ID = testData$V1, data = testData$V7,
  k = 20, padding = 1, qgram = 2, lenBloom = 1000,
  password = "(H]$6Uh*-Z204q")

# }