PPRL (version 0.3.8)

CreateBF: Bloom Filter Encoding

Description

Creates a Bloom filter for each row of the input data: each input string is split into q-grams, and every q-gram is hashed to bit positions in a bit vector.

Usage

CreateBF(ID, data, password, k = 20, padding = 1, qgram = 2, lenBloom = 1000)

Value

A data.frame containing IDs and the corresponding bit vector.

Arguments

ID

a character vector or integer vector containing the IDs of the records in data.

data

a character vector containing the data to be encoded. Make sure the input vectors are not factors.

password

a string used as a password for the random hashing of the q-grams.

k

number of bit positions set to one for each q-gram.

padding

integer (0 or 1) indicating if string padding is to be used.

qgram

integer (1 or 2) indicating whether to split the input strings into bigrams (q = 2) or unigrams (q = 1).

lenBloom

desired length of the Bloom filter in bits.
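
To make the roles of k, padding, qgram and lenBloom concrete, the following is a minimal sketch of the general Bloom filter idea in base R. It is an illustration only and not the hashing scheme used by PPRL: the helper names qgrams, toy_hash and toy_bloom are hypothetical, the polynomial hash and double-hashing step are assumptions chosen for simplicity, and padding with blanks is likewise assumed.

```r
# Toy illustration only; NOT the implementation used by PPRL::CreateBF.

# Split a string into q-grams. Padding with blanks is an assumption.
qgrams <- function(x, q = 2, padding = TRUE) {
  if (padding && q > 1) {
    pad <- strrep(" ", q - 1)
    x <- paste0(pad, x, pad)
  }
  n <- nchar(x)
  if (n < q) return(character(0))
  starts <- seq_len(n - q + 1)
  substring(x, starts, starts + q - 1)
}

# Hypothetical salted polynomial hash returning a value in 0..(modulus - 1).
toy_hash <- function(s, salt, modulus) {
  h <- 0
  for (code in utf8ToInt(paste0(salt, s))) {
    h <- (h * 31 + code) %% modulus
  }
  h
}

# Set k bit positions per q-gram in a vector of length len (len > 1).
toy_bloom <- function(x, password, k = 20, q = 2, len = 1000) {
  bits <- integer(len)
  for (g in qgrams(x, q)) {
    h1 <- toy_hash(g, password, len)
    h2 <- toy_hash(g, paste0(password, "#"), len - 1) + 1
    pos <- (h1 + (seq_len(k) - 1) * h2) %% len + 1  # positions in 1..len
    bits[pos] <- 1L
  }
  bits
}

bf <- toy_bloom("smith", password = "secret")
length(bf)  # one bit per position of the filter
```

In this sketch the password acts as a salt, so the same string maps to different bit positions under different passwords, and each q-gram sets at most k bits (fewer when positions collide).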

See Also

CreateCLK, StandardizeString

Examples

# Load the package and its bundled test data
library(PPRL)
testFile <- file.path(path.package("PPRL"), "extdata/testdata.csv")
testData <- read.csv(testFile, header = FALSE, sep = "\t",
  colClasses = "character")

# Encode column V7 as Bloom filters, using column V1 as the record ID
BF <- CreateBF(ID = testData$V1, data = testData$V7,
  k = 20, padding = 1, qgram = 2, lenBloom = 1000,
  password = "(H]$6Uh*-Z204q")