
PPRL (version 0.3.8)

CreateRecordLevelBF: Record Level Bloom Filter (RLBF) Encoding

Description

Creates Record Level Bloom filters (RLBFs) by combining the Bloom filters of the individual variables into a single bit vector per record.

Usage

CreateRecordLevelBF(ID, data, password, lenRLBF = 1000, k = 20,
                       padding = as.integer(c(0)),
                       qgram = as.integer(c(2)),
                       lenBloom = as.integer(c(500)),
                       method = "StaticUniform",
                       weigths = as.numeric(c(1)))

Value

A data.frame containing IDs and the corresponding bit vector.

Arguments

ID

a character vector or integer vector containing the IDs of the data.frame.

data

a data.frame of character vectors containing the data to be encoded, one column per variable. Make sure the input vectors are not factors.

password

a string used as a password for the random hashing of the q-grams and the shuffling.

lenRLBF

length of the final Bloom filter.

lenBloom

an integer vector containing the length of the first level Bloom filters which are to be combined. For the methods "StaticUniform" and "StaticWeighted", a single integer is required, since all original Bloom filters will have the same length.

k

number of bit positions set to one for each q-gram.
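To illustrate what k controls, here is a minimal sketch of inserting q-grams into a single Bloom filter. It is not the PPRL implementation: the helper set_bits and its seeding scheme (a seed derived from the password and the q-gram) are assumptions made only for this example.

```r
# Hypothetical helper, not part of PPRL: set k bit positions per q-gram.
set_bits <- function(qgrams, password, k, lenBloom) {
  bf <- integer(lenBloom)
  for (g in qgrams) {
    # Assumption: derive a deterministic seed from the password and the q-gram.
    codes <- utf8ToInt(paste0(password, g))
    set.seed(sum(codes * seq_along(codes)))
    # Draw k positions; positions may collide, so fewer than k new bits can be set.
    bf[sample.int(lenBloom, k, replace = TRUE)] <- 1L
  }
  bf
}

bf <- set_bits(c("an", "nn", "na"), password = "secret", k = 20, lenBloom = 500)
sum(bf)  # at most 3 * 20 bits are set
```

Because the seed depends only on the password and the q-gram, the same q-gram always maps to the same positions, which is what makes two encodings comparable.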

padding

integer (0 or 1) indicating if string padding is to be used.

qgram

integer (1 or 2) indicating whether to split the input strings into bigrams (q = 2) or unigrams (q = 1).
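The effect of qgram and padding can be sketched in base R as follows. The helper split_qgrams is hypothetical and not part of PPRL, and the padding character (an underscore, q - 1 times on each end) is an assumption; the package may pad differently.

```r
# Hypothetical helper, not part of PPRL: split a string into q-grams.
split_qgrams <- function(x, q = 2, padding = 0) {
  if (padding == 1 && q > 1) {
    # Assumption: pad both ends with q - 1 underscore characters.
    pad <- strrep("_", q - 1)
    x <- paste0(pad, x, pad)
  }
  n <- nchar(x)
  if (n < q) return(character(0))
  starts <- seq_len(n - q + 1)
  substring(x, starts, starts + q - 1)
}

split_qgrams("anna", q = 1)               # "a" "n" "n" "a"
split_qgrams("anna", q = 2)               # "an" "nn" "na"
split_qgrams("anna", q = 2, padding = 1)  # "_a" "an" "nn" "na" "a_"
```

With padding, the first and last characters of a string contribute their own q-grams, so they carry the same weight as characters in the middle.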

method

one of "StaticUniform", "StaticWeigthed", "DynamicUniform" or "DynamicWeighted" (see details).

weigths

a numeric vector of weights, used by the "StaticWeighted" and "DynamicWeighted" methods. The vector must have the same length as the number of columns in the input data, and the weights must sum to 1.
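The two constraints on the weights can be checked before calling the function. The sketch below also shows one plausible way the weights could translate into a share of the final filter; the helper allocate_bits and the proportional allocation are assumptions for illustration, not the documented behaviour of PPRL. The parameter keeps the package's spelling, weigths.

```r
# Hypothetical helper, not part of PPRL: validate weights and
# compute an assumed number of RLBF bits per input column.
allocate_bits <- function(weigths, n_columns, lenRLBF) {
  stopifnot(
    length(weigths) == n_columns,        # one weight per column
    isTRUE(all.equal(sum(weigths), 1))   # weights sum to 1 (up to rounding)
  )
  # Assumption: each column receives bits in proportion to its weight.
  round(weigths * lenRLBF)
}

allocate_bits(c(0.1, 0.1, 0.5, 0.3), n_columns = 4, lenRLBF = 1000)
# 100 100 500 300
```

Note that with other weights the rounded shares need not add up to exactly lenRLBF; a real implementation has to resolve that remainder somehow.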

Details

A separate Bloom filter is first built for every variable (column) in the input data.frame. The Bloom filters are then combined by sampling a set fraction of the bits of each original Bloom filter and concatenating the samples. The result is then shuffled. The sampling can be done using four different weighting methods:

  1. StaticUniform

  2. StaticWeighted

  3. DynamicUniform

  4. DynamicWeighted

Details are described in the original publication.
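The sample, concatenate and shuffle steps described above can be sketched in base R. This is an illustration under stated assumptions, not the PPRL implementation: the helper combine_filters, sampling with replacement, and the use of a single integer seed in place of the password are all assumptions.

```r
# Hypothetical helper, not part of PPRL: combine per-variable Bloom filters.
combine_filters <- function(filters, n_bits, seed) {
  # A fixed seed yields the same sampled positions and the same shuffle
  # on every call, so all records are encoded consistently.
  set.seed(seed)
  # Sample the requested number of bits from each per-variable filter.
  sampled <- Map(
    function(bf, n) bf[sample.int(length(bf), n, replace = TRUE)],
    filters, n_bits
  )
  # Concatenate the samples into one vector.
  combined <- unlist(sampled, use.names = FALSE)
  # Shuffle the concatenated bits with one random permutation.
  combined[sample.int(length(combined))]
}

# Two toy first-level filters of length 500 each.
filters <- list(rep(c(1L, 0L), 250), rep(c(0L, 0L, 1L, 1L), 125))
rlbf <- combine_filters(filters, n_bits = c(500, 500), seed = 42)
length(rlbf)  # 1000
```

In this sketch the uniform methods would correspond to equal entries in n_bits and the weighted methods to entries proportional to the weights.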

See Also

CreateBF, CreateCLK

Examples

# Load test data
testFile <- file.path(path.package("PPRL"), "extdata/testdata.csv")
if (FALSE) {
# Read the tab-separated test file; it has no header row
testData <- read.csv(testFile, header = FALSE, sep = "\t",
  colClasses = "character")

# StaticUniform
RLBF <- CreateRecordLevelBF(ID = testData$V1,
  data = testData[, c(2, 3, 7, 8)],
  lenRLBF = 1000, k = 20,
  padding = c(0, 0, 1, 1), qgram = c(1, 1, 2, 2),
  lenBloom = 500,
  password = c("(H]$6Uh*-Z204q", "lkjg", "klh", "KJHkälk5"),
  method = "StaticUniform")

# StaticWeigthed
RLBF <- CreateRecordLevelBF(ID = testData$V1,
  data = testData[, c(2, 3, 7, 8)],
  lenRLBF = 1000, k = 20,
  padding = c(0, 0, 1, 1), qgram = c(1, 1, 2, 2),
  lenBloom = 500,
  password = c("(H]$6Uh*-Z204q", "lkjg", "klh", "KJHkälk5"),
  method = "StaticWeigthed", weigths = c(0.1, 0.1, 0.5, 0.3))

# DynamicUniform
RLBF <- CreateRecordLevelBF(ID = testData$V1,
  data = testData[, c(2, 3, 7, 8)],
  lenRLBF = 1000, k = 20,
  padding = c(0, 0, 1, 1), qgram = c(1, 1, 2, 2),
  lenBloom = c(300, 400, 550, 500),
  password = c("(H]$6Uh*-Z204q", "lkjg", "klh", "KJHkälk5"),
  method = "DynamicUniform")

# DynamicWeigthed
RLBF <- CreateRecordLevelBF(ID = testData$V1,
  data = testData[, c(2, 3, 7, 8)],
  lenRLBF = 1000, k = 20,
  padding = c(0, 0, 1, 1), qgram = c(1, 1, 2, 2),
  lenBloom = c(300, 400, 550, 500),
  password = c("(H]$6Uh*-Z204q", "lkjg", "klh", "KJHkälk5"),
  method = "DynamicWeigthed", weigths = c(0.1, 0.1, 0.5, 0.3))
}
