PPRL (version 0.3.8)

CreateEnsembleCLK: Combine multiple independent CLKs using a simple majority rule

Description

Creates multiple CLKs which are combined using a simple majority rule.

Usage

CreateEnsembleCLK(ID, data, password, NumberOfCLK = 1, k = 20,
  padding = as.integer(c(0)), qgram = as.integer(c(2)),
  lenBloom = 1000)

Value

A data.frame containing IDs and the corresponding ensemble bit vector.

Arguments

ID

A character vector or integer vector containing the IDs of the data.frame.

data

a data.frame containing the data to be encoded. Make sure the input vectors are not factors.

password

a character vector with a password for each column of the input data.frame for the random hashing of the q-grams.

NumberOfCLK

number of independent CLKs to be built.

k

number of bit positions set to one for each q-gram (bigram or unigram, depending on qgram).

padding

integer vector (0 or 1) indicating whether string padding is to be used on each column of the input. The padding vector must have one element per column of the input data.

qgram

integer vector (1 or 2) indicating whether to split the input strings of each column into bigrams (q = 2) or unigrams (q = 1). The qgram vector must have one element per column of the input data.

lenBloom

desired length of the final Bloom filter in bits.
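
The padding and qgram arguments can be illustrated with a small sketch of how a string might be split into q-grams. The helper split_qgrams is hypothetical and not part of PPRL, and the padding convention used here (q - 1 blanks on each side of the string) is an assumption; the package may pad differently.

```r
# Hypothetical helper, not part of PPRL: split a single string into q-grams.
# Assumption: padding adds (q - 1) blanks to both ends of the string.
split_qgrams <- function(x, q = 2, pad = FALSE) {
  if (pad && q > 1) {
    fill <- strrep(" ", q - 1)
    x <- paste0(fill, x, fill)
  }
  n <- nchar(x)
  if (n < q) {
    return(character(0))  # string too short to form a single q-gram
  }
  starts <- seq_len(n - q + 1)
  substring(x, starts, starts + q - 1)
}

unigrams       <- split_qgrams("anna", q = 1)
bigrams        <- split_qgrams("anna", q = 2)
padded_bigrams <- split_qgrams("anna", q = 2, pad = TRUE)
print(padded_bigrams)
```

Under this assumption, padding lets the first and last characters of a string contribute their own bigrams, so "anna" yields five bigrams instead of three.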

Details

Creates NumberOfCLK independent CLKs for each record of the input data.frame and combines them using a simple majority rule. Each bit position \(B[i]\) of the final CLK \(B\), which has length lenBloom, is set to \(B[i] = 1\) if more than half of the independent CLKs have a one at position \(i\). Otherwise the bit position is set to zero.
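
The majority rule can be sketched in a few lines of base R. This is a minimal illustration only, not the package's implementation: majority_vote is a hypothetical helper, and the independent CLKs are represented here as rows of a 0/1 matrix for convenience.

```r
# Hypothetical helper, not part of PPRL: combine equal-length bit vectors
# by simple majority. Each row of bit_matrix is one independent CLK,
# each column is one bit position.
majority_vote <- function(bit_matrix) {
  n <- nrow(bit_matrix)
  # A position is set to 1 only if strictly more than half of the CLKs have a 1.
  as.integer(colSums(bit_matrix) > n / 2)
}

clks <- rbind(
  c(1, 0, 1, 1, 0),
  c(1, 1, 0, 1, 0),
  c(0, 0, 1, 1, 0)
)
ensemble <- majority_vote(clks)
print(ensemble)
```

Because the comparison is strict, a tie with an even number of CLKs leaves the bit at zero, which matches the "more than half" wording above. An odd NumberOfCLK avoids ties altogether.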

References

Kuncheva, L. (2014): Combining Pattern Classifiers: Methods and Algorithms. Wiley.

See Also

CreateBF, CreateCLK, StandardizeString

Examples

# Load test data
testFile <- file.path(path.package("PPRL"), "extdata/testdata.csv")
testData <- read.csv(testFile, header = FALSE, sep = "\t",
  colClasses = "character")
if (FALSE) {
# Create Ensemble CLK
EnsembleCLK <- CreateEnsembleCLK(ID = testData$V1,
  data = testData[, c(2, 3, 7, 8)],
  k = 20, padding = c(0, 0, 1, 1),
  qgram = c(1, 2, 2, 2), lenBloom = 1000,
  password = c("HUh4q", "lkjg", "klh", "Klk5"),
  NumberOfCLK = 5)
}
