
Description

These are the belief-classifier builders (the *.builder functions). Each of these returns a function that can be used to classify points in two dimensions. The algorithm used can be judged from the first three letters: the kde_bel function uses the kernel density estimate (kde), the knn_bel function uses the kernel density estimate together with information on the nearest neighbours, and the jit_bel function uses jittering of the test points within their neighbourhoods. Finally, the cor_bel function uses the kde but includes a factor for self-correction. These generated functions (the return values) are meant to be passed to the ensemble function to build an ensemble.
Usage

kde_bel.builder(labs, test, train, options = list(coef = 0.90))

knn_bel.builder(labs, test, train, options = list(k = 3, p = FALSE,
  dist.type = c('euclidean', 'absolute', 'mahal'), out = c('var', 'cv'),
  coef = 0.90))

jit_bel.builder(labs, test, train, options = list(k = 3, p = FALSE, s = 5,
  dist.type = c('euclidean', 'absolute', 'mahal'), out = c('var', 'cv'),
  coef = 0.90))
Arguments

labs: the class labels of the training points.

test: the indices of the test points in the data.

train: the indices of the training points in the data.

options: a list of parameters for the classifier: k, the number of nearest neighbours to use; p, if not FALSE, the fraction of the training set to use as the number of nearest neighbours (p*length(train)); s, the jitter parameter of jit_bel.builder; dist.type, the type of distance to use ('euclidean', 'absolute', or 'mahal'); out, whether the measure used should be the variance ('var') or the coefficient of variation ('cv'); and coef, a numeric coefficient for the classifier (defaulting to 0.90). Also see the Details section below.

Value

A classifier function meant to be passed to the ensemble function. Alternately, 2-D projected data may directly be passed to the classifier function returned, in which case a matrix of dimensions (Number of Classes) x (length(test)) is returned. Each column sums to 1 and represents the partial assignment of that point to each of the classes. The rows are named after the class names, while the columns are named after the test points. Ignorance is represented by the special symbol 'Inf' and is the last class in the matrix.

Details

The kde_bel.builder
returns a classifier that simply evaluates the kernel density estimate of each class at each test point, and assigns the point to the class with the maximum density at it.
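
For intuition, here is a minimal base-R sketch of this maximum-density rule. It is not the package's internal implementation; the fixed isotropic bandwidth h and the helper names are illustrative assumptions.

## Sketch of the maximum-density rule with a Gaussian kernel.
## Assumes 2-D data and a fixed isotropic bandwidth h (an assumption;
## kde_bel.builder handles bandwidth selection internally).
kde_classify <- function(train_xy, labs, test_xy, h = 0.5) {
  classes <- sort(unique(as.character(labs)))
  ## Gaussian KDE of the points in pts, evaluated at a single point x
  dens_at <- function(pts, x) {
    d2 <- rowSums((pts - matrix(x, nrow(pts), 2, byrow = TRUE))^2)
    mean(exp(-d2 / (2 * h^2))) / (2 * pi * h^2)
  }
  apply(test_xy, 1, function(x) {
    dens <- vapply(classes, function(cl)
      dens_at(train_xy[labs == cl, , drop = FALSE], x), numeric(1))
    classes[which.max(dens)]  # the class with the highest density at x
  })
}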
The knn_bel.builder returns a classifier that tries to locate the k (or p*length(train)) nearest neighbours of each of the points in the test set. It then evaluates the kernel density estimate of each class in the training set at each of these nearest neighbours, and at each of the test points. With out = 'var', the variance of the set of density values, centered at the density value at the test point itself, is taken as the measure of that point's belonging to each class. With out = 'cv', the coefficient of variation is used instead, with the density value at the point itself serving as the mean. Generally, the 'var' classifier has higher accuracy.
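
A sketch of the 'var' and 'cv' measures for a single test point and a single class follows; the Euclidean distance matches the example options below, while the fixed bandwidth h is again an assumption rather than the package's own choice.

## Belief measure of one test point x for one class (points in class_pts),
## given the full training set train_xy. Sketch only, not package internals.
belief_measure <- function(train_xy, class_pts, x, k = 3, h = 0.5,
                           out = c('var', 'cv')) {
  out <- match.arg(out)
  dens_at <- function(pts, z) {
    d2 <- rowSums((pts - matrix(z, nrow(pts), 2, byrow = TRUE))^2)
    mean(exp(-d2 / (2 * h^2))) / (2 * pi * h^2)
  }
  ## locate the k nearest training points to the test point x
  d <- sqrt(rowSums((train_xy - matrix(x, nrow(train_xy), 2, byrow = TRUE))^2))
  nn <- train_xy[order(d)[1:k], , drop = FALSE]
  f0 <- dens_at(class_pts, x)                            # density at x itself
  fn <- apply(nn, 1, function(z) dens_at(class_pts, z))  # densities at neighbours
  v <- mean((fn - f0)^2)  # variance of the density values, centered at f0
  if (out == 'var') v else sqrt(v) / f0  # cv: sd relative to f0 as the mean
}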
The jit_bel.builder works very similarly to the knn_bel.builder classifier, but instead uses the nearest-neighbour information to determine a point's "neighbourhood". The test points are then jittered within this neighbourhood, and the kernel density is evaluated at these fake points. The 'var' and 'cv' options work here as they do in the knn_bel.builder classifier.
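
The jittering step can be sketched as follows; treating s as the number of fake points and the distance to the k-th nearest neighbour as the neighbourhood radius are assumptions about details the text leaves open.

## Generate s jittered copies of a test point x within its neighbourhood.
## Assumption: the neighbourhood radius is the distance from x to its
## k-th nearest training point.
jitter_neighbourhood <- function(train_xy, x, k = 3, s = 2) {
  d <- sqrt(rowSums((train_xy - matrix(x, nrow(train_xy), 2, byrow = TRUE))^2))
  r <- sort(d)[k]  # neighbourhood radius
  ## the kernel density of each class is then evaluated at these fake
  ## points, exactly as in the knn variant
  t(replicate(s, x + runif(2, min = -r, max = r)))
}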
Examples

## Setting Up
data(cancer)
table(cancer$V2)
colnames(cancer)[1:2] <- c('id', 'type')
cancer.d <- as.matrix(cancer[, 3:32])
labs <- cancer$type
## Hold out roughly 15% of the points as a test set
test_size <- floor(0.15 * nrow(cancer.d))
train <- sample(1:nrow(cancer.d), size = nrow(cancer.d) - test_size)
test <- setdiff(1:nrow(cancer.d), train)
truelabs <- labs[test]
## Project the 30-dimensional data onto a random 2-D basis
projectron <- function(A) cancer.d %*% A
seed <- .Random.seed  # save the RNG state so the same projection can be regenerated
F <- projectron(basis_random(30))  # basis_random() is from the tourr package
## Simple Density Classification
kdebel <- kde_bel.builder(labs = labs[train], test = test, train = train)
x1 <- kdebel(F)
predicted1 <- apply(x1, MARGIN = 2, FUN = function(x) names(which.max(x)))
table(truelabs, predicted1)
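The returned matrix has the structure described in the Value section above, which can be checked directly:

## Each column is a partial assignment summing to 1; the last row is the
## ignorance class 'Inf'
head(colSums(x1))
rownames(x1)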
## Density Classification Using Nearest Neighbour Information
knnbel <- knn_bel.builder(labs = labs[train], test = test, train = train,
                          options = list(k = 3, p = FALSE, dist.type = 'euclidean',
                                         out = 'var', coef = 0.90))
x2 <- knnbel(F)
predicted2 <- apply(x2, MARGIN = 2, FUN = function(x) names(which.max(x)))
table(truelabs, predicted2)
## Same as above, but now using the Coefficient of Variation for Classification
knnbel2 <- knn_bel.builder(labs = labs[train], test = test, train = train,
                           options = list(k = 3, p = FALSE, dist.type = 'euclidean',
                                          out = 'cv', coef = 0.90))
x3 <- knnbel2(F)
predicted3 <- apply(x3, MARGIN = 2, FUN = function(x) names(which.max(x)))
table(truelabs, predicted3)
## Density Classification Using Jitter & NN Information
jitbel <- jit_bel.builder(labs = labs[train], test = test, train = train,
                          options = list(k = 3, s = 2, p = FALSE,
                                         dist.type = 'euclidean', out = 'var',
                                         coef = 0.90))
x4 <- jitbel(F)
predicted4 <- apply(x4, MARGIN = 2, FUN = function(x) names(which.max(x)))
table(truelabs, predicted4)
## Same as above, but now using the Coefficient of Variation for Classification
jitbel2 <- jit_bel.builder(labs = labs[train], test = test, train = train,
                           options = list(k = 3, p = FALSE, dist.type = 'euclidean',
                                          out = 'cv', s = 2, coef = 0.90))
x5 <- jitbel2(F)
predicted5 <- apply(x5, MARGIN = 2, FUN = function(x) names(which.max(x)))
table(truelabs, predicted5)
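To compare the five classifiers at a glance, the prediction vectors can be scored against the true labels (a small addition to the original example):

## Test-set accuracy of each classifier
sapply(list(kde = predicted1, knn.var = predicted2, knn.cv = predicted3,
            jit.var = predicted4, jit.cv = predicted5),
       function(pred) mean(pred == truelabs))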