coBC: Train the Co-bagging model

Description

Builds and trains a model to predict the labels of instances according to the Co-bagging algorithm.
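The committee part of this idea can be illustrated with a minimal base R sketch. It assumes a 1-NN base classifier (mirroring the default bClassifOneNN()), a bootstrap sample of the labeled instances for each of the N members, and majority voting to combine their predictions. The helper names one_nn_predict, train_committee and predict_committee are hypothetical. The self-labeling loop is deliberately left out, so this is not the ssc implementation.

```r
# Hypothetical sketch of a bagged 1-NN committee; not the ssc implementation.

# Predict each row of xnew with the label of its nearest (Euclidean) training row.
one_nn_predict <- function(xtrain, ytrain, xnew) {
  ytrain <- as.character(ytrain)
  apply(xnew, 1, function(row) {
    d <- sqrt(colSums((t(xtrain) - row)^2))
    ytrain[which.min(d)]
  })
}

# Each of the N members gets its own bootstrap sample of the labeled instances.
train_committee <- function(x, y, N = 3) {
  labeled <- which(!is.na(y))
  lapply(seq_len(N), function(i) {
    labeled[sample.int(length(labeled), length(labeled), replace = TRUE)]
  })
}

# Combine the members' predictions by majority vote (one row per new instance).
predict_committee <- function(members, x, y, xnew) {
  votes <- do.call(cbind, lapply(members, function(idx) {
    one_nn_predict(x[idx, , drop = FALSE], y[idx], xnew)
  }))
  apply(votes, 1, function(v) names(which.max(table(v))))
}

# Usage on two well-separated toy clusters with two unlabeled instances.
set.seed(1)
x <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 5), ncol = 2))
y <- rep(c("a", "b"), each = 10)
y[c(2, 12)] <- NA
members <- train_committee(x, y, N = 3)
pred <- predict_committee(members, x, y, rbind(c(0, 0), c(5, 5)))
print(pred)
```

Bootstrap resampling is what makes the members differ from one another even though they share the same base classifier specification.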

Usage

coBC(x, y, bclassif = bClassifOneNN(), dist = "matrix", N = 3, min.amount = ceiling(length(which(is.na(y))) * 0.3), u = 100, max.iter = 50)

Arguments

x

An object that can be coerced to a matrix. Its interpretation depends on the value of the dist argument; see dist.

y

A vector with the labels of the training instances. Unlabeled instances are marked with the value NA.

bclassif

Base classifier specification. Default is bClassifOneNN(). To define new base classifiers, see bClassif.

dist

Distance information. Valid options are:

"matrix": this string indicates that x is a distance matrix.
string: the name of a distance method available in the proxy package. In this case, x is interpreted as a matrix of instances.
function: a user-defined function that computes the distance between two vectors. The function is called with the two vectors as its first two arguments; any additional arguments must have default values. In this case, x is interpreted as a matrix of instances.
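The "matrix" and function options can be illustrated in base R. The sketch below builds a distance matrix with stats::dist() (the kind of object expected when dist = "matrix") and defines a distance function whose extra argument p has a default value, as the function option requires. The names minkowski and pairwise are hypothetical; pairwise only checks that the function reproduces the precomputed matrix and is not how coBC evaluates distances internally.

```r
# Three instances with two attributes each.
inst <- matrix(c(0, 0,
                 3, 4,
                 6, 8), ncol = 2, byrow = TRUE)

# "matrix" option: precomputed pairwise Euclidean distances.
dmat <- as.matrix(dist(inst, method = "euclidean"))

# Function option: distance between two vectors; the extra argument p
# has a default value (p = 2 gives the Euclidean distance).
minkowski <- function(a, b, p = 2) {
  sum(abs(a - b)^p)^(1 / p)
}

# Hypothetical helper: apply a two-vector distance function to every pair of rows.
pairwise <- function(inst, f) {
  n <- nrow(inst)
  out <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      out[i, j] <- f(inst[i, ], inst[j, ])
    }
  }
  out
}

print(dmat)
print(pairwise(inst, minkowski))
```

With a label vector y of length nrow(inst), these objects would be passed as documented above, for example coBC(dmat, y) with the default dist = "matrix", or coBC(inst, y, dist = minkowski).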

N

The number of classifiers used as committee members. All of these classifiers are built from the specification given in bclassif. Default is 3.

min.amount

Minimum number of unlabeled instances at which training stops. When the number of unlabeled training instances reaches this value, the self-labeling process is stopped. Default is ceiling(0.3 * n), where n is the number of unlabeled instances (NA values) in y.
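As a worked example of the default, the snippet below evaluates the same expression used in the Usage section on a small, made-up label vector containing 8 NA values, so ceiling(8 * 0.3) = ceiling(2.4) = 3.

```r
# Made-up labels; NA marks an unlabeled instance.
y <- c(1, NA, 2, NA, NA, 1, NA, NA, NA, NA, 2, NA)

n_unlabeled <- length(which(is.na(y)))   # 8 unlabeled instances
min_amount <- ceiling(n_unlabeled * 0.3) # default min.amount
print(min_amount)                        # 3
```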

u

Number of unlabeled instances in the pool. Default is 100.

max.iter

Maximum number of iterations to execute in the self-labeling process. Default is 50.
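How u, min.amount and max.iter interact can be sketched as a loop skeleton. This is a hypothetical illustration of the stopping rules described above, not coBC's internal code: the function self_label_loop and its label_pool argument are invented here, and the placeholder labeling step simply self-labels one instance from the pool per iteration.

```r
# Hypothetical skeleton of the self-labeling stopping rules; not coBC's code.
self_label_loop <- function(y, u = 100, max.iter = 50,
                            min.amount = ceiling(length(which(is.na(y))) * 0.3),
                            label_pool = function(pool) pool[1]) {
  iter <- 0
  unlabeled <- which(is.na(y))
  # Stop when the unlabeled set shrinks to min.amount or max.iter is reached.
  while (length(unlabeled) > min.amount && iter < max.iter) {
    iter <- iter + 1
    # Draw a pool of at most u unlabeled instances.
    pool <- if (length(unlabeled) > u) sample(unlabeled, u) else unlabeled
    # Placeholder: indices the committee would self-label in this iteration.
    chosen <- label_pool(pool)
    unlabeled <- setdiff(unlabeled, chosen)
  }
  list(iterations = iter, remaining = length(unlabeled))
}

y <- c(1, 2, rep(NA, 18))  # 18 unlabeled instances, so min.amount defaults to 6
res <- self_label_loop(y)
res2 <- self_label_loop(y, u = 4, max.iter = 5)
str(res)
str(res2)
```

In the first call the min.amount rule ends the loop; in the second, max.iter is reached first.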

Value

The trained model stored in a list with the following named values:

References

Avrim Blum and Tom Mitchell. Combining labeled and unlabeled data with co-training. In Eleventh Annual Conference on Computational Learning Theory, COLT’ 98, pages 92–100, New York, NY, USA, 1998. ACM. ISBN 1-58113-057-0. doi: 10.1145/279943.279962.

Examples


# This example is part of CoBC demo.
# Use demo(CoBC) to see all the examples.

## Load Wine data set
library(ssc) # provides coBC() and the wine data set
data(wine)

x <- wine[, -14] # instances without classes
y <- wine[, 14] # the classes
x <- scale(x) # scale the attributes

## Prepare data
set.seed(20)
# Use 50% of instances for training
tra.idx <- sample(x = length(y), size = ceiling(length(y) * 0.5))
xtrain <- x[tra.idx,] # training instances
ytrain <- y[tra.idx]  # classes of training instances
# Use 70% of train instances as unlabeled set
tra.na.idx <- sample(x = length(tra.idx), size = ceiling(length(tra.idx) * 0.7))
ytrain[tra.na.idx] <- NA # remove class information of unlabeled instances

# Use the other 50% of instances for inductive testing
tst.idx <- setdiff(seq_along(y), tra.idx)
xitest <- x[tst.idx,] # testing instances
yitest <- y[tst.idx] # classes of testing instances

## Example: Using the Euclidean distance in proxy package.
m <- coBC(xtrain, ytrain, dist = "Euclidean")
pred <- predict(m, xitest)
caret::confusionMatrix(table(pred, yitest))
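The sizes produced by the data preparation above can be worked out directly. The sketch below assumes the wine data has 178 rows (the size of the UCI Wine data set) and repeats the ceiling() arithmetic of the example, including the default min.amount that coBC would compute from ytrain.

```r
n <- 178                                  # assumed number of rows in wine
n_train <- ceiling(n * 0.5)               # length(tra.idx)
n_unlabeled <- ceiling(n_train * 0.7)     # length(tra.na.idx)
n_labeled <- n_train - n_unlabeled        # labeled training instances
n_test <- n - n_train                     # length(tst.idx)
min_amount <- ceiling(n_unlabeled * 0.3)  # default min.amount for ytrain

print(c(train = n_train, unlabeled = n_unlabeled, labeled = n_labeled,
        test = n_test, min.amount = min_amount))
```

Under that assumption, the default u = 100 is larger than the 63 unlabeled training instances available.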
