cccd: Class Cover Catch Digraph

Description

Constructs a class cover catch digraph from points or interpoint distance matrices.

Usage

cccd(x = NULL, y = NULL, dxx = NULL, dyx = NULL, method = NULL, 
     k = NA, algorithm = 'cover_tree')
cccd.rw(x=NULL,y=NULL,dxx=NULL,dyx=NULL,method=NULL,m=1,d=2)
cccd.classifier(x,y,dom.method='greedy',proportion=1,...)
cccd.classify(data, C,method=NULL)
cccd.classifier.rw(x,y,m=1,d=2)
cccd.multiclass.classifier(data, classes, dom.method='greedy',proportion=1,...)
cccd.multiclass.classify(data,C,method=NULL)
# S3 method for cccd
plot(x, ..., plot.circles = FALSE, dominate.only = FALSE, 
          D = NULL, vertex.size = 2, vertex.label = NA, 
			 vertex.color = "SkyBlue2", dom.color = "Blue", 
			 ypch = 20, ycex = 1.5, ycol = 2, 
			 use.circle.radii = FALSE, balls = FALSE, 
			 ball.color = gray(0.8), square = FALSE, xlim, ylim)
# S3 method for cccdClassifier
plot(x, ..., xcol=1,ycol=2,xpch=20,ypch=xpch,
                                balls=FALSE,add=FALSE)

Arguments

x,y

the target class and non-target class points. Either x,y or dxx,dyx must be provided. In the case of plot, x is an object of class cccd.

dxx,dyx

interpoint distances (x against x and y against x). If these are not provided they are computed using x and y.

method

the method used for the distance. See dist.

dom.method,proportion

the method used for the domination set computation, and the proportion of points required to dominate. See dominate.

If given, get.knn is used from FNN to approximate the class cover catch graph. Each x covers no more than the k nearest neighbors to it. This will be much faster and use less memory for large data sets, but is an approximation unless k is sufficiently large.

algorithm

See get.knn.

slope of the null hypothesis curve

data

data to be classified

dimension of the data

classes

class labels of the data

cccd object

plot.circles

logical. Plot the circles around the points if TRUE.

dominate.only

logical. Only plot the digraph induced by the dominating set.

a dominating set. Only used if dominate.only is TRUE. If dominate.only is TRUE and D is NULL, then dominate is called.

vertex.size,vertex.color,vertex.label, dom.color

parameters controling the plotting of the vertices. dom.color is the color of the vertices in the dominating set.

balls, ball.color

if balls=TRUE, the cover is plotted as filled balls, with ball.color controling their color. In the cass of cccdClassifier, balls can be "x" or "y" indicating that only one of the balls should be plotted.

ypch,ycex,ycol

parameters for plotting the non-target points.

xpch,xcol

parameters for plotting the first class points.

add

logical. Should the classifier plot be added to an existing plot?

use.circle.radii

logical. Ensure that the circles fit within the plot.

square

logical. Make the plot square.

xlim,ylim

if present, these control the plotting region.

…

arguments passed to cccd or plot.

Value

an object of class igraph. In addition, it contains the attributes:

a vector of radii.

the y vectors.

layout

the x vectors.

In the case of the classifier, the attributes are:

Rx,Ry

vectors of radii.

Cx,Cy

the ball centers.

Details

The class cover catch digraph is a graph with vertices defined by the points of x and edges defined according to the balls \(B(x,d(x,Y))\). There is an edge between vertices \(x_1,x_2\) if \(x_2\in B(x_1,d(x_1,Y))\). If dyx is not given and the method is 'euclidean', then get.knnx is used to find the nearest y to each x. If k is given, only the k nearest neighbors to each point are candidates for covering. Thus the cccd will be approximate, but the computation will (generally) be faster. Since get.knn uses Euclidean distance, these choices will only be valid for this distance metric. Since the graph will tend to be larger than otherwise, the dominating set computation will be slower, so one should trade-off speed of calculation, approximation, and the proportion option to the dominating set (which can make that calculation faster at the cost of returning a subset of the dominating set).

References

D.J. Marchette, "Class Cover Catch Digraphs", Wiley Interdisciplinary Reviews: Computational Statistics, 2, 171-177, 2010.

D.J. Marchette, Random Graphs for Statistical Pattern Recognition, John Wiley & Sons, 2004.

C.E. Priebe, D.J. Marchette, J. DeVinney and D. Socolinsky, "Classification Using Class Cover Catch Digraphs", Journal of Classification, 20, 3-23, 2003.

Examples

Run this code

# NOT RUN {
set.seed(456330)
z <- matrix(runif(1000),ncol=2)
ind <- which(z[,1]<.5 & z[,2]<.5)
x <- z[ind,]
y <- z[-ind,]
g <- cccd(x,y)
C <- cccd.classifier(x,y)
z2 <- matrix(runif(1000),ncol=2)
ind <- which(z2[,1]<.5 & z2[,2]<.5)
cls <- rep(0,nrow(z2))
cls[ind] <- 1
out <- cccd.classify(z2,C)
sum(out != cls)/nrow(z2)
# }
# NOT RUN {
plot(g,plot.circles=TRUE,dominate.only=TRUE)
points(z2,col=2*(1-cls)+1,pch=20)
# }

Run the code above in your browser using DataLab