eiCluster(runId,K,minNbrs,compoundIds=c(), dir=".",cutoff=NULL,
distance=getDefaultDist(descriptorType),
conn=defaultConn(dir), W = 1.39564, M=19,L=10,T=30,type="cluster",linkage="single")
eiMakeDb. If you are coming from an older version of eiR, you should
now use this value instead of specifying r, d, and descriptorType.
jarvisPatrick function in ChemmineR. This list contains the
nearest neighbor matrix along with the similarity matrix. This allows one to quickly try
different cutoff values without having to re-compute the whole similarity matrix each time.
Note that since we are returning similarity values here instead of distance values,
this will only work if the given distance function returns a value between 0 and 1. This is
true of the default functions.
If type is "cluster", returns a clustering.
This will be a vector in which the names are the compound names, and the values are the cluster
labels.
Otherwise, if type is "matrix", returns a list with the following components:
K neighbors for a compound, that row will be padded with NAs.
K, and minNbrs. For each item, it finds the K nearest neighbors
of that item. Normally this requires computing the distance between every pair of items.
However, using Locality Sensitive Hashing (LSH), the set of nearest neighbors can be found in
near constant time. Once the nearest neighbor matrix is computed, the algorithm makes one pass
through the items and merges all pairs that have at least minNbrs neighbors in common.
Although not required, it is advisable to specify a cutoff value. This is the maximum
distance two items can have from each other and still be considered to be neighbors. It is
thus possible for an item to end up with fewer than K neighbors if fewer than K
items are close enough to it. If a cutoff is not specified, it is possible for highly
unrelated items to be listed as neighbors of another item simply because nothing else was
nearby. This can lead to items being joined into clusters with which they have no true
connection.
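The procedure described above can be illustrated with a small brute-force sketch in base R. It is not the eiR implementation: it computes every pairwise distance instead of using LSH, it follows only the merge rule stated here (some Jarvis-Patrick variants additionally require the two items to appear in each other's neighbor lists), and the names jp_sketch, distMat and the sample data are hypothetical.

```r
# Brute-force Jarvis-Patrick sketch (no LSH). All names here are hypothetical.
jp_sketch <- function(distMat, K, minNbrs, cutoff = NULL) {
  n <- nrow(distMat)
  # For each item, keep up to K nearest other items, optionally within the cutoff.
  nbrs <- lapply(seq_len(n), function(i) {
    d <- distMat[i, ]
    d[i] <- Inf                               # an item is not its own neighbor
    ord <- order(d)[seq_len(min(K, n - 1))]
    if (!is.null(cutoff)) ord <- ord[d[ord] <= cutoff]
    ord
  })
  # Merge every pair sharing at least minNbrs neighbors (simple union-find).
  parent <- seq_len(n)
  find <- function(i) { while (parent[i] != i) i <- parent[i]; i }
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      if (length(intersect(nbrs[[i]], nbrs[[j]])) >= minNbrs) {
        ri <- find(i); rj <- find(j)
        if (ri != rj) parent[rj] <- ri
      }
    }
  }
  roots <- vapply(seq_len(n), find, integer(1))
  # Named vector: names are item names, values are cluster labels.
  setNames(as.integer(factor(roots)), rownames(distMat))
}

# Two tight groups plus one far-away outlier (made-up 1-D data).
x <- c(0, 0.1, 0.2, 5, 5.1, 5.2, 100)
names(x) <- paste0("cmp", seq_along(x))
distMat <- as.matrix(dist(x))

no_cutoff   <- jp_sketch(distMat, K = 2, minNbrs = 1)
with_cutoff <- jp_sketch(distMat, K = 2, minNbrs = 1, cutoff = 0.5)
```

Without a cutoff the outlier is absorbed into the second group, because its two "nearest" neighbors are simply the least distant items. With a cutoff of 0.5 it has no neighbors and stays in a cluster of its own.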
The type parameter can be used to return a list which can be used to call the
jarvisPatrick function in ChemmineR directly. The advantage of this is that it will
contain the similarity matrix, which can then be used to quickly set different cutoff values
(using trimNeighbors) without having to re-compute the similarity matrix. Note that this
requires that the given distance function return a value between 0 and 1 so it can be converted
to a similarity function.
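To make the idea of reusing stored similarities concrete, here is a minimal base R sketch. It does not call ChemmineR: the list layout (fields indexes and similarities), the helper trim_sketch, and all numbers are assumptions made up for illustration. It only shows why distances in [0, 1] matter (similarity is taken as 1 - distance) and how different thresholds can be applied to the same stored values.

```r
# Hypothetical stand-in for a neighbor list; field names and layout are assumptions.
dists <- matrix(c(0.10, 0.30, 0.70,
                  0.10, 0.20, 0.90,
                  0.20, 0.30, 0.60),
                nrow = 3, byrow = TRUE)
nnm <- list(
  indexes      = matrix(c(2, 3, 4,
                          1, 3, 4,
                          2, 1, 4), nrow = 3, byrow = TRUE),
  similarities = 1 - dists   # only meaningful when distances lie in [0, 1]
)

# Drop neighbors whose similarity falls below minSim, padding the row with NA.
trim_sketch <- function(nnm, minSim) {
  keep <- nnm$similarities >= minSim
  nnm$indexes[!keep] <- NA
  nnm$similarities[!keep] <- NA
  nnm
}

# Two thresholds applied to the same stored similarities; nothing is recomputed.
loose  <- trim_sketch(nnm, minSim = 0.2)
strict <- trim_sketch(nnm, minSim = 0.75)
```

In practice the list returned by eiCluster with type="matrix" would be passed to trimNeighbors and jarvisPatrick from ChemmineR, as described above; the sketch only mirrors the trimming step.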
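library(eiR)
library(snow)  # provides makeCluster
r <- 50  # number of reference compounds
d <- 40  # number of embedding dimensions
# initialize
data(sdfsample)
dir <- file.path(tempdir(), "cluster")
dir.create(dir)
eiInit(sdfsample, dir = dir)
# create compound db
runId <- eiMakeDb(r, d, numSamples = 20, dir = dir, cl = makeCluster(1, type = "SOCK", outfile = ""))
# cluster using 5 nearest neighbors, merging pairs that share at least 2 of them
eiCluster(runId, K = 5, minNbrs = 2, cutoff = 0.5, dir = dir)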