eiQuery(runId,queries,format="sdf",
dir=".",distance=getDefaultDist(descriptorType),conn=defaultConn(dir),
asSimilarity=FALSE, K=200, W = 1.39564, M=19,L=10,T=30,lshData=NULL,
mainIds = readIddb(conn,file.path(dir, Main),sorted=TRUE))
eiMakeDb. If you are coming from an older version of eiR, you should
now use this value instead of specifying r, d, refIddb, and descriptorType.
"sdf" when queries is either the filename of an SDF file or an SDFset object;
"compound_id" when queries is a list of id numbers; and "name" when queries
is a list of compound names, as returned by cid(apset).
loadLSHData. The LSH data is generally the largest
chunk of data that must be held in memory while performing a query. Since it
remains the same across queries, it makes sense to pre-load this data once when
doing multiple queries. If this value is NULL,
the LSH data will be loaded internally and then released before
eiQuery returns.
If asSimilarity is true, then instead of a "distance"
column there will be a "similarity" column.
r, d, and refIddb parameters. The queries can be given in a few
different formats; see the queries parameter for details.
The LSH algorithm is used to quickly identify compounds similar to the
queries.
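To make the idea of hash-based candidate lookup concrete, here is a minimal base R sketch of a generic p-stable (Gaussian) locality-sensitive hash for Euclidean distance. It is not eiR's implementation: the helpers make_tables, hash_key, and lsh_query are hypothetical, and reading W as a bucket width, M as hashes per table, and L as the number of tables is an assumption borrowed from common LSH descriptions rather than from this page.

```r
# Generic p-stable LSH sketch (illustrative only, not eiR's code).
set.seed(1)

make_tables <- function(dim, W, M, L) {
  # One entry per table: M random Gaussian projections and M random offsets.
  lapply(seq_len(L), function(i) {
    list(a = matrix(rnorm(M * dim), nrow = M), b = runif(M, 0, W))
  })
}

hash_key <- function(x, tb, W) {
  # Quantize the M projections of x and join them into one bucket key.
  paste(floor((as.vector(tb$a %*% x) + tb$b) / W), collapse = ":")
}

lsh_query <- function(points, query, tables, W, K) {
  # Collect every point that shares a bucket with the query in any table.
  # (Points are rehashed on every call for brevity; a real index would
  # hash them once and store the buckets.)
  candidates <- integer(0)
  for (tb in tables) {
    keys <- apply(points, 1, hash_key, tb = tb, W = W)
    candidates <- union(candidates, which(keys == hash_key(query, tb, W)))
  }
  # Rank the candidates by exact Euclidean distance and keep the K nearest.
  diffs <- points[candidates, , drop = FALSE] -
    matrix(query, length(candidates), length(query), byrow = TRUE)
  dists <- sqrt(rowSums(diffs^2))
  ord <- order(dists)[seq_len(min(K, length(candidates)))]
  data.frame(target = candidates[ord], distance = dists[ord])
}

points <- matrix(rnorm(200 * 5), ncol = 5)
tables <- make_tables(dim = 5, W = 4, M = 3, L = 10)
hits <- lsh_query(points, points[1, ], tables, W = 4, K = 5)
```

Only the candidates that collide with the query are compared exactly, which is where the speed-up over a full scan comes from. The result may miss true neighbours that never share a bucket with the query.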
This function must use a distance function rather than a similarity function.
However, if the distance function given returns values between 0 and 1, then
the asSimilarity parameter may be used to return similarity values rather
than distance values.
eiInit
eiMakeDb
eiPerformanceTest
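The relationship between a bounded distance and a similarity, as described for asSimilarity, can be illustrated with a small base R sketch. This page does not state how the conversion is computed; the sketch assumes the common choice similarity = 1 - distance, and both tanimoto_distance and as_similarity are hypothetical helpers, not eiR functions.

```r
# Tanimoto distance between two sets of descriptor ids:
# 1 - |intersection| / |union|, which always lies in [0, 1].
tanimoto_distance <- function(a, b) {
  1 - length(intersect(a, b)) / length(union(a, b))
}

# Hypothetical helper: swap a "distance" column for a "similarity" column,
# assuming similarity = 1 - distance (only valid for distances in [0, 1]).
as_similarity <- function(results) {
  stopifnot(all(results$distance >= 0 & results$distance <= 1))
  results$similarity <- 1 - results$distance
  results$distance <- NULL
  results
}

query <- c(1, 2, 3, 4)
targets <- list(c(1, 2, 3, 4), c(3, 4, 5, 6), c(7, 8))
results <- data.frame(
  query = 1,
  target = seq_along(targets),
  distance = sapply(targets, tanimoto_distance, a = query)
)
sim <- as_similarity(results)
```

The range check is the important part: a distance that can exceed 1 has no meaningful 1 - distance similarity, which is why the option is restricted to distance functions bounded by 0 and 1.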
library(snow)
r<- 50
d<- 40
#initialize
data(sdfsample)
dir=file.path(tempdir(),"query")
dir.create(dir)
eiInit(sdfsample,dir=dir)
#create compound db
runId=eiMakeDb(r,d,numSamples=20,dir=dir,
cl=makeCluster(1,type="SOCK",outfile=""))
#find compounds similar to each query
results = eiQuery(runId,sdfsample[1:2],K=15,dir=dir)