eiQuery: Perform a query on an embedded database

Description

Finds similar compounds for each query.

Usage

eiQuery(runId,queries,format="sdf",
		dir=".",distance=getDefaultDist(descriptorType),conn=defaultConn(dir),
		asSimilarity=FALSE, K=200, W = 1.39564, M=19,L=10,T=30,lshData=NULL,
		mainIds = readIddb(conn,file.path(dir, Main),sorted=TRUE))

Arguments

runId

The id number identifying a particular set of settings for a database. This is generally the number returned by eiMakeDb. If you're coming from an older version of eiR, you should now use this value instead of specifying r, d, refIddb, and descriptorType.

queries

This can be either an SDFset or a file containing one or more query compounds. Other accepted forms are described under the format parameter.

format

The format in which the queries are given. Valid values are: "sdf" when queries is either the filename of an SDF file or an SDFset object; "compound_id" when queries is a list of id numbers; and "name" when queries is a list of compound names, as returned by cid(apset).

dir

The directory where the "data" directory lives. Defaults to the current directory.

distance

The distance function to be used to compute the distance between two descriptors. A default function is provided for "ap" and "fp" descriptors. The Tanimoto function is used by default.
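
As a rough illustration of what a function in this role computes, the sketch below implements a Tanimoto distance between two descriptors. The name tanimotoDist and the representation of a descriptor as a plain vector of integer feature ids are assumptions made for this example; the default functions shipped with eiR operate on the package's own descriptor objects.

```r
# Hypothetical helper: Tanimoto distance between two descriptors,
# each given as a vector of integer feature ids (an assumed representation).
tanimotoDist <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  shared <- length(intersect(a, b))  # features present in both descriptors
  total  <- length(union(a, b))      # features present in either descriptor
  if (total == 0) return(0)          # two empty descriptors count as identical
  1 - shared / total
}

d1 <- c(1L, 2L, 3L, 4L)
d2 <- c(3L, 4L, 5L, 6L)
tanimotoDist(d1, d2)  # 2 shared features out of 6 distinct ones
```

Because the result always lies between 0 and 1, a function of this shape would also satisfy the requirement attached to asSimilarity.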

conn

Database connection to use.

asSimilarity

If TRUE, return similarity values instead of distance values. This only works if the given distance function returns values between 0 and 1, which is the case for the default atom pair and fingerprint distance functions.
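
The restriction to distances between 0 and 1 can be seen in a small sketch. It assumes the conversion is simply similarity = 1 - distance, which is an assumption about the behaviour rather than something stated on this page. The helper toSimilarity and the sample values are hypothetical.

```r
# Made-up distances in [0, 1], standing in for a 'distance' column.
res <- data.frame(
  query    = c("q1", "q1"),
  target   = c("t1", "t2"),
  distance = c(0, 0.25)
)

# Hypothetical helper: swap the 'distance' column for a 'similarity' column.
# Assumes similarity = 1 - distance, which only makes sense on [0, 1].
toSimilarity <- function(df) {
  stopifnot(all(df$distance >= 0 & df$distance <= 1))
  df$similarity <- 1 - df$distance
  df$distance <- NULL
  df
}

sim <- toSimilarity(res)
```

A distance function with an unbounded range would produce negative "similarities" under this conversion, which is why the check is there.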

K

The number of results to return.

W, M

Tunable LSH parameters. See the LSHKIT page for details: http://lshkit.sourceforge.net/dd/d2a/mplsh-tune_8cpp.html

L

Number of hash tables.

T

Number of probes.

lshData

A pointer returned by loadLSHData. The LSH data is generally the largest chunk of data that must be held in memory while performing a query. Since it remains the same across queries, it makes sense to pre-load this data once when running multiple queries.

If this value is NULL the LSH data will be loaded internally and then released before eiQuery returns.

mainIds

A vector of all id numbers in the current database. This is mainly provided as an option to avoid re-reading the id list when executing several queries. If it is not supplied, eiQuery reads the id list itself.

Value

Returns a data frame with columns 'query', 'target', 'target_ids', and 'distance'. 'query' and 'target' are the compound names, and 'distance' is the distance between them as computed by the given distance function. 'target_ids' is the compound id of the target. Query names are repeated for each matching target found. If asSimilarity is TRUE, there will be a 'similarity' column instead of the 'distance' column.
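
Since each query name is repeated once per matching target, a common follow-up step is to reduce the result to the closest target per query. The sketch below does this in base R on a mock data frame that only mimics the documented columns; every value in it is made up.

```r
# Mock result with the documented columns; all values are made up.
results <- data.frame(
  query      = c("q1", "q1", "q2", "q2"),
  target     = c("a", "b", "c", "d"),
  target_ids = c(11L, 12L, 13L, 14L),
  distance   = c(0.30, 0.10, 0.05, 0.40),
  stringsAsFactors = FALSE
)

# Sort by query, then by increasing distance, and keep the first row per query.
ordered <- results[order(results$query, results$distance), ]
best <- ordered[!duplicated(ordered$query), ]
best
```

With a 'similarity' column in place of 'distance', the sort on that column would need to be descending.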

Details

This function identifies the database by the runId parameter. Queries can be given in several formats; see the queries and format parameters for details. The LSH algorithm is used to quickly identify compounds similar to the queries. This function must use a distance function rather than a similarity function. However, if the given distance function returns values between 0 and 1, the asSimilarity parameter may be used to return similarity values rather than distance values.

Examples

library(snow)
   r<- 50
   d<- 40

   #initialize
   data(sdfsample)
   dir=file.path(tempdir(),"query")
   dir.create(dir)
   eiInit(sdfsample,dir=dir)

   #create compound db
   runId=eiMakeDb(r,d,numSamples=20,dir=dir,
      cl=makeCluster(1,type="SOCK",outfile=""))

   #find compounds similar to each query
   results = eiQuery(runId,sdfsample[1:2],K=15,dir=dir)
