Package elki.database.query

Database queries - computing distances, neighbors, similarities - API and general documentation.


The database query API is designed around the concept of prepared statements.

When working with index structures, preprocessors, caches and external data, computing a distance or a neighborhood is not as simple as running a constant-time function. Some functions may only be defined on a subset of the data, while others can be computed much more efficiently as a batch operation. When plenty of memory is available, caching can be faster than recomputing distances over and over. And often there is more than one way of computing the same data (for example, by using an index or by doing a linear scan).
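To make "more than one way of computing the same data" concrete, here is a minimal Java sketch, not ELKI code: `DistanceQuerySketch`, `DirectDistance` and `CachedDistance` are hypothetical names. Both classes answer the same question. One recomputes each distance on demand, the other trades O(n²) memory for constant-time lookups.

```java
/** Hypothetical minimal interface: a distance query over points addressed by index. */
interface DistanceQuerySketch {
  double distance(int i, int j);
}

/** Recomputes the Euclidean distance on every call; needs no extra memory. */
final class DirectDistance implements DistanceQuerySketch {
  private final double[][] data;

  DirectDistance(double[][] data) {
    this.data = data;
  }

  @Override
  public double distance(int i, int j) {
    double sum = 0;
    for (int d = 0; d < data[i].length; d++) {
      double diff = data[i][d] - data[j][d];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }
}

/** Computes all pairwise distances once (O(n^2) memory), then answers by table lookup. */
final class CachedDistance implements DistanceQuerySketch {
  private final double[][] matrix;

  CachedDistance(DistanceQuerySketch inner, int n) {
    matrix = new double[n][n]; // diagonal stays 0.0
    for (int i = 0; i < n; i++) {
      for (int j = i + 1; j < n; j++) {
        matrix[i][j] = matrix[j][i] = inner.distance(i, j); // symmetric distance assumed
      }
    }
  }

  @Override
  public double distance(int i, int j) {
    return matrix[i][j];
  }
}
```

Because both implement the same interface, calling code does not need to know which one it received; the choice can be made once, up front.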

Usually, these operations are invoked very often. Even deciding which method to use at every iteration can become quite costly when the number of iterations is large. Therefore, the goal is to "optimize" once and then invoke the same handler cheaply. In a traditional RDBMS, this is what is called a "prepared statement".
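The "optimize once, then invoke cheaply" idea can be sketched in plain Java. This is a hypothetical illustration, not ELKI's optimizer: `KnnQuerySketch`, `KnnPlanner.prepare` and its `manyQueries` flag are made-up names, and the decision rule (precompute all kNN lists when many queries are expected, otherwise scan per query) is only a placeholder for whatever a real database layer would weigh.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

/** Hypothetical prepared kNN query over points addressed by index. */
interface KnnQuerySketch {
  /** Indexes of the k nearest points (the query point itself included), nearest first. */
  int[] knn(int q, int k);
}

final class KnnPlanner {
  static double dist(double[][] data, int a, int b) {
    double sum = 0;
    for (int d = 0; d < data[a].length; d++) {
      double diff = data[a][d] - data[b][d];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  /** Linear scan: sorts all points by distance to q (stable sort, ties keep index order). */
  static int[] scan(double[][] data, int q, int k) {
    return IntStream.range(0, data.length).boxed()
        .sorted(Comparator.comparingDouble(j -> dist(data, q, j)))
        .limit(k)
        .mapToInt(Integer::intValue)
        .toArray();
  }

  /**
   * The "optimizer": runs ONCE and returns a handler that is cheap to call.
   * Requests for more than maxk neighbors are truncated to maxk.
   */
  static KnnQuerySketch prepare(double[][] data, int maxk, boolean manyQueries) {
    if (!manyQueries) {
      return (q, k) -> scan(data, q, Math.min(k, maxk));
    }
    // Precompute every point's maxk nearest neighbors up front.
    int[][] table = new int[data.length][];
    for (int i = 0; i < data.length; i++) {
      table[i] = scan(data, i, maxk);
    }
    return (q, k) -> Arrays.copyOf(table[q], Math.min(k, table[q].length));
  }
}
```

The caller calls `prepare` once, outside its loop, and then only invokes `knn(q, k)` inside it; the per-iteration cost no longer includes deciding how to answer the query.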

Prepared queries in ELKI

Prepared statements in ELKI are currently available for the query types named above (distances, similarities and neighbor searches), all with a quite similar API. In addition, there are the more complex distance priority searchers (PrioritySearcher), which allow incremental search.
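What "incremental search" means can be sketched with a priority queue. The sketch below is hypothetical (`IncrementalSearchSketch`, `hasNext`, `peekDistance` and `next` are not ELKI's PrioritySearcher API): it reports neighbors one at a time in increasing distance, so the caller can stop at any point without fixing k in advance. For simplicity it fills the heap by a linear scan; an index-backed searcher would instead expand index nodes lazily as the heap is consumed.

```java
import java.util.PriorityQueue;

/** Hypothetical incremental nearest-neighbor search over an in-memory array. */
final class IncrementalSearchSketch {
  private static final class Entry {
    final double dist;
    final int index;

    Entry(double dist, int index) {
      this.dist = dist;
      this.index = index;
    }
  }

  /** Min-heap ordered by distance to the query point. */
  private final PriorityQueue<Entry> heap =
      new PriorityQueue<Entry>((a, b) -> Double.compare(a.dist, b.dist));

  IncrementalSearchSketch(double[][] data, double[] query) {
    for (int i = 0; i < data.length; i++) {
      double sum = 0;
      for (int d = 0; d < query.length; d++) {
        double diff = data[i][d] - query[d];
        sum += diff * diff;
      }
      heap.add(new Entry(Math.sqrt(sum), i));
    }
  }

  /** True while unreported points remain. */
  boolean hasNext() {
    return !heap.isEmpty();
  }

  /** Distance of the next neighbor without consuming it (usable as a stopping bound). */
  double peekDistance() {
    return heap.peek().dist;
  }

  /** Removes and returns the index of the next-nearest point. */
  int next() {
    return heap.poll().index;
  }
}
```

A range query then falls out as a special case: keep calling `next()` while `hasNext()` holds and `peekDistance()` is within the radius.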

Obtaining query objects

The general process of obtaining a query is to retrieve it from the QueryBuilder, calling the builder method for the desired query type. See the documentation of the individual query classes for the detailed API. Avoid obtaining queries inside a loop construct! Retrieve the query once, before the loop.
The query can then be evaluated on objects as needed.

Optimizer hints

To assist the database layer in choosing the most suitable implementation, one should also provide so-called "hints" where available. In general, any object can serve as a "hint" to the database layer (for extensibility), but a few hints are commonly used. Please set these hints appropriately, since they can affect your algorithm's performance!
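Why hints matter can be illustrated with a small hypothetical planner. Nothing here is ELKI's API: `QueryHint`, `Strategy` and `HintedPlanner.choose` are invented names, and the two hints (`EXACT_ONLY`, `NO_CACHE`) and the selection rules are assumptions made for illustration. Hints are accepted as plain `Object`s, mirroring the extensibility point above: hints the planner does not recognize are simply ignored.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical hint constants (not ELKI's). */
enum QueryHint { EXACT_ONLY, NO_CACHE }

/** Strategies this hypothetical planner can choose between. */
enum Strategy { APPROXIMATE_INDEX, PRECOMPUTED_CACHE, LINEAR_SCAN }

final class HintedPlanner {
  /**
   * Picks a strategy once, from what is available and from the caller's hints.
   * Hints are arbitrary objects; unknown ones are ignored.
   */
  static Strategy choose(boolean approxIndexAvailable, boolean plentyOfMemory, Object... hints) {
    List<Object> h = Arrays.asList(hints);
    if (approxIndexAvailable && !h.contains(QueryHint.EXACT_ONLY)) {
      return Strategy.APPROXIMATE_INDEX; // fast, but results may be inexact
    }
    if (plentyOfMemory && !h.contains(QueryHint.NO_CACHE)) {
      return Strategy.PRECOMPUTED_CACHE; // up-front cost, then cheap lookups
    }
    return Strategy.LINEAR_SCAN; // always exact, no extra memory
  }
}
```

An algorithm that needs exact answers but forgets to say so may silently receive the approximate strategy, while one that runs only a handful of queries but omits a no-cache hint may pay for a full precomputation it never amortizes.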

Full example:

 // Get a kNN query with maxk = 10, once, outside the loop
 KNNSearcher<DBIDRef> knnQuery = new QueryBuilder<>(relation, EuclideanDistance.STATIC).kNNByDBID(10);
 // run a 10NN query for each point, discarding the results
 for(DBIDIter iter = relation.iterDBIDs(); iter.valid(); iter.advance()) {
   knnQuery.getKNN(iter, 10);
 }