Package elki.database.query
Introduction
The database query API is designed around the concept of prepared statements. When working with index structures, preprocessors, caches and external data, computing a distance or a neighborhood is not as simple as running a constant-time function. Some functions may only be defined on a subset of the data; others can be computed much more efficiently by performing a batch operation. When plenty of memory is available, caching can be faster than recomputing distances all the time. And often there will be more than one way of computing the same data (for example by using an index or doing a linear scan).
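The caching trade-off can be illustrated with a minimal plain-Java sketch. It does not use ELKI: the interface DistQuery and the classes OnTheFlyDistances and PrecomputedDistances are hypothetical names chosen for this example. One implementation recomputes the Euclidean distance on every call; the other spends O(n^2) memory to compute each pair once and then answers by table lookup.

```java
/** Hypothetical prepared distance query over a fixed set of points, addressed by index. */
interface DistQuery {
  double distance(int i, int j);
}

/** Recomputes the Euclidean distance on every call: no extra memory. */
final class OnTheFlyDistances implements DistQuery {
  private final double[][] data;
  int evaluations = 0; // counts actual distance computations

  OnTheFlyDistances(double[][] data) {
    this.data = data;
  }

  @Override
  public double distance(int i, int j) {
    evaluations++;
    double sum = 0;
    for (int d = 0; d < data[i].length; d++) {
      double diff = data[i][d] - data[j][d];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }
}

/** Computes all n*(n-1)/2 pairwise distances once, then answers by lookup. */
final class PrecomputedDistances implements DistQuery {
  private final double[] triangle; // strict lower triangle, row-major: row i holds j = 0..i-1

  PrecomputedDistances(DistQuery inner, int n) {
    triangle = new double[n * (n - 1) / 2];
    for (int i = 1; i < n; i++) {
      for (int j = 0; j < i; j++) {
        triangle[i * (i - 1) / 2 + j] = inner.distance(i, j);
      }
    }
  }

  @Override
  public double distance(int i, int j) {
    if (i == j) {
      return 0.0;
    }
    if (i < j) { // distances are symmetric, so only the lower triangle is stored
      int t = i;
      i = j;
      j = t;
    }
    return triangle[i * (i - 1) / 2 + j];
  }
}
```

Which of the two is preferable depends on how often each pair is queried and how much memory is available, which is exactly the kind of decision the query layer is meant to make once.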
These operations are usually invoked very often, so even deciding which method to use at every iteration can become costly when the number of iterations is large. The goal is therefore to "optimize" once and then invoke the same handler cheaply, which is what a traditional RDBMS calls a "prepared statement".
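The "optimize once, invoke cheaply" idea can be sketched in plain Java as follows. Planner.prepare and RangeCounter are hypothetical names, not ELKI's QueryBuilder API, and a sorted array stands in for an index over one-dimensional data. The choice between the index and a linear scan is made once in prepare; the returned handler is then called repeatedly without re-deciding.

```java
import java.util.Arrays;

/** Hypothetical prepared 1-D range-count query: how many values lie within radius of q. */
interface RangeCounter {
  int count(double q, double radius);
}

final class Planner {
  static int planningCalls = 0; // how often the (possibly costly) decision was made

  /** Builds a hypothetical "index": a sorted copy of the data. */
  static double[] buildIndex(double[] data) {
    double[] copy = data.clone();
    Arrays.sort(copy);
    return copy;
  }

  /** Decides once which implementation to use; callers then reuse the returned handler. */
  static RangeCounter prepare(double[] data, double[] sortedIndexOrNull) {
    planningCalls++;
    if (sortedIndexOrNull != null) {
      // Index available: two binary searches per query, O(log n).
      return (q, r) -> upperBound(sortedIndexOrNull, q + r) - lowerBound(sortedIndexOrNull, q - r);
    }
    // Fallback: linear scan, O(n) per query.
    return (q, r) -> {
      int c = 0;
      for (double v : data) {
        if (Math.abs(v - q) <= r) {
          c++;
        }
      }
      return c;
    };
  }

  /** First position whose value is >= x. */
  private static int lowerBound(double[] a, double x) {
    int lo = 0, hi = a.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (a[mid] < x) lo = mid + 1; else hi = mid;
    }
    return lo;
  }

  /** First position whose value is > x. */
  private static int upperBound(double[] a, double x) {
    int lo = 0, hi = a.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (a[mid] <= x) lo = mid + 1; else hi = mid;
    }
    return lo;
  }
}
```

A caller would prepare once outside its loop, for example RangeCounter counter = Planner.prepare(data, Planner.buildIndex(data)); and then call counter.count(q, r) for every query point, paying only for the chosen algorithm inside the loop.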
Prepared queries in ELKI
Prepared statements in ELKI are currently available for:
- Distance queries: DistanceQuery
- Similarity queries: SimilarityQuery
- kNN (k-nearest-neighbors) queries: KNNSearcher
- ε-range queries: RangeSearcher
- rkNN (reverse k-nearest-neighbors) queries: RKNNSearcher
- PrioritySearcher, which allows incremental search
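kNN and ε-range queries can both be answered by consuming an incremental searcher that reports neighbors in ascending distance order: a kNN query stops after k results, a range query stops at the first result beyond ε. The following plain-Java sketch assumes a simple heap over all points rather than an index, and its class names are hypothetical; it only illustrates the access pattern, not how ELKI's PrioritySearcher is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical incremental searcher: yields point indexes by ascending Euclidean distance. */
final class IncrementalSearcher {
  // Entries are {distance, index}, ordered by ascending distance.
  private final PriorityQueue<double[]> heap =
      new PriorityQueue<>((a, b) -> Double.compare(a[0], b[0]));

  IncrementalSearcher(double[][] data, double[] query) {
    for (int i = 0; i < data.length; i++) {
      heap.add(new double[] {euclidean(data[i], query), i});
    }
  }

  static double euclidean(double[] x, double[] y) {
    double sum = 0;
    for (int d = 0; d < x.length; d++) {
      double diff = x[d] - y[d];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  boolean hasNext() { return !heap.isEmpty(); }

  /** Distance of the next-closest point, without consuming it. */
  double peekDistance() { return heap.peek()[0]; }

  /** Index of the next-closest point. */
  int next() { return (int) heap.poll()[1]; }
}

final class IncrementalQueries {
  /** kNN: consume the first k results. */
  static List<Integer> knn(double[][] data, double[] q, int k) {
    IncrementalSearcher s = new IncrementalSearcher(data, q);
    List<Integer> out = new ArrayList<>();
    while (out.size() < k && s.hasNext()) {
      out.add(s.next());
    }
    return out;
  }

  /** ε-range: consume results while their distance is at most eps. */
  static List<Integer> range(double[][] data, double[] q, double eps) {
    IncrementalSearcher s = new IncrementalSearcher(data, q);
    List<Integer> out = new ArrayList<>();
    while (s.hasNext() && s.peekDistance() <= eps) {
      out.add(s.next());
    }
    return out;
  }
}
```

An index-backed searcher would replace the full heap with a best-first traversal, so that stopping early also saves work.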
Obtaining query objects
The general process of obtaining a query is to retrieve it using the QueryBuilder:
- QueryBuilder.distanceQuery()
- QueryBuilder.similarityQuery()
- QueryBuilder.kNNByObject()
- QueryBuilder.rangeByObject()
- QueryBuilder.rKNNByObject()
- QueryBuilder.priorityByObject()
The query can then be evaluated on objects as needed.
Optimizer hints
To help the database layer choose the most suitable implementation, one should also provide so-called "hints" where available. In general, any object could be a "hint" to the database layer (for extensibility), but the following are commonly used:
- An Integer as the maximum value of "k" used in kNN and rkNN queries (since a preprocessor or index might only support a certain fixed maximum value)
- A maximum distance used in range queries
- QueryBuilder.exactOnly() to exclude approximate answers
- QueryBuilder.optimizedOnly() to disallow linear scans
- QueryBuilder.cheapOnly() to disallow expensive optimizations when the query will only be used once
- QueryBuilder.noCache() to disallow retrieving a cache class
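The flag-style hints can be thought of as filters on the candidate implementations. This sketch is loosely modeled on the QueryBuilder methods listed above, but Candidate, HintedBuilder and the "first acceptable candidate in preference order" rule are assumptions made for illustration, not ELKI's actual optimizer logic.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical description of one way to answer a query. */
final class Candidate {
  final String name;
  final boolean exact;          // returns exact answers
  final boolean linearScan;     // no acceleration
  final boolean expensiveSetup; // e.g. must build an index or cache first
  final boolean cache;          // backed by a cache

  Candidate(String name, boolean exact, boolean linearScan, boolean expensiveSetup, boolean cache) {
    this.name = name;
    this.exact = exact;
    this.linearScan = linearScan;
    this.expensiveSetup = expensiveSetup;
    this.cache = cache;
  }
}

/** Hypothetical builder: each hint method rules out some candidates. */
final class HintedBuilder {
  private final List<Candidate> candidates; // in order of preference
  private boolean requireExact, requireOptimized, requireCheap, forbidCache;

  HintedBuilder(List<Candidate> candidates) {
    this.candidates = new ArrayList<>(candidates);
  }

  HintedBuilder exactOnly() { requireExact = true; return this; }
  HintedBuilder optimizedOnly() { requireOptimized = true; return this; }
  HintedBuilder cheapOnly() { requireCheap = true; return this; }
  HintedBuilder noCache() { forbidCache = true; return this; }

  /** Returns the first acceptable candidate, or null if the hints rule out all of them. */
  Candidate choose() {
    for (Candidate c : candidates) {
      if (requireExact && !c.exact) continue;
      if (requireOptimized && c.linearScan) continue;
      if (requireCheap && c.expensiveSetup) continue;
      if (forbidCache && c.cache) continue;
      return c;
    }
    return null;
  }
}
```

Returning null when nothing qualifies lets the caller decide how to proceed, rather than silently falling back to a slow or approximate method.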
Full example:
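import elki.database.ids.DBIDIter;
import elki.database.ids.DBIDRef;
import elki.database.query.QueryBuilder;
import elki.database.query.knn.KNNSearcher;
import elki.distance.minkowski.EuclideanDistance;
// Get a kNN query with maxk = 10 (assumes relation holds NumberVector objects)
KNNSearcher<DBIDRef> knnQuery = new QueryBuilder<>(relation, EuclideanDistance.STATIC).kNNByDBID(10);
// Run a 10NN query for each point, discarding the results
for(DBIDIter iter = relation.iterDBIDs(); iter.valid(); iter.advance()) {
  knnQuery.getKNN(iter, 10);
}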
Interface Summary
- DistanceSimilarityQuery<O>: Interface that is a combination of a distance and a similarity function.
- LinearScanQuery: Marker interface for linear scan (slow, non-accelerated) queries.
- PrioritySearcher<O>: Distance priority-based searcher.
- QueryOptimizer: Interface to automatically add indexes to a database when no suitable indexes have been found.
Class Summary
- DisableQueryOptimizer: Dummy implementation to disable automatic optimization.
- DisableQueryOptimizer.Par: Parameterization class.
- EmpiricalQueryOptimizer: Class to automatically add indexes to a database.
- ExactPrioritySearcher<O>: Priority searcher that refines all objects to their exact distances, using another priority searcher inside to provide candidates.
- QueryBuilder<O>: Class to build a query.
- WrappedPrioritySearchDBIDByLookup<O>: Find nearest neighbors by querying with the original object.
- WrappedPrioritySearchDBIDByLookup.Linear<O>: Linear scan searcher.
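The description of ExactPrioritySearcher resembles the classic filter-refine scheme. The following minimal plain-Java sketch assumes that the inner searcher delivers candidates ordered by a lower bound of the true distance (here the absolute difference of the first coordinates, which never exceeds the 2-D Euclidean distance). A refined candidate is reported once its exact distance is no larger than the next lower bound, so results come out in exact order, and a consumer that stops early (as a kNN query does) may avoid computing some exact distances. The class name FilterRefineSearch is hypothetical, and ELKI's implementation may differ.

```java
import java.util.PriorityQueue;

/** Hypothetical filter-refine searcher over 2-D points. */
final class FilterRefineSearch {
  private final double[][] data;
  private final double[] q;
  // Unrefined candidates as {lowerBound, index}, smallest bound first.
  private final PriorityQueue<double[]> byLowerBound =
      new PriorityQueue<>((a, b) -> Double.compare(a[0], b[0]));
  // Refined candidates as {exactDistance, index}, smallest distance first.
  private final PriorityQueue<double[]> refined =
      new PriorityQueue<>((a, b) -> Double.compare(a[0], b[0]));
  int exactComputations = 0;

  FilterRefineSearch(double[][] data, double[] q) {
    this.data = data;
    this.q = q;
    for (int i = 0; i < data.length; i++) {
      // |dx| <= sqrt(dx^2 + dy^2), so this is a valid lower bound.
      byLowerBound.add(new double[] {Math.abs(data[i][0] - q[0]), i});
    }
  }

  private double exact(int i) {
    exactComputations++;
    double dx = data[i][0] - q[0], dy = data[i][1] - q[1];
    return Math.sqrt(dx * dx + dy * dy);
  }

  boolean hasNext() {
    return !byLowerBound.isEmpty() || !refined.isEmpty();
  }

  /** Returns {exactDistance, index} of the next-nearest point. */
  double[] next() {
    // Refine until the best refined candidate is provably no farther than every unrefined one.
    while (!byLowerBound.isEmpty()
        && (refined.isEmpty() || refined.peek()[0] > byLowerBound.peek()[0])) {
      int i = (int) byLowerBound.poll()[1];
      refined.add(new double[] {exact(i), i});
    }
    return refined.poll();
  }
}
```

The correctness of the output order rests entirely on the lower-bound property; with a bound that can overestimate, results could be reported out of order.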