Class RangeQueryBenchmark<O extends NumberVector>

  • Type Parameters:
    O - Vector type

    public class RangeQueryBenchmark<O extends NumberVector>
    extends AbstractDistanceBasedApplication<O>
    Benchmarking algorithm that computes a range query for each point. The query points can come either from a separate data source or from the original database. In the latter case, the database is expected to have an additional 1-dimensional vector field that holds the query radius. For the separate data source, the last dimension is cut off and used as the query radius.

    The simplest data setup is to have an input file:

     x y z label
     1 2 3 Example1
     4 5 6 Example2
     7 8 9 Example3
     
    and a query file:
     x y z radius
     1 2 3 1.2
     4 5 6 3.3
     7 8 9 4.1
     
    where the additional column is the radius.
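
    As an illustration of the query file layout above, the following is a minimal sketch of how the last column can be cut off and treated as the radius. It does not use ELKI's parser; the class QueryLineSplit and its nested Query type are hypothetical names introduced only for this example.

```java
import java.util.Arrays;

// Hypothetical illustration, not part of ELKI: split "x y z radius" lines.
public class QueryLineSplit {
  /** A query point together with its radius (hypothetical helper type). */
  static final class Query {
    final double[] point;
    final double radius;

    Query(double[] point, double radius) {
      this.point = point;
      this.radius = radius;
    }
  }

  /** Parse one whitespace-separated line; the last column is taken as the radius. */
  static Query parse(String line) {
    String[] cols = line.trim().split("\\s+");
    if (cols.length < 2) {
      throw new IllegalArgumentException("Need at least one coordinate and a radius: " + line);
    }
    // All columns except the last one form the query vector.
    double[] point = new double[cols.length - 1];
    for (int i = 0; i < point.length; i++) {
      point[i] = Double.parseDouble(cols[i]);
    }
    double radius = Double.parseDouble(cols[cols.length - 1]);
    return new Query(point, radius);
  }

  public static void main(String[] args) {
    // The data rows of the query file shown above (header line omitted).
    String[] lines = { "1 2 3 1.2", "4 5 6 3.3", "7 8 9 4.1" };
    for (String line : lines) {
      Query q = parse(line);
      System.out.println(Arrays.toString(q.point) + " radius=" + q.radius);
    }
  }
}
```

    Each row yields a 3-dimensional query vector and one radius value, which mirrors the "cut off the last dimension" behavior described for the separate data source.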

    Alternatively, if you work with a single file, you need to use the filter parameters -dbc.filter SplitNumberVectorFilter -split.dims 1,2,3 to split the relation into a 3-dimensional data vector and a 1-dimensional radius vector.

    TODO: alternatively, allow using a fixed radius?

    TODO: use an InputStream instead of a DatabaseConnection for the query set?

    Since:
    0.5.5
    Author:
    Erich Schubert
    • Field Detail

      • LOG

        private static final Logging LOG
        The logger for this class.
      • radius

        protected double radius
        Query radius.
      • queries

        protected DatabaseConnection queries
        The alternate query point source. Optional.
      • sampling

        protected double sampling
        Sampling size.
      • random

        protected RandomFactory random
        Random generator factory.
    • Constructor Detail

      • RangeQueryBenchmark

        public RangeQueryBenchmark​(InputStep input,
                                   Distance<? super O> distance,
                                   double radius,
                                   double sampling,
                                   RandomFactory random)
        Constructor.
        Parameters:
        input - Data input
        distance - Distance function to use
        radius - Query radius to use
        sampling - Sampling rate
        random - Random factory
      • RangeQueryBenchmark

        public RangeQueryBenchmark​(InputStep input,
                                   Distance<? super O> distance,
                                   DatabaseConnection queries,
                                   double sampling,
                                   RandomFactory random)
        Constructor.
        Parameters:
        input - Data input
        distance - Distance function to use
        queries - Query data set (may be null!)
        sampling - Sampling rate
        random - Random factory
    • Method Detail

      • logIndexStatistics

        private void logIndexStatistics​(Database database)
        Log index statistics before and after querying.
        Parameters:
        database - Database
      • run

        protected int run​(RangeSearcher<DBIDRef> rangeQuery,
                          Relation<O> relation,
                          double radius,
                          Duration dur,
                          MeanVariance mv)
        Run the algorithm, with a constant radius.
        Parameters:
        rangeQuery - query to test
        relation - Relation
        radius - Radius
        dur - Duration statistic used to time the queries
        mv - Mean and variance statistics
        Returns:
        hash code over all results
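
        To make the idea of this benchmark loop concrete, here is a minimal sketch that runs one range query per data point with a constant radius and folds all results into a single checksum. It is not ELKI's implementation: it uses a plain linear scan instead of a RangeSearcher, and the class LinearRangeScan, its methods, and the checksum formula are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of ELKI: brute-force range queries.
public class LinearRangeScan {
  /** Euclidean distance between two vectors of equal length. */
  static double euclidean(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Indices of all data points within the radius of the query (inclusive). */
  static List<Integer> rangeQuery(double[][] data, double[] query, double radius) {
    List<Integer> result = new ArrayList<>();
    for (int i = 0; i < data.length; i++) {
      if (euclidean(data[i], query) <= radius) {
        result.add(i);
      }
    }
    return result;
  }

  /** Run one range query per data point; return a checksum over all results. */
  static int runAll(double[][] data, double radius) {
    int hash = 0;
    for (double[] query : data) {
      for (int idx : rangeQuery(data, query, radius)) {
        // Simple order-dependent checksum (an assumption, not ELKI's formula).
        hash = 31 * hash + idx;
      }
    }
    return hash;
  }

  public static void main(String[] args) {
    double[][] data = { { 1, 2, 3 }, { 4, 5, 6 }, { 7, 8, 9 } };
    long start = System.nanoTime();
    int hash = runAll(data, 6.0);
    long elapsed = System.nanoTime() - start;
    System.out.println("checksum=" + hash + " time=" + elapsed + " ns");
  }
}
```

        Returning a checksum forces every result to be consumed, so the work cannot be skipped, and it gives a quick way to check that two index structures return the same neighbors.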
      • run

        protected int run​(RangeSearcher<DBIDRef> rangeQuery,
                          Relation<O> relation,
                          Relation<NumberVector> radrel,
                          Duration dur,
                          MeanVariance mv)
        Run the algorithm, with a separate radius relation.
        Parameters:
        rangeQuery - query to test
        relation - Relation
        radrel - Radius relation
        dur - Duration statistic used to time the queries
        mv - Mean and variance statistics
        Returns:
        hash code over all results
      • run

        protected int run​(RangeSearcher<O> rangeQuery,
                          Relation<O> relation,
                          DatabaseConnection queries,
                          Duration dur,
                          MeanVariance mv)
        Run the algorithm, with a separate query set.
        Parameters:
        rangeQuery - query to test
        relation - Relation
        queries - Queries database connection
        dur - Duration statistic used to time the queries
        mv - Statistics output
        Returns:
        hash code over all results
      • processResult

        protected int processResult​(DoubleDBIDList rres,
                                    MeanVariance mv)
        Method to test a result.
        Parameters:
        rres - Result to process
        mv - Statistics output
        Returns:
        hash code
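
        A result-processing step of this kind can be sketched as follows: record the result size in running mean and variance statistics, and reduce the result to a hash value. This is an assumption about the general approach, not a copy of ELKI's code. The class ResultStats and its RunningStats helper (a Welford-style accumulator standing in for MeanVariance) are hypothetical, and results are represented as plain int arrays of object ids.

```java
// Hypothetical illustration, not part of ELKI: result size statistics and checksum.
public class ResultStats {
  /** Running mean and variance using Welford's online algorithm. */
  static final class RunningStats {
    private long n;
    private double mean;
    private double m2; // sum of squared deviations from the current mean

    void put(double x) {
      n++;
      double delta = x - mean;
      mean += delta / n;
      m2 += delta * (x - mean);
    }

    double getMean() {
      return mean;
    }

    /** Sample variance; 0 when fewer than two values were added. */
    double getSampleVariance() {
      return n > 1 ? m2 / (n - 1) : 0.0;
    }
  }

  /** Record the result size and return a checksum over the result ids. */
  static int processResult(int[] resultIds, RunningStats stats) {
    stats.put(resultIds.length);
    int checksum = 0;
    for (int id : resultIds) {
      checksum += id;
    }
    return checksum;
  }

  public static void main(String[] args) {
    RunningStats stats = new RunningStats();
    int hash = 0;
    int[][] results = { { 0, 1 }, { 0, 1, 2 }, { 1, 2 } };
    for (int[] r : results) {
      hash = 31 * hash + processResult(r, stats);
    }
    System.out.println("checksum=" + hash + " mean size=" + stats.getMean()
        + " variance=" + stats.getSampleVariance());
  }
}
```

        The online update avoids storing all result sizes, which matters when a benchmark issues one query per database object.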
      • main

        public static void main​(java.lang.String[] args)
        Runs the benchmark.
        Parameters:
        args - parameter list according to description