Writing a custom distance function
Version information: Updated for ELKI 0.8.0
For many real-world applications, a domain expert may be able to define a domain-specific distance function. In the following, let us assume we are working with 2D data, and the domain expert has decided that an appropriate distance function is dx*dx+abs(dy), i.e., the difference on the x axis is squared, while the difference on the y axis is used linearly.
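Written as a formula for two points x = (x_1, x_2) and y = (y_1, y_2), together with a small worked example:

```latex
d(x, y) = (x_1 - y_1)^2 + |x_2 - y_2|
x = (1, 2),\; y = (4, 0):\quad d(x, y) = (1 - 4)^2 + |2 - 0| = 9 + 2 = 11
```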
Basic distance function
Most distances are defined on real-number vectors and return double values. There is a convenient abstract class for this that we can use: AbstractNumberVectorDistance. Let’s start a new class for this, and see what Eclipse generates for us:
Now let’s implement the distance method:
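ELKI’s own types are not reproduced here. As a stdlib-only sketch of the arithmetic, we assume a hypothetical minimal interface `Vec` whose `doubleValue(int)` mirrors the accessor of the same name on ELKI’s `NumberVector`; `Vec`, `ArrayVec` and `TutorialDistance` are illustration names only. In the real class, the same three lines would go into the overridden `distance(NumberVector o1, NumberVector o2)` method.

```java
// Hypothetical stand-in for ELKI's NumberVector: only the accessor we need.
interface Vec {
  double doubleValue(int dimension);
}

// Simple array-backed implementation of the stand-in interface.
final class ArrayVec implements Vec {
  private final double[] values;

  ArrayVec(double... values) {
    this.values = values;
  }

  @Override
  public double doubleValue(int dimension) {
    return values[dimension];
  }
}

final class TutorialDistance {
  // dx*dx + |dy|: squared difference on x, absolute difference on y.
  static double distance(Vec o1, Vec o2) {
    final double dx = o1.doubleValue(0) - o2.doubleValue(0);
    final double dy = o1.doubleValue(1) - o2.doubleValue(1);
    return dx * dx + Math.abs(dy);
  }
}
```

For example, `TutorialDistance.distance(new ArrayVec(0, 0), new ArrayVec(2, -3))` evaluates to 2² + 3 = 7, and swapping the arguments gives the same value.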
We can already test this distance function now! Yes, we do not need to do more. (If you compiled ELKI yourself and do not use a .jar file, this class should now appear in the dropdown menu. Otherwise, you may need to type in the name of the class.)
Fine tuning
This domain-specific distance function only makes sense for 2-dimensional data. We will now specify this, so that ELKI does not try to use it with higher-dimensional relations. For this, we need to override the method getInputTypeRestriction:
With this statement, we specify three requirements for the input data:
- The relation must be a vector field (i.e., all vectors have the same dimensionality).
- The input data must be NumberVectors (of arbitrary type: Float, Double, Integer…).
- The dimensionality must be exactly 2.
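ELKI checks these requirements itself, based on the type information the method returns; we believe that in 0.8.0 the override returns something like `VectorFieldTypeInformation.typeRequest(NumberVector.class, 2, 2)`, but please verify this against the API documentation. Purely to make the three conditions concrete, here is a stdlib-only sketch that applies them to a list of `double[]` rows; the class `InputCheck` is hypothetical and not part of ELKI.

```java
import java.util.List;

final class InputCheck {
  // Numeric values are guaranteed by the double[] type; the remaining checks are
  // "vector field" (all rows of equal length) and "dimensionality exactly 2".
  static boolean acceptable(List<double[]> rows) {
    if (rows.isEmpty()) {
      return true; // nothing to violate; a design choice of this sketch
    }
    final int dim = rows.get(0).length;
    for (double[] row : rows) {
      if (row.length != dim) {
        return false; // mixed dimensionality: not a vector field
      }
    }
    return dim == 2;
  }
}
```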
If this distance function were a metric, we would also override isMetric() to return true (this distance function, however, is not a metric).
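The triangle inequality already fails for three points on the x axis, which is why isMetric() must not return true here:

```latex
a = (0, 0),\; b = (1, 0),\; c = (2, 0)
d(a, b) = 1^2 + 0 = 1, \quad d(b, c) = 1^2 + 0 = 1
d(a, c) = 2^2 + 0 = 4 > d(a, b) + d(b, c) = 2
```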
Making it parameterizable
In order to make the distance function parameterizable, we write some additional lines. You can read more here: Parameterization
This class already satisfies the parameterizable API: it has an implicit public, parameterless constructor and can thus be instantiated by the UI. However, if we want to actually have parameters in the class and a different constructor, we need to help the UI with the parameterization. For this, a public static inner class implementing Parameterizer is used.
So here is a more complex variation of Lp norms where we can specify a different “p” for each dimension.
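As a stdlib-only sketch of the core computation, assume the distance sums |x_i − y_i|^(p_i) over all dimensions, without a final root; the class name `MultiLPNormSketch` and its array-based signature are hypothetical and stand in for the real ELKI class.

```java
final class MultiLPNormSketch {
  private final double[] ps; // one exponent per dimension

  MultiLPNormSketch(double[] ps) {
    this.ps = ps.clone(); // defensive copy: later changes to the caller's array do not leak in
  }

  double distance(double[] o1, double[] o2) {
    if (o1.length != ps.length || o2.length != ps.length) {
      throw new IllegalArgumentException("Dimensionality must equal the number of exponents.");
    }
    double sum = 0.0;
    for (int i = 0; i < ps.length; i++) {
      final double delta = Math.abs(o1[i] - o2[i]);
      sum += Math.pow(delta, ps[i]); // per-dimension exponent p_i
    }
    return sum;
  }
}
```

With ps = {2, 1}, this reproduces the dx*dx+abs(dy) distance from the beginning of this tutorial; with ps = {1, 1}, it is the Manhattan distance.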
If you want, you can think about when this function is a metric (for example, when all ps are equal and at least 1, and the sum is taken to the power 1/p as in the ordinary Lp norm) and implement isMetric() accordingly.
However, when you try to select this class in the ELKI UI, you will see this error:
Error instantiating class - no usable public constructor.
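The underlying Java limitation can be seen with plain reflection: a class can only be created "by name" without further information if it has a public parameterless constructor. The sketch below is not ELKI’s actual instantiation code (which also looks for the parameterization helper); `OnlyParameterized` and `ReflectiveFactory` are hypothetical illustration names.

```java
import java.lang.reflect.Constructor;

// A class with only a parameterized constructor, like MultiLPNorm at this point.
final class OnlyParameterized {
  final double[] ps;

  public OnlyParameterized(double[] ps) {
    this.ps = ps;
  }
}

final class ReflectiveFactory {
  // Tries to create an instance from the class alone, with no constructor arguments.
  static Object tryCreate(Class<?> cls) {
    try {
      Constructor<?> ctor = cls.getConstructor(); // public no-argument constructor only
      return ctor.newInstance();
    } catch (NoSuchMethodException e) {
      return null; // no usable public constructor
    } catch (ReflectiveOperationException e) {
      return null; // constructor exists but could not be invoked
    }
  }
}
```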
So we need to add a Parameterization helper next. The generated stub looks like this:
Make sure that you declare the class as public static. Then change the return type to your actual class (MultiLPNorm in this case), so that it looks like this:
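Two Java details are at work here. A static nested class can be instantiated without an existing instance of the outer class, which is exactly what we lack at parameterization time; and an overriding method may narrow its return type (a covariant return type), so callers get the concrete class without a cast. The sketch uses a hypothetical interface `Maker` standing in for ELKI’s Parameterizer, plus hypothetical classes `MultiLPNormStub` and `Par`.

```java
// Hypothetical stand-in for a factory interface whose method returns Object.
interface Maker {
  Object make();
}

final class MultiLPNormStub {
  final double[] ps;

  MultiLPNormStub(double[] ps) {
    this.ps = ps;
  }

  // Static nested helper: can be created without an enclosing MultiLPNormStub instance.
  static final class Par implements Maker {
    double[] ps = {2, 1}; // placeholder values; a real helper would read them from the configuration

    // Covariant return type: narrower than Object, so no cast is needed by callers.
    @Override
    public MultiLPNormStub make() {
      return new MultiLPNormStub(ps);
    }
  }
}
```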
In order to set up the parameters, we have to override the configure method and add our options there. Parameterization consists of multiple parts:
- Define a public static OptionID for the parameter (so it can be referenced from other classes!)
- Create an option parameter. Here we need a list of doubles, which is parsed by DoubleListParameter.
- Get the option’s value from the config object using grab. If the value is unavailable, an error will automatically be reported, since this parameter is not optional. (Do not throw an exception, so that multiple errors can be reported!)
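The reason for not throwing is usability: if every bad parameter aborted immediately, a user would only see the first problem per run. The following stdlib-only sketch illustrates the collect-then-report pattern. The class `CollectingConfig`, its method `grabDoubleList` and the key `multilp.p` are hypothetical and not ELKI’s API, and the comma-separated list syntax is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical, minimal configuration object that records errors instead of throwing.
final class CollectingConfig {
  private final Map<String, String> values;
  final List<String> errors = new ArrayList<>();

  CollectingConfig(Map<String, String> values) {
    this.values = values;
  }

  // Parses a required comma-separated list of doubles; on failure, records an error and returns false.
  boolean grabDoubleList(String key, Consumer<double[]> consumer) {
    String raw = values.get(key);
    if (raw == null) {
      errors.add("Missing required parameter: " + key);
      return false;
    }
    String[] parts = raw.split(",");
    double[] parsed = new double[parts.length];
    try {
      for (int i = 0; i < parts.length; i++) {
        parsed[i] = Double.parseDouble(parts[i].trim());
      }
    } catch (NumberFormatException e) {
      errors.add("Not a list of numbers for " + key + ": " + raw);
      return false;
    }
    consumer.accept(parsed); // only called with a fully parsed value
    return true;
  }
}
```

Because each failed call only appends to `errors`, a caller can try all parameters first and then report every problem at once.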