Version information: Updated for ELKI 0.6.5~20141030
In this tutorial, we implement a new outlier detection method. The outlier score used in this example is the standard deviation of the distances to the k nearest neighbors. Inliers are expected to have a low standard deviation, outliers a higher one (note: in reality, it is probably not that easy, but this is good enough for this tutorial).
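To make the score concrete, here is a minimal plain-Java sketch of the standard deviation computation on an array of distances. It does not use any ELKI classes; the class name KnnStddevScore and the choice of the population standard deviation (dividing by n rather than n - 1) are assumptions made for illustration, since the text does not specify which variant is meant.

```java
/** Plain-Java sketch, not ELKI code: standard deviation of kNN distances as a score. */
final class KnnStddevScore {
  private KnnStddevScore() {
    // Static helper only.
  }

  /**
   * Population standard deviation of the given distances.
   * Assumption: we divide by n, not n - 1.
   */
  static double stddev(double[] distances) {
    if (distances.length == 0) {
      throw new IllegalArgumentException("need at least one distance");
    }
    // First pass: mean distance.
    double mean = 0.0;
    for (double d : distances) {
      mean += d;
    }
    mean /= distances.length;
    // Second pass: mean squared deviation from the mean.
    double sumSq = 0.0;
    for (double d : distances) {
      double diff = d - mean;
      sumSq += diff * diff;
    }
    return Math.sqrt(sumSq / distances.length);
  }
}
```

A point whose neighbors are all about equally far away gets a score near zero, while a point with a mix of very close and very distant neighbors gets a larger score.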
The two key APIs in ELKI are the Algorithm interface (and the associated abstract classes and specializations) and the OutlierResult classes for output.
(Note: you may be missing the run method. See below.)
Completing the stub
We have two generics in this example. O is the object type. Since this depends on the distance function, we cannot make many assumptions; we just need a type variable and must use it consistently. We will also add a class logger, fill out the getInputTypeRestriction method (which again is determined by the distance function), and add the k parameter for the number of neighbors. We also make the constructor public.
Adding the run method
Now we need to implement the main method. Since we have extended AbstractAlgorithm, we actually have three options for this. The exact signature cannot be defined in Java:
We need to implement only one of these signatures; the choice is up to us. The versions with relation will save us some manual work, so we’ll go with these. We’ll first create the following stub, which outlines the general flow. First we initialize the kNN query. Note that the database may choose to use an optimized kNN query here, which is why it needs to know the distance function and the value of k in advance. Then we set up a data store for double values, process the individual elements, and finally wrap the result in the expected API. Note that the outlier result API consists of two parts: metadata on the score distribution (including minimum and maximum values) and a relation of the actual scores (which essentially is just our data store).
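The flow described above can be sketched without ELKI as follows. This is only an illustration under stated assumptions: the names OutlierFlowSketch, Result, euclidean and run are hypothetical, a brute-force scan with Euclidean distance stands in for the (possibly index-accelerated) kNN query, and a plain array stands in for the data store. The scoring function is passed in as a parameter, so the sketch covers only the surrounding steps.

```java
import java.util.Arrays;
import java.util.function.ToDoubleFunction;

/** Plain-Java sketch of the general flow; none of these names are ELKI API. */
final class OutlierFlowSketch {
  /** Scores plus the minimum and maximum score (the "metadata" part). */
  static final class Result {
    final double[] scores;
    final double min;
    final double max;

    Result(double[] scores, double min, double max) {
      this.scores = scores;
      this.min = min;
      this.max = max;
    }
  }

  /** Euclidean distance, used here as a stand-in distance function. */
  static double euclidean(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double diff = a[i] - b[i];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  /** Scores every point from the sorted distances to its k nearest neighbors. */
  static Result run(double[][] data, int k, ToDoubleFunction<double[]> scorer) {
    if (k < 1 || k > data.length) {
      throw new IllegalArgumentException("k must be between 1 and the data size");
    }
    double[] scores = new double[data.length]; // stand-in for the data store
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;
    for (int i = 0; i < data.length; i++) {
      // Brute-force kNN: distances to all points, including the point itself.
      double[] dists = new double[data.length];
      for (int j = 0; j < data.length; j++) {
        dists[j] = euclidean(data[i], data[j]);
      }
      Arrays.sort(dists);
      double score = scorer.applyAsDouble(Arrays.copyOf(dists, k));
      scores[i] = score;
      // Track the score range needed for the result metadata.
      min = Math.min(min, score);
      max = Math.max(max, score);
    }
    return new Result(scores, min, max);
  }
}
```

Because the query point is part of the data, its own distance of 0 is always among the k smallest distances in this sketch.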
Finally, we fill in the actual outlier detection algorithm:
Adding a parameterizer
Right now, we can invoke the algorithm from Java (although this is a bit tricky), but we also want to be able to use the GUI and command line interface. For this we need to implement Parameterization, namely add an AbstractParameterizer. This must be a public static inner class named Parameterizer (otherwise it will not be found!). The stub obtained from extracting the superclass parameterizer is:
We again need to customize this stub slightly: restrict the distance function type, change the return type, and override the makeOptions method. The improved stub then is:
There is not much left to do. The distance function is parameterized by the superclass. We need to add a parameter for k:
Note that we enforce k > 1 in the parameterization API, as the 1 nearest neighbor will usually be the object itself. As you can see, the parameterizer has the purpose of providing a common parameterization interface and then produces the actual Java instance. It connects the UIs to the actual Java code.
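The k > 1 constraint can be illustrated with a small plain-Java sketch. It deliberately does not use the ELKI parameterization classes; KOptionSketch and parseK are hypothetical names, and the sketch only shows the kind of check that is enforced when a textual option value from the GUI or command line is turned into a value for the algorithm.

```java
/** Plain-Java sketch of validating the k option; not the ELKI parameterization API. */
final class KOptionSketch {
  private KOptionSketch() {
    // Static helper only.
  }

  /** Parses a textual option value and enforces k > 1. */
  static int parseK(String raw) {
    int k;
    try {
      k = Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("k must be an integer: " + raw, e);
    }
    // With k = 1 the only neighbor is usually the object itself.
    if (k <= 1) {
      throw new IllegalArgumentException("k must be greater than 1, got " + k);
    }
    return k;
  }
}
```

Rejecting invalid values at this stage means the algorithm itself never sees a k for which the score is meaningless.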