Data Types Supported by ELKI
See package de.lmu.ifi.dbs.elki.data for a class hierarchy diagram.
Included in ELKI 0.4 are implementations for the following “raw” data types:
- Number vectors:
  - Double vectors
  - Float vectors
  - Integer vectors
  - Bit vectors
  - Sparse float vectors
- Class labels
- Object labels
- External IDs
Distance functions are as important as the data types. For example, ELKI supports time series as regular number vectors: you simply apply a specialized time series distance such as DTWDistanceFunction to them. For color histograms, you can use, for example, HSBHistogramQuadraticDistanceFunction.
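To illustrate why a time series needs no data type of its own, here is a minimal sketch of dynamic time warping over plain double arrays. It is written in standalone Java and is not ELKI's DTWDistanceFunction; the class name DtwSketch, the absolute-difference local cost, and the absence of a warping window are all assumptions made for the example.

```java
import java.util.Arrays;

/** Illustrative sketch only: plain Java, not ELKI's DTWDistanceFunction. */
final class DtwSketch {
  /**
   * Dynamic time warping between two series stored as ordinary number arrays.
   * Assumptions: absolute difference as local cost, no warping window.
   */
  static double dtw(double[] a, double[] b) {
    final int n = a.length, m = b.length;
    if (n == 0 || m == 0) {
      // Two empty series are identical; an empty and a non-empty one cannot be aligned.
      return (n == m) ? 0.0 : Double.POSITIVE_INFINITY;
    }
    // Only two rows of the cost matrix are kept in memory.
    double[] prev = new double[m + 1];
    double[] curr = new double[m + 1];
    Arrays.fill(prev, Double.POSITIVE_INFINITY);
    prev[0] = 0.0; // the empty prefixes align at zero cost
    for (int i = 1; i <= n; i++) {
      curr[0] = Double.POSITIVE_INFINITY;
      for (int j = 1; j <= m; j++) {
        double cost = Math.abs(a[i - 1] - b[j - 1]);
        // Cheapest of: diagonal match, step in a only, step in b only.
        double best = Math.min(prev[j - 1], Math.min(prev[j], curr[j - 1]));
        curr[j] = cost + best;
      }
      double[] tmp = prev;
      prev = curr;
      curr = tmp;
    }
    return prev[m];
  }
}
```

The inputs are ordinary vectors, and they may even differ in length. Only the distance computation knows that the values form a sequence, which is the sense in which the distance function, not the data type, carries the time series semantics.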
In addition, you will find classes that extract features from data types such as images to produce supported number vectors. To plug in a custom data type, you need to implement the following:
- The data type, e.g. derived from FeatureVector
- A parser for the input type to produce instances
- Algorithms or distance functions that can process this data type. Note: many algorithms in ELKI are data type agnostic; they only need to be given an appropriate DistanceFunction. Others, such as k-Means and EM clustering, require the NumberVector interface, which allows them to compute centroids. This is a restriction of the algorithm, not of ELKI.
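
The three pieces listed above can be sketched in standalone Java as follows. This is not ELKI's API: the Distance interface, the TokenSet type, parseLine, and nearestIndex are hypothetical names invented for the example, and Jaccard distance is just one possible choice for a set-valued type. The final method shows why centroid-based algorithms need numeric coordinates.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; all names are hypothetical and not part of ELKI. */
final class CustomTypeSketch {
  /** Hypothetical stand-in for a distance function interface. */
  interface Distance<T> {
    double distance(T a, T b);
  }

  /** 1. The custom data type: an immutable set of tokens (not a number vector). */
  static final class TokenSet {
    final Set<String> tokens;

    TokenSet(Set<String> tokens) {
      this.tokens = Collections.unmodifiableSet(new HashSet<>(tokens));
    }
  }

  /** 2. The parser: one whitespace-separated input line becomes one TokenSet. */
  static TokenSet parseLine(String line) {
    String trimmed = line.trim();
    Set<String> tokens = new HashSet<>();
    if (!trimmed.isEmpty()) {
      tokens.addAll(Arrays.asList(trimmed.split("\\s+")));
    }
    return new TokenSet(tokens);
  }

  /** 3. A distance function for the type: Jaccard distance, 1 - |A and B| / |A or B|. */
  static final Distance<TokenSet> JACCARD = (a, b) -> {
    if (a.tokens.isEmpty() && b.tokens.isEmpty()) {
      return 0.0; // two empty sets are treated as identical
    }
    Set<String> inter = new HashSet<>(a.tokens);
    inter.retainAll(b.tokens);
    int union = a.tokens.size() + b.tokens.size() - inter.size();
    return 1.0 - (double) inter.size() / union;
  };

  /** A data type agnostic algorithm: linear-scan nearest neighbor using only the distance. */
  static <T> int nearestIndex(T query, List<T> data, Distance<T> dist) {
    int best = -1;
    double bestDist = Double.POSITIVE_INFINITY;
    for (int i = 0; i < data.size(); i++) {
      double d = dist.distance(query, data.get(i));
      if (d < bestDist) {
        bestDist = d;
        best = i;
      }
    }
    return best;
  }

  /** By contrast, a centroid requires numeric coordinates that can be averaged. */
  static double[] centroid(List<double[]> points) {
    int dim = points.get(0).length;
    double[] mean = new double[dim];
    for (double[] p : points) {
      for (int d = 0; d < dim; d++) {
        mean[d] += p[d];
      }
    }
    for (int d = 0; d < dim; d++) {
      mean[d] /= points.size();
    }
    return mean;
  }
}
```

Here nearestIndex works for TokenSet, or any other type, as long as a matching distance is supplied. The centroid method has no meaningful counterpart for TokenSet, which mirrors the restriction noted for k-Means and EM clustering.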