Data Types Supported by ELKI
See package elki.data for examples.
Included in ELKI 0.8.0 are implementations for the following “raw” data types:
- Number vectors:
  - Double vectors
  - Float vectors
  - Integer vectors
  - Bit vectors
  - Sparse float vectors
- Class labels
- Object labels
- External IDs
- Geo-data:
Distances are as important as data types. For example, ELKI supports time series as regular number vectors: you simply apply a specialized time series distance such as DTWDistance to them. For color histograms, you can use, for example, HSBHistogramQuadraticDistance.
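To illustrate why a time series needs no data type of its own, here is a minimal, self-contained sketch of dynamic time warping on plain double arrays. It does not use or reproduce ELKI's DTWDistance class. The class name DtwSketch, the method name dtw, and the choice of absolute difference as the local cost are assumptions made for this illustration only.

```java
import java.util.Arrays;

// Illustrative sketch only: NOT ELKI's DTWDistance implementation.
// The names DtwSketch and dtw are hypothetical.
final class DtwSketch {
  /**
   * Unconstrained dynamic time warping between two series stored as plain
   * number vectors. Local cost is the absolute difference (an assumption
   * of this sketch). Uses two rolling rows instead of the full matrix.
   */
  static double dtw(double[] a, double[] b) {
    if (a.length == 0 || b.length == 0) {
      throw new IllegalArgumentException("Series must not be empty.");
    }
    double[] prev = new double[b.length + 1];
    double[] curr = new double[b.length + 1];
    Arrays.fill(prev, Double.POSITIVE_INFINITY);
    prev[0] = 0.0; // aligning two empty prefixes costs nothing
    for (int i = 1; i <= a.length; i++) {
      curr[0] = Double.POSITIVE_INFINITY; // non-empty prefix vs. empty prefix
      for (int j = 1; j <= b.length; j++) {
        double cost = Math.abs(a[i - 1] - b[j - 1]);
        // Cheapest predecessor: diagonal match, or a step along either series.
        double best = Math.min(prev[j - 1], Math.min(prev[j], curr[j - 1]));
        curr[j] = cost + best;
      }
      double[] tmp = prev; // swap the rolling rows
      prev = curr;
      curr = tmp;
    }
    return prev[b.length];
  }

  public static void main(String[] args) {
    double[] s1 = { 1, 2, 3 };
    double[] s2 = { 1, 1, 2, 3 }; // same shape, first value repeated
    System.out.println(dtw(s1, s2));
  }
}
```

The two input series have different lengths, which a coordinate-wise distance could not handle directly. The warping absorbs the repeated value, so the data can stay an ordinary number vector and only the distance changes.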
In addition, you will find classes capable of extracting features from data types such as images to obtain supported number vectors. In order to plug in custom data types, you need to implement the following:
- The data type, e.g. derived from FeatureVector
- A parser for the input type to produce instances
- Algorithms or distance functions that can process these data types. Note: many algorithms in ELKI are data type agnostic and only need to be given an appropriate Distance. Others, such as k-Means and EM clustering, require the NumberVector interface, which allows them to compute centroids. This is a restriction of the algorithm, not of ELKI.
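The three pieces listed above can be sketched in plain Java, without the ELKI library. This is a conceptual illustration under stated assumptions, not ELKI's actual API: the names CustomTypeSketch, TagSet, Dist, parse, JACCARD, and nearest are all hypothetical, and a real ELKI data type, parser, and distance would implement the corresponding ELKI interfaces instead.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Conceptual sketch only: every name here is hypothetical, none is ELKI API.
final class CustomTypeSketch {
  /** 1. The data type: an unordered set of tags (not a number vector). */
  static final class TagSet {
    final Set<String> tags;

    TagSet(Set<String> tags) {
      this.tags = tags;
    }
  }

  /** Stand-in for a distance abstraction; ELKI's own interface is richer. */
  interface Dist<T> {
    double distance(T a, T b);
  }

  /** 2. The parser: one object per non-empty line, tags split on whitespace. */
  static List<TagSet> parse(String text) {
    List<TagSet> out = new ArrayList<>();
    for (String line : text.split("\n")) {
      String trimmed = line.trim();
      if (trimmed.isEmpty()) {
        continue; // skip blank lines
      }
      Set<String> tags = new HashSet<>();
      for (String tag : trimmed.split("\\s+")) {
        tags.add(tag);
      }
      out.add(new TagSet(tags));
    }
    return out;
  }

  /** 3a. A distance for the type: Jaccard, 1 - |intersection| / |union|. */
  static final Dist<TagSet> JACCARD = (a, b) -> {
    Set<String> union = new HashSet<>(a.tags);
    union.addAll(b.tags);
    if (union.isEmpty()) {
      return 0.0; // two empty sets are treated as identical
    }
    Set<String> inter = new HashSet<>(a.tags);
    inter.retainAll(b.tags);
    return 1.0 - (double) inter.size() / union.size();
  };

  /** 3b. A type-agnostic algorithm: linear-scan nearest neighbor. */
  static <T> int nearest(List<T> data, T query, Dist<T> dist) {
    int best = -1; // stays -1 if data is empty
    double bestDist = Double.POSITIVE_INFINITY;
    for (int i = 0; i < data.size(); i++) {
      double d = dist.distance(data.get(i), query);
      if (d < bestDist) {
        bestDist = d;
        best = i;
      }
    }
    return best;
  }
}
```

The nearest method never inspects the objects themselves, so it works for any type that comes with a distance. A centroid-based method such as k-Means would additionally need to average objects, which has no obvious meaning for a tag set. That is the kind of algorithmic requirement the NumberVector restriction reflects.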