Release Notes for ELKI 0.4
Release 0.4 marks another big milestone in ELKI development, but also highlights why we still do not use a 1.x version number: things still change at a large scale, so expect to be code that was written for ELKI 0.3 to require a lot of work to work with 0.4. The scope of ELKI is still widening, and many of the APIs need to grow with these requirements. For example to support specialized spatial algorithm, the database layer now needs to have solid support for multi-relational operation. In order to have the database layer fully exploit index structures, the query API needs to support features that you might know from traditional databases: optimizer hints and “prepared queries”.
Release goals
Main goals for this release were:
- Multi-index support
to allow for example comparison of indexes or combined indexes - Multi-relation support
to allow the development of multi-relational algorithms such as spatial outlier detection methods - Multi-output
to have output to various sources such as text files, visualization, geo applications and web browser interfaces - Geo mining functionality
- Better Java API
so far, users were expected to use the command line and GUI interfaces. However, there has been a lot of interest in using the implementation directly from Java. This release tries to allow having both a traditional Java API as well as the information needed for user-assisted parameterization and dynamic UI.
Changes
Performance:
- Specializations to doubles led to an approximately 1.5x speedup compared to the previous release for many typical situations. This can be attributed to the cost of boxing and unboxing and the this way increased memory management cost, and emphasized why you shouldn’t be Benchmarking ELKI against a non-generalized implementation.
Global changes:
- Indexing: Multi-index support - databases can now have more than one index
- Database: multi-relational database API
- Database: Database query objects (base for a query optimization layer)
- Parameterization: Improved Java API by moving parameterization into helper classes
- Algorithms: TypeInformation to match data sources and input type restrictions
- Generics: Many java generics became obsolete by the multi-relational database change
Package level changes:
- Preprocessors converted to just another type of index
- Normalizations and Meta-Parsers become Input Filters
- Parsers, Filters and Databases exchange data using ObjectBundles
- Some functionality of DatabaseConnection (such as class label index) moved to filters
- Persistence: Cache has been converted to a nested PageFile, allowing for arbitrary combinations, including multi-level caching.
- Indexing: Major refactoring to split the tree structure from the index use of the tree.
Extensions added:
- Score unification:
H.-P. Kriegel, P. Kröger, E. Schubert, A. Zimek:
Interpreting and Unifying Outlier Scores.
In: Proc. 11th SIAM International Conference on Data Mining (SDM), Mesa, AZ, 2011
preprint - Spatial outlier detection:
Elke Achtert, Ahmed Hettab, Hans-Peter Kriegel, Erich Schubert, Arthur Zimek:
Spatial Outlier Detection: Data, Algorithms, Visualizations.
12th International Symposium on Spatial and Temporal Databases (SSTD), Minneapolis, MN, 2011 - Additional outlier detection algorithms
- Many new visualizations
- Ad-hoc layouter for visualizations