ELKI 0.8.0 release notes
ELKI 0.8.0 is available on Maven and on our releases web page:
Gradle:
dependencies {
// All the core parts, without visualization:
compile group: 'io.github.elki-project', name: 'elki', version:'0.8.0'
// If you want to use visualization:
compile group: 'io.github.elki-project', name: 'elki-batik-visualization', version:'0.8.0'
}
Maven:
<!-- ELKI core, without visualization -->
<dependency>
<groupId>io.github.elki-project</groupId>
<artifactId>elki</artifactId>
<version>0.8.0</version>
</dependency>
<!-- You only need this dependency if you need visualization -->
<dependency>
<groupId>io.github.elki-project</groupId>
<artifactId>elki-batik-visualization</artifactId>
<version>0.8.0</version>
</dependency>
Please clone https://github.com/elki-project/example-elki-project for a minimal project example.
Upcoming major changes
The next ELKI release will shorten all package names. We will also change the group ID to reflect that the project moved to https://elki-project.github.io/.
Since we will rename all packages, we will also use this opportunity to simplify other class names, such as “DistanceFunction” to “Distance”.
Futher breaking changes include changes to the result hierarchy and metadata management. These are necessary for important new functionality (such as automatic indexing, and garbage collection).
For ELKI 0.9.0, we will likely target Java 17 or 21,
so 0.8.0 is supposedly the last version to support Java 8.
We may begin using the var
feature of Java 9 in cases where it
makes the code more readable.
Thus the next 0.9.0 release will not be backwards compatible at all.
Contributors
This release is brought to you by:
- Erich Schubert
- Robert Gehde
- Andreas Lang (BIRCH and BETULA)
- Erik Thordsen (Intrinsic dimensionality)
- Lars Lenssen (Silhouette clustering)
- Braulio Sanchez (HySortOD, clustering-based outlier detection)
- Alan Mazankiewicz (MCDE and MWP tests)
- Abhishek Sharma (Interestingness measures)
New functionality
Indexing
- Automatic Indexing. If possible, ELKI will automatically add a suitable index to accelerate algorithms.
- Automatic Garbage Collection of unused indexes
- Priority search API
- much improved k-d-tree with additional split heuristics
- Linear AESA (LAESA)
- Vantage point trees (VP-tree)
- Geometric Near-neighbor Access Tree (GNAT, MVP-tree)
Clustering
- Hierarchical clustering additions:
- Rewrite of internal data structures to use merge histories instead of the pointer model of SLINK
- Hierarchial Clustering Around Medoids (HACAM)
- Medoid Linkage
- linear-memory implementation of NN-Chain
- Acceleration with BIRCH
- Acceleration with BETULA
- Conversion of OPTICS results to hierarchical clustering
- K-Means Clustering improvements and additions:
- Hartigan and Wong’s method
- Shallot algorithm
- Yin-Yang algorithm
- k-d-tree filtering k-means
- k-d-tree pruning k-means
- BIRCH acceleration
- BETULA acceleration
- G-Means for selecting the number of clusters
- KMC2 and AFKMC2 initialization
- Spherical k-means
- Accelerated spherical k-means
- Greedy k-center Clustering
- K-Medoids clustering additions:
- EagerPAM
- FasterPAM
- GreedyG initialization
- Gaussian Mixture Modeling improvements:
- Improved extensibility of codebase
- KD-Tree acceleration
- BIRCH acceleration
- BETULA acceleration
- Silhouette clustering:
- PAMSIL algorithm
- PAMMEDSIL algorithm
- FastMSC algorithm
- FasterMSC algorithm
- Cluster evaluation:
- Maximum matching accuracy
- Density-peak Clustering
- Support Vector Clustering
Association Rule Mining
- Improved ECLAT implementation
- Interestingness measures:
- Laplace corrected confidence
- Odds ratio
- Phi Correlation Coefficient
- Sebag-Schonauer
- Yules Q
- Yules Y
Outlier Detection
- Isolation Forest
- HySort outlier detection
- DBSCAN outlier detection (noise points as outliers)
- k-means– outlier detection
- GLOSH outlier scoring
- improved one-class support vector machines
- Support Vector Data Description (SVDD)
Classification
- Much improved support vector machines
Distance and Similarity Functions
- Sqrt-Cosine distance (metric)
Statistics
- MCDE and MWP dependence measures
- ABID: Angle-based intrinsic dimensionality estimation
- Local-PCA intrinsic dimensionality estimation
- TightLID intrinsic dimensionality estimation
Evaluation
- Maximum matching accuracy
- Precision-recall curves (AU-PRC)
- Precision-recall-gain curves (PRGC)
Other Improvements
- Silhouette visualuation
- Kuhn-Munkres and improved versions, for maximum matching
- Numerous unit tests
- Many bug fixes found in testing and reported by users
See also release notes 0.7, release notes 0.7.1, and release notes 0.7.5 for additional release notes of ELKI 0.7.0 to 0.7.5