# ELKI

ELKI: Environment for DeveLoping KDD-Applications Supported by Index-Structures.


Algorithms
Package Description
de.lmu.ifi.dbs.elki.algorithm
Algorithms suitable as a task for the KDDTask main routine.
de.lmu.ifi.dbs.elki.algorithm.benchmark
Benchmarking pseudo algorithms.
de.lmu.ifi.dbs.elki.algorithm.classification
Classification algorithms.
de.lmu.ifi.dbs.elki.algorithm.clustering
Clustering algorithms. Clustering algorithms are supposed to implement the Algorithm-Interface.
de.lmu.ifi.dbs.elki.algorithm.clustering.affinitypropagation
Affinity Propagation (AP) clustering.
de.lmu.ifi.dbs.elki.algorithm.clustering.biclustering
Biclustering algorithms
de.lmu.ifi.dbs.elki.algorithm.clustering.correlation
Correlation clustering algorithms
de.lmu.ifi.dbs.elki.algorithm.clustering.correlation.cash
Helper classes for the CASH algorithm.
de.lmu.ifi.dbs.elki.algorithm.clustering.em
Expectation-Maximization clustering algorithm.
de.lmu.ifi.dbs.elki.algorithm.clustering.gdbscan
Generalized DBSCAN. Generalized DBSCAN is an abstraction of the original DBSCAN idea that allows the use of arbitrary "neighborhood" and "core point" predicates.
de.lmu.ifi.dbs.elki.algorithm.clustering.gdbscan.parallel
Parallel versions of Generalized DBSCAN.
de.lmu.ifi.dbs.elki.algorithm.clustering.gdbscan.util
Utility classes for specialized DBSCAN implementations.
de.lmu.ifi.dbs.elki.algorithm.clustering.hierarchical
Hierarchical agglomerative clustering (HAC).
de.lmu.ifi.dbs.elki.algorithm.clustering.hierarchical.birch
BIRCH clustering.
de.lmu.ifi.dbs.elki.algorithm.clustering.hierarchical.extraction
Extraction of partitional clusterings from hierarchical results.
de.lmu.ifi.dbs.elki.algorithm.clustering.kmeans
K-means clustering and variations
de.lmu.ifi.dbs.elki.algorithm.clustering.kmeans.initialization
Initialization strategies for k-means.
de.lmu.ifi.dbs.elki.algorithm.clustering.kmeans.parallel
Parallelized implementations of k-means.
de.lmu.ifi.dbs.elki.algorithm.clustering.kmeans.quality
Quality measures for k-Means results.
de.lmu.ifi.dbs.elki.algorithm.clustering.meta
Meta clustering algorithms, that get their result from other clusterings or external sources.
de.lmu.ifi.dbs.elki.algorithm.clustering.onedimensional
Clustering algorithms for one-dimensional data.
de.lmu.ifi.dbs.elki.algorithm.clustering.optics
OPTICS family of clustering algorithms.
de.lmu.ifi.dbs.elki.algorithm.clustering.subspace
Axis-parallel subspace clustering algorithms. The clustering algorithms in this package are instances of either projected clustering algorithms or subspace clustering algorithms, according to the classical but somewhat obsolete classification schema of clustering algorithms for axis-parallel subspaces.
de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.clique
Helper classes for the CLIQUE algorithm.
de.lmu.ifi.dbs.elki.algorithm.clustering.trivial
Trivial clustering algorithms: all in one, no clusters, label clusterings. These methods are mostly useful for providing a reference result in evaluation.
de.lmu.ifi.dbs.elki.algorithm.clustering.uncertain
Clustering algorithms for uncertain data.
de.lmu.ifi.dbs.elki.algorithm.itemsetmining
Algorithms for frequent itemset mining such as APRIORI.
de.lmu.ifi.dbs.elki.algorithm.itemsetmining.associationrules
Association rule mining.
de.lmu.ifi.dbs.elki.algorithm.itemsetmining.associationrules.interest
Association rule interestingness measures.
de.lmu.ifi.dbs.elki.algorithm.outlier
Outlier detection algorithms
de.lmu.ifi.dbs.elki.algorithm.outlier.anglebased
Angle-based outlier detection algorithms.
de.lmu.ifi.dbs.elki.algorithm.outlier.clustering
Clustering based outlier detection.
de.lmu.ifi.dbs.elki.algorithm.outlier.distance
Distance-based outlier detection algorithms, such as DBOutlier and kNN.
de.lmu.ifi.dbs.elki.algorithm.outlier.distance.parallel
Parallel implementations of distance-based outlier detectors.
de.lmu.ifi.dbs.elki.algorithm.outlier.intrinsic
Outlier detection algorithms based on intrinsic dimensionality.
de.lmu.ifi.dbs.elki.algorithm.outlier.lof
LOF family of outlier detection algorithms
de.lmu.ifi.dbs.elki.algorithm.outlier.lof.parallel
Parallelized variants of LOF.
de.lmu.ifi.dbs.elki.algorithm.outlier.meta
Meta outlier detection algorithms: external scores, score rescaling
de.lmu.ifi.dbs.elki.algorithm.outlier.spatial
Spatial outlier detection algorithms
de.lmu.ifi.dbs.elki.algorithm.outlier.spatial.neighborhood
Spatial outlier neighborhood classes
de.lmu.ifi.dbs.elki.algorithm.outlier.spatial.neighborhood.weighted
Weighted Neighborhood definitions
de.lmu.ifi.dbs.elki.algorithm.outlier.subspace
Subspace outlier detection methods. Methods that detect outliers in subspaces (projections) of the data set.
de.lmu.ifi.dbs.elki.algorithm.outlier.svm
Support-Vector-Machines for outlier detection.
de.lmu.ifi.dbs.elki.algorithm.outlier.trivial
Trivial outlier detection algorithms: no outliers, all outliers, label outliers.
de.lmu.ifi.dbs.elki.algorithm.projection
de.lmu.ifi.dbs.elki.algorithm.statistics
Statistical analysis algorithms.
de.lmu.ifi.dbs.elki.algorithm.timeseries
Algorithms for change point detection in time series.
Databases and Index Structures
Package Description
de.lmu.ifi.dbs.elki.database
de.lmu.ifi.dbs.elki.database.datastore
General data store layer API (along the lines of Map<DBID, T> - use everywhere!)
de.lmu.ifi.dbs.elki.database.datastore.memory
Memory data store implementation for ELKI.
de.lmu.ifi.dbs.elki.database.ids
Database object identification and ID group handling API.
de.lmu.ifi.dbs.elki.database.ids.integer
Integer-based DBID implementation -- do not use directly - always use DBIDUtil.
de.lmu.ifi.dbs.elki.database.query
Database queries (computing distances, neighbors, similarities): API and general documentation. The database query API is designed around the concept of prepared statements.
de.lmu.ifi.dbs.elki.database.query.distance
Prepared queries for distances
de.lmu.ifi.dbs.elki.database.query.knn
Prepared queries for k nearest neighbor (kNN) queries
de.lmu.ifi.dbs.elki.database.query.range
Prepared queries for ε-range queries, that return all objects within the radius ε
de.lmu.ifi.dbs.elki.database.query.rknn
Prepared queries for reverse k nearest neighbor (rkNN) queries
de.lmu.ifi.dbs.elki.database.query.similarity
Prepared queries for similarity functions
de.lmu.ifi.dbs.elki.database.relation
Relations, materialized and virtual (views)
de.lmu.ifi.dbs.elki.datasource
Data normalization (and reconstitution) of data sets
de.lmu.ifi.dbs.elki.datasource.bundle
Object bundles - exchange container for multi-represented objects
de.lmu.ifi.dbs.elki.datasource.filter
Data filtering, in particular for normalization and projection
de.lmu.ifi.dbs.elki.datasource.filter.cleaning
Filters for data cleaning.
de.lmu.ifi.dbs.elki.datasource.filter.normalization
Data normalization
de.lmu.ifi.dbs.elki.datasource.filter.normalization.columnwise
Normalizations operating on columns / variates; where each column is treated independently.
de.lmu.ifi.dbs.elki.datasource.filter.normalization.instancewise
Instancewise normalization, where each instance is normalized independently.
de.lmu.ifi.dbs.elki.datasource.filter.selection
Filters for selecting and sorting data to process.
de.lmu.ifi.dbs.elki.datasource.filter.transform
Data space transformations
de.lmu.ifi.dbs.elki.datasource.filter.typeconversions
Filters to perform data type conversions.
de.lmu.ifi.dbs.elki.datasource.parser
Parsers for different file formats and data types. The general use case for any parser is to create objects out of an InputStream (e.g. by reading a data file).
de.lmu.ifi.dbs.elki.index
Index structure implementations
de.lmu.ifi.dbs.elki.index.distancematrix
Precomputed distance matrix.
de.lmu.ifi.dbs.elki.index.idistance
iDistance is a distance based indexing technique, using a reference points embedding.
de.lmu.ifi.dbs.elki.index.invertedlist
Indexes using inverted lists.
de.lmu.ifi.dbs.elki.index.lsh
Locality Sensitive Hashing
de.lmu.ifi.dbs.elki.index.lsh.hashfamilies
Hash function families for LSH
de.lmu.ifi.dbs.elki.index.lsh.hashfunctions
Hash functions for LSH
de.lmu.ifi.dbs.elki.index.preprocessed
Index structure based on preprocessors
de.lmu.ifi.dbs.elki.index.preprocessed.fastoptics
Preprocessed index used by the FastOPTICS algorithm.
de.lmu.ifi.dbs.elki.index.preprocessed.knn
Indexes providing KNN and rKNN data.
de.lmu.ifi.dbs.elki.index.preprocessed.localpca
Index using a preprocessed local PCA
de.lmu.ifi.dbs.elki.index.preprocessed.preference
Indexes storing preference vectors
de.lmu.ifi.dbs.elki.index.preprocessed.snn
Indexes providing nearest neighbor sets
de.lmu.ifi.dbs.elki.index.projected
Projected indexes for data
de.lmu.ifi.dbs.elki.index.tree
Tree-based index structures
de.lmu.ifi.dbs.elki.index.tree.metrical
Tree-based index structures for metrical vector spaces
de.lmu.ifi.dbs.elki.index.tree.metrical.covertree
Cover-tree variations.
de.lmu.ifi.dbs.elki.index.tree.metrical.mtreevariants
M-Tree and variants
de.lmu.ifi.dbs.elki.index.tree.metrical.mtreevariants.mktrees
Metrical index structures based on the concepts of the M-Tree supporting processing of reverse k nearest neighbor queries by using the k-nn distances of the entries
de.lmu.ifi.dbs.elki.index.tree.metrical.mtreevariants.mktrees.mkapp
de.lmu.ifi.dbs.elki.index.tree.metrical.mtreevariants.mktrees.mkcop
de.lmu.ifi.dbs.elki.index.tree.metrical.mtreevariants.mktrees.mkmax
de.lmu.ifi.dbs.elki.index.tree.metrical.mtreevariants.mktrees.mktab
de.lmu.ifi.dbs.elki.index.tree.metrical.mtreevariants.mtree
de.lmu.ifi.dbs.elki.index.tree.metrical.mtreevariants.query
Classes for performing queries (knn, range, ...) on metrical trees
de.lmu.ifi.dbs.elki.index.tree.metrical.mtreevariants.strategies.insert
Insertion (choose path) strategies of nodes in an M-Tree (and variants)
de.lmu.ifi.dbs.elki.index.tree.metrical.mtreevariants.strategies.split
Splitting strategies of nodes in an M-Tree (and variants)
de.lmu.ifi.dbs.elki.index.tree.metrical.mtreevariants.strategies.split.distribution
Entry distribution strategies of nodes in an M-Tree (and variants).
de.lmu.ifi.dbs.elki.index.tree.spatial
Tree-based index structures for spatial indexing
de.lmu.ifi.dbs.elki.index.tree.spatial.kd
K-d-tree and variants
de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants
R*-Tree and variants
de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.deliclu
de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.flat
de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.query
Queries on the R-Tree family of indexes: kNN and range queries
de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.rdknn
de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.rstar
de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.strategies.bulk
de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.strategies.insert
Insertion strategies for R-Trees
de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.strategies.overflow
Overflow treatment strategies for R-Trees
de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.strategies.reinsert
Reinsertion strategies for R-Trees
de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.strategies.split
Splitting strategies for R-Trees
de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.util
Utilities for R*-Tree and variants
de.lmu.ifi.dbs.elki.index.vafile
Vector Approximation File
de.lmu.ifi.dbs.elki.persistent
Persistent data management
Datatypes and Distance Functions
Package Description
de.lmu.ifi.dbs.elki.data
Basic classes for different data types, database object types and label types
de.lmu.ifi.dbs.elki.data.model
Cluster models classes for various algorithms
de.lmu.ifi.dbs.elki.data.projection
Data projections
de.lmu.ifi.dbs.elki.data.projection.random
Random projection families
de.lmu.ifi.dbs.elki.data.spatial
Spatial data types - interfaces and utilities
de.lmu.ifi.dbs.elki.data.synthetic.bymodel
Generator using a distribution model specified in an XML configuration file. GeneratorXMLSpec is a standalone application that loads an XML specification file and generates a synthetic data set according to the specifications given.
de.lmu.ifi.dbs.elki.data.type
Data type information, also used for type restrictions
de.lmu.ifi.dbs.elki.data.uncertain
Uncertain data objects.
de.lmu.ifi.dbs.elki.data.uncertain.uncertainifier
Classes to generate uncertain objects from existing certain data.
de.lmu.ifi.dbs.elki.distance.distancefunction
Distance functions for use within ELKI.
Distance functions deriving distances from, e.g., similarity measures
de.lmu.ifi.dbs.elki.distance.distancefunction.colorhistogram
Distance functions for color histograms
de.lmu.ifi.dbs.elki.distance.distancefunction.correlation
Distance functions using correlations
de.lmu.ifi.dbs.elki.distance.distancefunction.external
Distance functions using external data sources
de.lmu.ifi.dbs.elki.distance.distancefunction.geo
Geographic (earth) distance functions
de.lmu.ifi.dbs.elki.distance.distancefunction.histogram
Distance functions for one-dimensional histograms.
de.lmu.ifi.dbs.elki.distance.distancefunction.minkowski
Minkowski space Lp norms such as the popular Euclidean and Manhattan distances.
de.lmu.ifi.dbs.elki.distance.distancefunction.probabilistic
Distance from probability theory, mostly divergences such as K-L-divergence, J-divergence, F-divergence, χ²-divergence, etc.
de.lmu.ifi.dbs.elki.distance.distancefunction.set
Distance functions for binary and set type data.
de.lmu.ifi.dbs.elki.distance.distancefunction.strings
Distance functions for strings
de.lmu.ifi.dbs.elki.distance.distancefunction.subspace
Distance functions based on subspaces
de.lmu.ifi.dbs.elki.distance.distancefunction.timeseries
Distance functions designed for time series. Note that some regular distance functions (e.g., Euclidean) are also used on time series.
de.lmu.ifi.dbs.elki.distance.similarityfunction
Similarity functions
de.lmu.ifi.dbs.elki.distance.similarityfunction.cluster
Similarity measures for comparing clusters.
de.lmu.ifi.dbs.elki.distance.similarityfunction.kernel
Kernel functions.
Evaluation
Package Description
de.lmu.ifi.dbs.elki.evaluation
Functionality for the evaluation of algorithms.
de.lmu.ifi.dbs.elki.evaluation.classification
Evaluation of classification algorithms.
de.lmu.ifi.dbs.elki.evaluation.classification.holdout
Holdout and cross-validation strategies for evaluating classifiers.
de.lmu.ifi.dbs.elki.evaluation.clustering
Evaluation of clustering results
de.lmu.ifi.dbs.elki.evaluation.clustering.extractor
Classes to extract clusterings from hierarchical clustering.
de.lmu.ifi.dbs.elki.evaluation.clustering.internal
Internal evaluation measures for clusterings.
de.lmu.ifi.dbs.elki.evaluation.clustering.pairsegments
Pair-segment analysis of multiple clusterings
de.lmu.ifi.dbs.elki.evaluation.index
Simple index evaluation methods
de.lmu.ifi.dbs.elki.evaluation.outlier
Evaluate an outlier score using a misclassification based cost model
de.lmu.ifi.dbs.elki.evaluation.scores
Evaluation of rankings and scorings
Adapter classes for ranking and scoring measures.
de.lmu.ifi.dbs.elki.evaluation.similaritymatrix
Render a distance matrix to visualize a clustering-distance-combination.
GUI and Visualization
Package Description
de.lmu.ifi.dbs.elki.gui
Graphical User Interfaces for ELKI
de.lmu.ifi.dbs.elki.gui.configurator
Configurator components
de.lmu.ifi.dbs.elki.gui.icons
Icons for ELKI GUI.
de.lmu.ifi.dbs.elki.gui.minigui
A very simple UI to build ELKI command lines
de.lmu.ifi.dbs.elki.gui.multistep
Multi-step GUI for ELKI
de.lmu.ifi.dbs.elki.gui.multistep.panels
Panels for the multi-step GUI
de.lmu.ifi.dbs.elki.gui.util
Utility classes for GUIs (e.g. a class to display a logging panel)
de.lmu.ifi.dbs.elki.visualization
Visualization package of ELKI
de.lmu.ifi.dbs.elki.visualization.batikutil
Commonly used functionality useful for Apache Batik
de.lmu.ifi.dbs.elki.visualization.colors
Color scheme handling for ELKI
de.lmu.ifi.dbs.elki.visualization.css
Managing CSS styles / classes
de.lmu.ifi.dbs.elki.visualization.gui
Package to provide a visualization GUI
de.lmu.ifi.dbs.elki.visualization.gui.detail
Classes for managing a detail view
de.lmu.ifi.dbs.elki.visualization.gui.overview
Classes for managing the overview plot
de.lmu.ifi.dbs.elki.visualization.opticsplot
Code for drawing OPTICS plots
de.lmu.ifi.dbs.elki.visualization.parallel3d
3DPC: 3D parallel coordinate plot visualization for ELKI.
de.lmu.ifi.dbs.elki.visualization.parallel3d.layout
Layouting algorithms for 3D parallel coordinate plots.
de.lmu.ifi.dbs.elki.visualization.parallel3d.util
Utility classes (primarily rendering utilities).
de.lmu.ifi.dbs.elki.visualization.projections
Visualization projections
de.lmu.ifi.dbs.elki.visualization.projector
Projectors are responsible for finding appropriate projections for data relations
de.lmu.ifi.dbs.elki.visualization.savedialog
Save dialog for SVG plots
de.lmu.ifi.dbs.elki.visualization.style
Style management for ELKI visualizations
de.lmu.ifi.dbs.elki.visualization.style.lines
Generate line styles for plotting in CSS
de.lmu.ifi.dbs.elki.visualization.style.marker
Draw plot markers
de.lmu.ifi.dbs.elki.visualization.svg
Base SVG functionality (generation, markers, thumbnails, export, ...)
de.lmu.ifi.dbs.elki.visualization.visualizers
Visualizers for various results
de.lmu.ifi.dbs.elki.visualization.visualizers.actions
Action-only "visualizers" that only produce menu entries.
de.lmu.ifi.dbs.elki.visualization.visualizers.histogram
Visualizers based on 1D projected histograms
de.lmu.ifi.dbs.elki.visualization.visualizers.optics
Visualizers that do work on OPTICS plots
de.lmu.ifi.dbs.elki.visualization.visualizers.pairsegments
Visualizers for inspecting cluster differences using pair counting segments
de.lmu.ifi.dbs.elki.visualization.visualizers.parallel
Visualizers based on parallel coordinates
de.lmu.ifi.dbs.elki.visualization.visualizers.parallel.cluster
Visualizers for clustering results based on parallel coordinates
de.lmu.ifi.dbs.elki.visualization.visualizers.parallel.index
Visualizers for index structure based on parallel coordinates
de.lmu.ifi.dbs.elki.visualization.visualizers.parallel.selection
Visualizers for object selection based on parallel projections
de.lmu.ifi.dbs.elki.visualization.visualizers.scatterplot
Visualizers based on scatterplots
de.lmu.ifi.dbs.elki.visualization.visualizers.scatterplot.cluster
Visualizers for clustering results based on 2D projections
de.lmu.ifi.dbs.elki.visualization.visualizers.scatterplot.density
Visualizers for data set density in a scatterplot projection
de.lmu.ifi.dbs.elki.visualization.visualizers.scatterplot.index
Visualizers for index structures based on 2D projections
de.lmu.ifi.dbs.elki.visualization.visualizers.scatterplot.outlier
Visualizers for outlier scores based on 2D projections
de.lmu.ifi.dbs.elki.visualization.visualizers.scatterplot.selection
Visualizers for object selection based on 2D projections
de.lmu.ifi.dbs.elki.visualization.visualizers.scatterplot.uncertain
Visualizers for uncertain data.
de.lmu.ifi.dbs.elki.visualization.visualizers.thumbs
Thumbnail "Visualizers" (that take care of refreshing thumbnails)
de.lmu.ifi.dbs.elki.visualization.visualizers.visunproj
Visualizers that do not use a particular projection
Utilities and Miscellaneous
Package Description
de.lmu.ifi.dbs.elki
ELKI framework "Environment for Developing KDD-Applications Supported by Index-Structures".
de.lmu.ifi.dbs.elki.application
Base classes for standalone applications.
de.lmu.ifi.dbs.elki.application.cache
Utility applications for the persistence layer such as distance cache builders.
de.lmu.ifi.dbs.elki.application.experiments
Packaged experiments to make them easy to reproduce.
de.lmu.ifi.dbs.elki.application.greedyensemble
Greedy ensembles for outlier detection.
de.lmu.ifi.dbs.elki.application.internal
Internal utilities for development
de.lmu.ifi.dbs.elki.logging
Logging facility for controlling logging behavior of the complete framework.
de.lmu.ifi.dbs.elki.logging.progress
Progress status objects (for UI)
de.lmu.ifi.dbs.elki.logging.statistics
Classes for logging various statistics.
de.lmu.ifi.dbs.elki.math
Mathematical operations and utilities used throughout the framework
de.lmu.ifi.dbs.elki.math.geodesy
Functions for computing on the sphere / earth.
de.lmu.ifi.dbs.elki.math.geometry
Algorithms from computational geometry
de.lmu.ifi.dbs.elki.math.linearalgebra
The linear algebra package provides classes and computational methods for operations on matrices and vectors.
de.lmu.ifi.dbs.elki.math.linearalgebra.fitting
Function to numerically fit a function (such as a Gaussian distribution) to given data.
de.lmu.ifi.dbs.elki.math.linearalgebra.pca
Principal Component Analysis (PCA) and Eigenvector processing
de.lmu.ifi.dbs.elki.math.linearalgebra.pca.filter
Filter eigenvectors based on their eigenvalues.
de.lmu.ifi.dbs.elki.math.linearalgebra.pca.weightfunctions
Weight functions used in weighted PCA via WeightedCovarianceMatrixBuilder
de.lmu.ifi.dbs.elki.math.scales
Scales handling for plotting
de.lmu.ifi.dbs.elki.math.spacefillingcurves
Space filling curves
de.lmu.ifi.dbs.elki.math.statistics
Statistical tests and methods
de.lmu.ifi.dbs.elki.math.statistics.dependence
Statistical measures of dependence, such as correlation
de.lmu.ifi.dbs.elki.math.statistics.distribution
Standard distributions, with random generation functionalities
de.lmu.ifi.dbs.elki.math.statistics.distribution.estimator
Estimators for statistical distributions.
de.lmu.ifi.dbs.elki.math.statistics.distribution.estimator.meta
Meta estimators: estimators that do not actually estimate themselves, but instead use other estimators, e.g. on a trimmed data set, or as an ensemble.
de.lmu.ifi.dbs.elki.math.statistics.intrinsicdimensionality
Methods for estimating the intrinsic dimensionality.
de.lmu.ifi.dbs.elki.math.statistics.kernelfunctions
Kernel functions from statistics.
de.lmu.ifi.dbs.elki.math.statistics.tests
Statistical tests
de.lmu.ifi.dbs.elki.parallel
Parallel processing core for ELKI.
de.lmu.ifi.dbs.elki.parallel.processor
Processor API of ELKI, and some essential shared processors.
de.lmu.ifi.dbs.elki.parallel.variables
Variables are instantiated for each thread, and allow passing values from one processor to another within the same thread.
de.lmu.ifi.dbs.elki.result
Result types, representation and handling
de.lmu.ifi.dbs.elki.result.outlier
Outlier result classes
de.lmu.ifi.dbs.elki.result.textwriter
Text serialization (CSV, Gnuplot, Console, ...)
de.lmu.ifi.dbs.elki.result.textwriter.naming
Naming schemes for clusters (for output when an algorithm doesn't generate cluster names).
de.lmu.ifi.dbs.elki.result.textwriter.writers
Serialization handlers for individual data types.
de.lmu.ifi.dbs.elki.utilities
Utility and helper classes - commonly used data structures, output formatting, exceptions, ...
de.lmu.ifi.dbs.elki.utilities.datastructures
Basic memory structures such as heaps and object hierarchies
de.lmu.ifi.dbs.elki.utilities.datastructures.arraylike
Common API for accessing objects that are "array-like", including lists, numerical vectors, database vectors and arrays.
de.lmu.ifi.dbs.elki.utilities.datastructures.arrays
Utilities for arrays: advanced sorting for primitive arrays
de.lmu.ifi.dbs.elki.utilities.datastructures.heap
Heap structures and variations such as bounded priority heaps
de.lmu.ifi.dbs.elki.utilities.datastructures.hierarchy
Delegate implementation of a hierarchy
de.lmu.ifi.dbs.elki.utilities.datastructures.histogram
Classes for computing histograms. This package contains two families of histograms.
de.lmu.ifi.dbs.elki.utilities.datastructures.iterator
ELKI Iterator API. ELKI uses a custom iterator API instead of the usual Iterator classes (the "Java Collections API").
de.lmu.ifi.dbs.elki.utilities.datastructures.range
Ranges of values.
de.lmu.ifi.dbs.elki.utilities.datastructures.unionfind
Union-find data structures.
de.lmu.ifi.dbs.elki.utilities.documentation
Documentation utilities: Annotations for Title, Description, Reference
de.lmu.ifi.dbs.elki.utilities.ensemble
Utility classes for simple ensembles
de.lmu.ifi.dbs.elki.utilities.exceptions
Exception classes and common exception messages.
de.lmu.ifi.dbs.elki.utilities.io
Utility classes for input/output.
de.lmu.ifi.dbs.elki.utilities.optionhandling
Parameter handling and option descriptions.
de.lmu.ifi.dbs.elki.utilities.optionhandling.constraints
Constraints restrict the possible values of parameters
de.lmu.ifi.dbs.elki.utilities.optionhandling.parameterization
Configuration managers. See the de.lmu.ifi.dbs.elki.utilities.optionhandling package for documentation!
de.lmu.ifi.dbs.elki.utilities.optionhandling.parameters
Classes for various typed parameters. See the de.lmu.ifi.dbs.elki.utilities.optionhandling package for documentation!
de.lmu.ifi.dbs.elki.utilities.pairs
Pairs utility classes. A number of commonly needed primitive pairs are the following: IntIntPair, storing two int values; DoubleIntPair, storing one double and one int value.
de.lmu.ifi.dbs.elki.utilities.random
Random number generation.
de.lmu.ifi.dbs.elki.utilities.referencepoints
Package containing strategies to obtain reference points. Shared code for various algorithms that use reference points.
de.lmu.ifi.dbs.elki.utilities.scaling
Scaling functions: linear, logarithmic, gamma, clipping, ...
de.lmu.ifi.dbs.elki.utilities.scaling.outlier
Scaling of outlier scores, that require a statistical analysis of the occurring values
de.lmu.ifi.dbs.elki.utilities.xml
XML and XHTML utilities
de.lmu.ifi.dbs.elki.workflow
Work flow packages, e.g., following the usual KDD model.
Tutorial Code and Examples
Package Description
tutorial.clustering
Classes from the tutorial on implementing a custom k-means variation
tutorial.distancefunction
Classes from the tutorial on implementing distance functions
tutorial.javaapi
Examples of how to invoke ELKI from Java.
tutorial.outlier
Tutorials on implementing outlier detection methods in ELKI.


ELKI is a generic framework for a broad range of KDD-applications and their development. For background, contact-information, and contributors see https://elki-project.github.io/.

This is the documentation for version 0.7.5, published as:
Erich Schubert and Arthur Zimek:
ELKI: A large open-source library for data analysis. ELKI Release 0.7.5 "Heidelberg".
CoRR arXiv:1902.03616

## Getting started

The ELKI website contains additional documentation. An exported tutorial is included with this documentation and is a good place to start.

### Invocation

To use the KDD framework, we recommend the executable JAR file elki.jar. Since release 0.3, calling java -jar elki.jar starts a minimalistic GUI called MiniGUI by default. For command-line use (for example, batch processing and scripted operation), call java -jar elki.jar KDDCLIApplication -h to print a description of the usage.

The MiniGUI can also serve as a utility for building command lines, as it will print the full command line to the log window.

For more information on using files and the available formats for data input, see de.lmu.ifi.dbs.elki.datasource.parser. ELKI uses a whitespace-separated vector format by default, but a parser for ARFF files is also included and can read most ARFF files (mixing sparse and dense vectors is currently not allowed).

An extensive list of parameters can be browsed sorted by class or sorted by option ID.

Some examples of completely parameterized calls for different algorithms are described at example calls.

A list of related publications, giving details on many implemented algorithms, can be found in the class article references list.

## Workflow - Where Do Which Objects Go?

The database connection manages the reading of input files or databases and provides a Database object, including index structures, as a virtual database to the KDDTask. The KDDTask applies a specified algorithm to this database and collects the result from the algorithm. Finally, KDDTask hands the obtained result on to a ResultHandler. The default handler is ResultWriter, which writes the result to STDOUT or, if specified, into a file.
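
The hand-off described above can be pictured with a small, self-contained sketch. All types in it (`WorkflowSketch`, `Source`, `Algo`, `Handler`) are hypothetical stand-ins written for illustration; they are not ELKI classes, and the real KDDTask, Database and ResultHandler have much richer APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; none of these are ELKI types.
class WorkflowSketch {
  /** Plays the role of the database connection: supplies the data. */
  interface Source {
    List<double[]> load();
  }

  /** Plays the role of an algorithm: maps the data to a result. */
  interface Algo<R> {
    R run(List<double[]> database);
  }

  /** Plays the role of a result handler: consumes the result. */
  interface Handler<R> {
    void process(R result);
  }

  /** Plays the role of the task: load the data, run the algorithm, hand the result on. */
  static <R> R runTask(Source source, Algo<R> algorithm, Handler<R> handler) {
    List<double[]> database = source.load();
    R result = algorithm.run(database);
    handler.process(result);
    return result;
  }

  public static void main(String[] args) {
    Source source = () -> {
      List<double[]> data = new ArrayList<>();
      data.add(new double[] { 1.0, 2.0 });
      data.add(new double[] { 3.0, 4.0 });
      return data;
    };
    // Toy "algorithm": count the objects in the database.
    Algo<Integer> count = db -> db.size();
    // Toy "result writer": print the result to STDOUT.
    Handler<Integer> writer = r -> System.out.println("objects: " + r);
    runTask(source, count, writer);
  }
}
```

The point of the sketch is only the direction of data flow: the algorithm never reads input itself, and it never decides where the result is written.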

### Database and indexing layer

The database and indexing layer is a key component of ELKI. It is not just storage for double[] arrays, as in many other frameworks. It can store various types of objects, and the integrated index structures provide access to fast distance, similarity, kNN, RkNN and range query methods for a variety of distance functions.

The standard flow for initializing a database is as follows:

Standard stream-based data sources such as FileBasedDatabaseConnection open the stream and feed its contents through a Parser to obtain an initial MultipleObjectsBundle. This is a temporary container for the data, which can then be modified by arbitrary ObjectFilters.
In the end, the MultipleObjectsBundle is bulk-inserted into a Database, which then invokes its IndexFactory instances to add Index instances to the appropriate relations.
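
The three stages (parse into a bundle, filter, bulk-insert and index) can be sketched in plain Java as follows. This is a minimal illustration under simplifying assumptions: the bundle is modeled as a list of `double[]` rows, filters as `UnaryOperator`s, and `DatabaseInitSketch`, `Database`, `Index` and `IndexFactory` below are hypothetical stand-ins, not the ELKI classes of the same names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-ins for illustration only; none of these are ELKI types.
class DatabaseInitSketch {
  /** Stand-in for an index attached to a relation after bulk insertion. */
  interface Index {
    String describe();
  }

  /** Stand-in for an index factory: builds an index over the inserted data. */
  interface IndexFactory {
    Index instantiate(List<double[]> relation);
  }

  /** Stand-in for a database holding one relation and its indexes. */
  static class Database {
    final List<double[]> relation = new ArrayList<>();
    final List<Index> indexes = new ArrayList<>();
  }

  /** Stage 1: parse whitespace-separated numbers into a temporary bundle. */
  static List<double[]> parse(List<String> lines) {
    List<double[]> bundle = new ArrayList<>();
    for (String line : lines) {
      String trimmed = line.trim();
      if (trimmed.isEmpty()) {
        continue; // skip blank lines
      }
      String[] tokens = trimmed.split("\\s+");
      double[] row = new double[tokens.length];
      for (int i = 0; i < tokens.length; i++) {
        row[i] = Double.parseDouble(tokens[i]);
      }
      bundle.add(row);
    }
    return bundle;
  }

  /** Stages 2 and 3: apply the filters in order, bulk-insert, then build the indexes. */
  static Database initialize(List<String> lines,
      List<UnaryOperator<List<double[]>>> filters, List<IndexFactory> factories) {
    List<double[]> bundle = parse(lines);
    for (UnaryOperator<List<double[]>> filter : filters) {
      bundle = filter.apply(bundle);
    }
    Database db = new Database();
    db.relation.addAll(bundle); // bulk insert
    for (IndexFactory factory : factories) {
      db.indexes.add(factory.instantiate(db.relation));
    }
    return db;
  }

  public static void main(String[] args) {
    List<String> lines = Arrays.asList("1 2 3", "4 5 6");
    // Toy filter: keep only the first two columns of every row.
    UnaryOperator<List<double[]>> firstTwo = bundle -> {
      List<double[]> out = new ArrayList<>();
      for (double[] row : bundle) {
        out.add(Arrays.copyOf(row, 2));
      }
      return out;
    };
    IndexFactory factory = relation -> () -> "toy index over " + relation.size() + " objects";
    Database db = initialize(lines, Collections.singletonList(firstTwo),
        Collections.singletonList(factory));
    System.out.println(db.relation.size() + " objects, " + db.indexes.get(0).describe());
  }
}
```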

When the database receives a request for a distance, similarity, kNN, RkNN or range query, it asks each index whether it supports this query. If one does, an optimized query is returned; otherwise, a linear scan query is returned unless DatabaseQuery.HINT_OPTIMIZED_ONLY was given.

For this optimization to work, use the proper APIs of the Database interface or the QueryUtil helper where possible, instead of instantiating low-level classes such as an explicit linear scan query.

For efficiency, try to instantiate the query only once per algorithm run, and avoid running the optimization step for every object.
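
The lookup-with-fallback logic and the "obtain the query once" advice can be illustrated with a simplified model. The types below (`QueryPlannerSketch`, `RangeQuery`, `Index`) are hypothetical and not the ELKI API; the boolean `optimizedOnly` merely mirrors the role of DatabaseQuery.HINT_OPTIMIZED_ONLY, and returning `null` when no optimized query exists is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-ins for illustration only; none of these are ELKI types.
class QueryPlannerSketch {
  /** A prepared range query: positions of all objects within radius eps of q. */
  interface RangeQuery {
    List<Integer> range(double[] q, double eps);
  }

  /** An index may or may not be able to accelerate a given request. */
  interface Index {
    Optional<RangeQuery> tryRangeQuery(List<double[]> data);
  }

  static double euclidean(double[] a, double[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Fallback: compare the query object against every stored object. */
  static RangeQuery linearScan(List<double[]> data) {
    return (q, eps) -> {
      List<Integer> hits = new ArrayList<>();
      for (int i = 0; i < data.size(); i++) {
        if (euclidean(q, data.get(i)) <= eps) {
          hits.add(i);
        }
      }
      return hits;
    };
  }

  /**
   * Ask each index in turn; fall back to a linear scan unless only optimized
   * queries are acceptable, in which case this sketch returns null.
   */
  static RangeQuery getRangeQuery(List<double[]> data, List<Index> indexes, boolean optimizedOnly) {
    for (Index index : indexes) {
      Optional<RangeQuery> query = index.tryRangeQuery(data);
      if (query.isPresent()) {
        return query.get();
      }
    }
    return optimizedOnly ? null : linearScan(data);
  }

  public static void main(String[] args) {
    List<double[]> data = new ArrayList<>();
    data.add(new double[] { 0, 0 });
    data.add(new double[] { 3, 4 });
    data.add(new double[] { 10, 10 });
    // Obtain the query once, outside the loop, then reuse it for every object.
    RangeQuery query = getRangeQuery(data, new ArrayList<>(), false);
    for (double[] object : data) {
      System.out.println(query.range(object, 5.0));
    }
  }
}
```

Calling `getRangeQuery` inside the loop would repeat the index lookup for every object, which is exactly the overhead the advice above is meant to avoid.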

## How to make use of this framework

### Extension

To provide new applications, simply implement the appropriate interfaces. There are interfaces for a broad range of development targets. Consult the tree of interfaces for an overview of what is provided.

A good place to get started is to look at some of the existing algorithms and see how they are implemented. For example, the DummyAlgorithm, while it does not produce any result, will teach you how to perform k-nearest-neighbor queries properly. It does, however, have a hard dependency on the Euclidean distance and the data types supported by it. To support arbitrary distance functions, extend the class AbstractDistanceBasedAlgorithm instead. This is another simple example, this time for obtaining a class parameter.
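
The difference between hard-wiring the Euclidean distance and accepting an arbitrary distance function can be shown without any ELKI classes. In the following sketch, `DistanceBasedSketch` and its `Distance` interface are hypothetical stand-ins, and the kNN search is a plain linear scan rather than a database query.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-ins for illustration only; none of these are ELKI types.
class DistanceBasedSketch {
  /** Stand-in for a pluggable distance function. */
  interface Distance {
    double distance(double[] a, double[] b);
  }

  static final Distance EUCLIDEAN = (a, b) -> {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  };

  static final Distance MANHATTAN = (a, b) -> {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += Math.abs(a[i] - b[i]);
    }
    return sum;
  };

  /** Positions of the k nearest neighbors of query, nearest first, by linear scan. */
  static List<Integer> knn(List<double[]> data, double[] query, int k, Distance dist) {
    List<Integer> order = new ArrayList<>();
    for (int i = 0; i < data.size(); i++) {
      order.add(i);
    }
    order.sort(Comparator.comparingDouble(i -> dist.distance(query, data.get(i))));
    return new ArrayList<>(order.subList(0, Math.min(k, order.size())));
  }

  public static void main(String[] args) {
    List<double[]> data = new ArrayList<>();
    data.add(new double[] { 1.2, 1.2 });
    data.add(new double[] { 0.0, 2.0 });
    double[] query = { 0.0, 0.0 };
    // The same code yields different neighbors depending on the distance passed in.
    System.out.println("Euclidean: " + knn(data, query, 1, EUCLIDEAN));
    System.out.println("Manhattan: " + knn(data, query, 1, MANHATTAN));
  }
}
```

Because the distance is a parameter rather than a fixed call, the algorithm body does not change when a user selects a different distance function.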

Visit the ELKI Wiki, which has a growing amount of documentation. You are also welcome to contribute, of course!

### Parameterization API

ELKI is designed for command-line, GUI and Java operation. For command-line and GUI, an extensive help functionality is provided along with input assistance. Therefore, you should also support the parameterizable API. The requirements are quite different from regular Java constructors, and cannot be expressed in terms of a Java API.

For useful error reporting and input assistance in the GUI, we need more extensive typing than Java offers (for example, we might need numerical constraints), and we also want to be able to report more than one error at a time. In ELKI 0.4, much of the parameterization was refactored into static helper classes, usually a nested public static class Parameterizer that extends AbstractParameterizer.

Keep the complexity of Parameterizer classes and constructors invoked by these classes low, since these may be heavily used during the parameterization step. Postpone any extensive initialization to the main algorithm invocation step!
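
The two ideas above, constraint checking that reports every problem at once and a constructor that does no heavy work, can be sketched as follows. This is an illustration only: `ParameterizationSketch`, `ToyConfig`, the nested `Parameterizer` and the option names `k` and `maxiter` are hypothetical, and the code does not use or reproduce ELKI's actual parameterization classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; this is not the ELKI parameterization API.
class ParameterizationSketch {
  /** The configured object; its constructor only stores values (no heavy work). */
  static class ToyConfig {
    final int k;
    final int maxiter;

    ToyConfig(int k, int maxiter) {
      this.k = k;
      this.maxiter = maxiter;
    }
  }

  /** Reads options, checks constraints, and collects every error instead of stopping at the first. */
  static class Parameterizer {
    final List<String> errors = new ArrayList<>();

    /** Read an integer option with a lower-bound constraint, recording any problem. */
    private int intOption(Map<String, String> options, String name, int min) {
      String raw = options.get(name);
      if (raw == null) {
        errors.add("Missing required parameter: " + name);
        return min;
      }
      try {
        int value = Integer.parseInt(raw.trim());
        if (value < min) {
          errors.add("Parameter " + name + " must be >= " + min + ", got " + value);
        }
        return value;
      } catch (NumberFormatException e) {
        errors.add("Parameter " + name + " is not an integer: " + raw);
        return min;
      }
    }

    /** Returns the configured object, or null if any error was recorded. */
    ToyConfig configure(Map<String, String> options) {
      errors.clear();
      int k = intOption(options, "k", 1);
      int maxiter = intOption(options, "maxiter", 0);
      return errors.isEmpty() ? new ToyConfig(k, maxiter) : null;
    }
  }

  public static void main(String[] args) {
    Map<String, String> options = new HashMap<>();
    options.put("k", "0");
    options.put("maxiter", "abc");
    Parameterizer parameterizer = new Parameterizer();
    ToyConfig config = parameterizer.configure(options);
    if (config == null) {
      // Both problems are reported together, which is what a GUI needs for input assistance.
      for (String error : parameterizer.errors) {
        System.out.println(error);
      }
    }
  }
}
```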