Example data sets for ELKI
We are collecting a few example data sets along with a description to try out ELKI. Many of the data sets are artificial test cases that we use in internal unit testing, and are not well suited for benchmarking due to various biases, but mostly meant for use in teaching. Often they work near-perfectly for one algorithm, while another algorithm fails badly and are used to explain strengths and weaknesses of different approaches. They are not meant to even just resemble real data.
The XML files are data sets specifications for use with the data set generator.
Artificial data sets
Data set name | Size | Dim. | Properties | Parameters | Files |
---|---|---|---|---|---|
Vary Density | 150 | 2 | 3 Gaussian clusters with variable density Easy for EM, hard for density clustering |
em.k=3 | CSV, XML |
Mouse | 500 | 2 | 3 Gaussian clusters and noise For comparing EM and kMeans |
em.k=3 kmeans.k=3 |
CSV, XML |
Toy data sets used in LoOP publication
See SNN data sets for a number of synthetic high dimensional artificial data sets.
Real data sets
See multi-view for data sets such as the ALOI data set.
Outlier data sets are hosted at the outlier detection data repository (mirror).
More to come!