Synthetic Data for Shared-Nearest-Neighbors
These data sets were originally created for the publication:
M. E. Houle, H.-P. Kriegel, P. Kröger, E. Schubert, A. Zimek
Can Shared-Neighbor Distances Defeat the Curse of Dimensionality?
In Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM), Heidelberg, Germany, 2010.
All sizes are derived from the 640-dimensional version by keeping the first n dimensions.
Data Generator Specifications
These data sets were generated with the data generator included in ELKI (using an older version of ELKI that, for example, used a different random number generator), with the following XML data specifications:
Only the first 10, 20, … dimensions were then retained to produce the subset for each dimensionality.
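As an illustration of this truncation step, the following minimal Python sketch keeps the first n attributes of each vector. It is not the code used to produce the published files; the function name keep_first_dimensions and the toy vectors are hypothetical, and it assumes the data has already been parsed into lists of numbers.

```python
def keep_first_dimensions(rows, n):
    """Return a copy of rows truncated to the first n attributes.

    Hypothetical helper: assumes each row is a sequence of numbers
    with at least n entries.
    """
    if n < 1:
        raise ValueError("n must be positive")
    truncated = []
    for row in rows:
        if len(row) < n:
            raise ValueError("row has fewer than n dimensions")
        # Slicing copies the prefix, so the full-dimensional row is untouched.
        truncated.append(list(row[:n]))
    return truncated


# Toy data standing in for the 640-dimensional vectors (made-up values).
full = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
subset = keep_first_dimensions(full, 2)
```

Because every subset is a prefix of the same full-dimensional vectors, the lower-dimensional data sets contain the same objects in the same order, which keeps results comparable across dimensionalities.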
Simplified versions of the all-relevant data set:
The following versions of the all-relevant data set (not used in the article) have been simplified by scaling the cluster standard deviations, thus making the clusters easier to separate and easier to index:
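The effect of scaling the standard deviations can be sketched in Python as follows. This is only an illustration under stated assumptions: it draws axis-parallel Gaussian clusters with the standard library rather than using the ELKI generator, and the function name sample_cluster, the cluster parameters, and the scale factor 0.5 are all hypothetical (the factors used for the simplified data sets are not stated here).

```python
import random
import statistics


def sample_cluster(mean, stddev, scale, size, seed):
    """Draw `size` points from an axis-parallel Gaussian cluster.

    Each per-dimension standard deviation is multiplied by `scale`;
    a scale below 1 gives a tighter cluster around the same mean.
    """
    rng = random.Random(seed)
    return [
        [rng.gauss(m, s * scale) for m, s in zip(mean, stddev)]
        for _ in range(size)
    ]


# Hypothetical cluster parameters and scale factor, for illustration only.
mean, stddev = [0.3, 0.7], [0.1, 0.2]
original = sample_cluster(mean, stddev, scale=1.0, size=500, seed=1)
simplified = sample_cluster(mean, stddev, scale=0.5, size=500, seed=1)

# Empirical spread along the first dimension shrinks with the scale factor.
spread_original = statistics.pstdev(p[0] for p in original)
spread_simplified = statistics.pstdev(p[0] for p in simplified)
```

The cluster means stay where they are while the spread shrinks, so the distance between clusters grows relative to their width, which is what makes them easier to separate.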