Package elki.clustering.kmedoids
Class CLARA<V>
- java.lang.Object
  - elki.clustering.kmedoids.PAM<V>
    - elki.clustering.kmedoids.CLARA<V>
Type Parameters:
- V - Data type
All Implemented Interfaces:
- Algorithm, ClusteringAlgorithm<Clustering<MedoidModel>>, KMedoidsClustering<V>
@Reference(authors="L. Kaufman, P. J. Rousseeuw",title="Clustering Large Data Sets",booktitle="Pattern Recognition in Practice",url="https://doi.org/10.1016/B978-0-444-87877-9.50039-X",bibkey="doi:10.1016/B978-0-444-87877-9.50039-X") @Reference(authors="L. Kaufman, P. J. Rousseeuw",title="Clustering Large Applications (Program CLARA)",booktitle="Finding Groups in Data: An Introduction to Cluster Analysis",url="https://doi.org/10.1002/9780470316801.ch3",bibkey="doi:10.1002/9780470316801.ch3") public class CLARA<V> extends PAM<V>
Clustering Large Applications (CLARA) is a clustering method for large data sets based on PAM (Partitioning Around Medoids). Instead of running PAM on all objects, it runs PAM on random samples of the data.
TODO: use a triangular distance matrix, rather than a hash-map based cache, for slightly better performance and lower memory usage.
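To make the sampling idea concrete, here is a minimal sketch of the CLARA loop on plain double[] points with Euclidean distance. It is not ELKI's implementation. The class ClaraSketch and all of its methods are hypothetical. The inner step is a simplified PAM swap phase that starts from the first k sample objects instead of using a KMedoidsInitialization. Each sample's medoids are scored by their total distance over the full data set, and the best set is kept. The keepmed flag imitates the field of the same name by placing the previous best medoids first in the next sample.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical illustration of the CLARA loop; not ELKI's CLARA class or API.
class ClaraSketch {
  // Euclidean distance between two points.
  static double dist(double[] a, double[] b) {
    double s = 0;
    for (int d = 0; d < a.length; d++) {
      double diff = a[d] - b[d];
      s += diff * diff;
    }
    return Math.sqrt(s);
  }

  // Sum over the given objects of the distance to their nearest medoid.
  static double cost(double[][] data, int[] objects, int[] medoids) {
    double sum = 0;
    for (int o : objects) {
      double best = Double.POSITIVE_INFINITY;
      for (int m : medoids) {
        best = Math.min(best, dist(data[o], data[m]));
      }
      sum += best;
    }
    return sum;
  }

  // Simplified PAM on the sample: start from the first k sample objects, then
  // repeatedly apply the single medoid/non-medoid swap that lowers the cost most.
  static int[] pamOnSample(double[][] data, int[] sample, int k, int maxiter) {
    int[] med = Arrays.copyOf(sample, k);
    double cur = cost(data, sample, med);
    for (int it = 0; it < maxiter; it++) {
      double bestCost = cur;
      int bestPos = -1, bestObj = -1;
      for (int pos = 0; pos < k; pos++) {
        for (int cand : sample) {
          if (Arrays.stream(med).anyMatch(x -> x == cand)) continue; // already a medoid
          int old = med[pos];
          med[pos] = cand;
          double c = cost(data, sample, med);
          med[pos] = old; // undo the trial swap
          if (c < bestCost) {
            bestCost = c;
            bestPos = pos;
            bestObj = cand;
          }
        }
      }
      if (bestPos < 0) break; // no improving swap: local optimum on this sample
      med[bestPos] = bestObj;
      cur = bestCost;
    }
    return med;
  }

  // Cluster numsamples random samples and keep the medoids whose total distance
  // over ALL objects is smallest.
  static int[] clara(double[][] data, int k, int maxiter, int numsamples,
      int sampleSize, boolean keepmed, Random rnd) {
    int n = data.length;
    if (k > sampleSize || sampleSize > n) {
      throw new IllegalArgumentException("need k <= sampleSize <= n");
    }
    int[] all = new int[n];
    for (int i = 0; i < n; i++) all[i] = i;
    int[] best = null;
    double bestCost = Double.POSITIVE_INFINITY;
    for (int s = 0; s < numsamples; s++) {
      Set<Integer> chosen = new LinkedHashSet<>();
      if (keepmed && best != null) {
        for (int m : best) chosen.add(m); // previous medoids come first in the sample
      }
      while (chosen.size() < sampleSize) {
        chosen.add(rnd.nextInt(n)); // the set silently ignores duplicates
      }
      int[] sample = chosen.stream().mapToInt(Integer::intValue).toArray();
      int[] med = pamOnSample(data, sample, k, maxiter);
      double c = cost(data, all, med); // judge each candidate on the full data set
      if (c < bestCost) {
        bestCost = c;
        best = med;
      }
    }
    return best;
  }
}
```

Only sampleSize objects enter the quadratic swap search, so the cost of clustering one sample does not grow with the data set size. Only the final evaluation of each candidate touches every object.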
Reference:
- L. Kaufman, P. J. Rousseeuw. Clustering Large Data Sets. In: Pattern Recognition in Practice.
- L. Kaufman, P. J. Rousseeuw. Clustering Large Applications (Program CLARA). In: Finding Groups in Data: An Introduction to Cluster Analysis.
Since:
- 0.7.0
Author:
- Erich Schubert
Nested Class Summary
Nested Classes
- protected static class CLARA.CachedDistanceQuery<V>: Cached distance query.
- static class CLARA.Par<V>: Parameterization class.
Nested classes/interfaces inherited from class elki.clustering.kmedoids.PAM
- PAM.Instance
Nested classes/interfaces inherited from interface elki.Algorithm
- Algorithm.Utils
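The class description lists a TODO: replace the hash-map based cache behind CLARA.CachedDistanceQuery with a triangular distance matrix. Below is a minimal sketch of such a matrix. It assumes the sample objects are renumbered 0..n-1 and the distance is symmetric. TriangularDistanceMatrix and IndexDistance are hypothetical names, not ELKI classes. Only the n(n-1)/2 pairs below the diagonal are stored, and pair (i, j) with i > j lives at offset i(i-1)/2 + j.

```java
// Hypothetical sketch: symmetric distances for n objects in a packed lower triangle.
class TriangularDistanceMatrix {
  // Distance between objects i and j (assumed symmetric, with d(i, i) = 0).
  interface IndexDistance {
    double distance(int i, int j);
  }

  private final double[] tri; // n * (n - 1) / 2 entries, diagonal omitted

  TriangularDistanceMatrix(int n, IndexDistance d) {
    tri = new double[n * (n - 1) / 2];
    for (int i = 1; i < n; i++) {
      for (int j = 0; j < i; j++) {
        tri[offset(i, j)] = d.distance(i, j); // every pair is computed exactly once
      }
    }
  }

  // Rows 1..i-1 hold 1 + 2 + ... + (i - 1) = i(i-1)/2 entries before row i.
  private static int offset(int i, int j) {
    return i * (i - 1) / 2 + j;
  }

  double get(int i, int j) {
    if (i == j) return 0.0;
    return i > j ? tri[offset(i, j)] : tri[offset(j, i)];
  }

  int size() {
    return tri.length;
  }
}
```

Compared with a hash map, this avoids boxed keys and per-entry overhead, but it computes every pair up front. The int arithmetic also limits this sketch to samples of a few tens of thousands of objects.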
Field Summary
Fields
- (package private) boolean keepmed: Keep the previous medoids in the sample (see page 145).
- private static Logging LOG: Class logger.
- (package private) int numsamples: Number of samples to draw (i.e., iterations).
- (package private) RandomFactory random: Random factory for initialization.
- (package private) double sampling: Sampling rate.
Fields inherited from class elki.clustering.kmedoids.PAM
- distance, initializer, k, maxiter
Constructor Summary
Constructors
- CLARA(Distance<? super V> distance, int k, int maxiter, KMedoidsInitialization<V> initializer, int numsamples, double sampling, boolean keepmed, RandomFactory random): Constructor.
Method Summary
Methods
- protected static double assignRemainingToNearestCluster(ArrayDBIDs means, DBIDs ids, DBIDs rids, WritableIntegerDataStore assignment, DistanceQuery<?> distQ): Assign the remaining (not yet assigned) objects to their nearest medoid and return the sum of distances.
- (package private) static DBIDs randomSample(DBIDs ids, int samplesize, java.util.Random rnd, DBIDs previous): Draw a random sample of the desired size.
- Clustering<MedoidModel> run(Relation<V> relation): Run k-medoids clustering.
- Clustering<MedoidModel> run(Relation<V> relation, int k, DistanceQuery<? super V> distQ): Run k-medoids clustering with a given distance query. Not a very elegant API, but needed for some types of nested k-medoids.
Methods inherited from class elki.clustering.kmedoids.PAM
- getInputTypeRestriction, getLogger, initialMedoids, wrapResult
Methods inherited from class java.lang.Object
- clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface elki.clustering.ClusteringAlgorithm
- autorun

Field Detail
LOG
private static final Logging LOG
Class logger.

sampling
double sampling
Sampling rate. If less than 1, it is interpreted as a fraction of the data set size; otherwise, as an absolute number of objects.
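The rule for the sampling rate can be written as a small helper. This is a minimal sketch. The name sampleSize, rounding to the nearest integer, and clamping the result to the range [k, n] are assumptions, not documented ELKI behavior.

```java
// Hypothetical helper: turn the sampling parameter into a sample size.
class SampleSizeSketch {
  static int sampleSize(double sampling, int n, int k) {
    // Documented rule: values below 1 are relative, all others absolute.
    int size = sampling < 1 ? (int) Math.round(sampling * n) : (int) sampling;
    // Assumption (not from the docs): clamp to [k, n] so k medoids can still be found.
    return Math.max(k, Math.min(n, size));
  }
}
```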

numsamples
int numsamples
Number of samples to draw (i.e., iterations).

keepmed
boolean keepmed
Keep the previous medoids in the sample (see page 145).

random
RandomFactory random
Random factory for initialization.

Constructor Detail
CLARA
public CLARA(Distance<? super V> distance, int k, int maxiter, KMedoidsInitialization<V> initializer, int numsamples, double sampling, boolean keepmed, RandomFactory random)
Constructor.
Parameters:
- distance - Distance function to use
- k - Number of clusters to produce
- maxiter - Maximum number of iterations
- initializer - Initialization function
- numsamples - Number of samples (sampling iterations)
- sampling - Sampling rate (absolute or relative)
- keepmed - Keep the previous medoids in the next sample
- random - Random generator

Method Detail
run
public Clustering<MedoidModel> run(Relation<V> relation)
Description copied from interface: KMedoidsClustering
Run k-medoids clustering.

run
public Clustering<MedoidModel> run(Relation<V> relation, int k, DistanceQuery<? super V> distQ)
Description copied from interface: KMedoidsClustering
Run k-medoids clustering with a given distance query.
Not a very elegant API, but needed for some types of nested k-medoids.

randomSample
static DBIDs randomSample(DBIDs ids, int samplesize, java.util.Random rnd, DBIDs previous)
Draw a random sample of the desired size.
Parameters:
- ids - IDs to sample from
- samplesize - Sample size
- rnd - Random generator
- previous - Previous medoids to always include in the sample
Returns:
- Sample
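One way to draw such a sample is sketched below on integer ids 0..n-1 instead of ELKI's DBIDs. The class RandomSampleSketch is hypothetical. The sketch moves the previous medoids to the front of an id array and then fills the remaining slots with a partial Fisher-Yates shuffle. Every other id is therefore equally likely to be picked, and no id is drawn twice.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch of randomSample on ids 0..n-1; not ELKI's DBIDs-based code.
class RandomSampleSketch {
  // Draws samplesize distinct ids from 0..n-1. Every id in previous (assumed distinct
  // and within range) is always part of the sample.
  static int[] randomSample(int n, int samplesize, Random rnd, int[] previous) {
    if (samplesize > n || previous.length > samplesize) {
      throw new IllegalArgumentException("need previous.length <= samplesize <= n");
    }
    int[] perm = new int[n];
    for (int i = 0; i < n; i++) perm[i] = i;
    // Move the previous medoids to the front of the array.
    int fixed = 0;
    for (int p : previous) {
      int pos = fixed;
      while (perm[pos] != p) pos++;
      swap(perm, fixed++, pos);
    }
    // Partial Fisher-Yates shuffle: each remaining slot gets an id chosen uniformly
    // from those not chosen yet (positions i..n-1).
    for (int i = fixed; i < samplesize; i++) {
      swap(perm, i, i + rnd.nextInt(n - i));
    }
    return Arrays.copyOf(perm, samplesize);
  }

  private static void swap(int[] a, int i, int j) {
    int t = a[i];
    a[i] = a[j];
    a[j] = t;
  }
}
```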

assignRemainingToNearestCluster
protected static double assignRemainingToNearestCluster(ArrayDBIDs means, DBIDs ids, DBIDs rids, WritableIntegerDataStore assignment, DistanceQuery<?> distQ)
Assigns each remaining object (one not contained in the already-assigned sample rids) to the cluster of its nearest medoid, storing the cluster index in assignment. The k-th cluster thus contains the objects nearest to the k-th medoid.
Parameters:
- means - Cluster medoids
- ids - Object IDs
- rids - Sample that was already assigned
- assignment - Cluster assignment
- distQ - Distance query
Returns:
- Sum of distances.
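Below is a plain-array sketch of the same idea. The names AssignRemainingSketch and Dist are hypothetical, and objects are identified by indices 0..n-1. In this sketch the returned sum covers only the newly assigned objects. Whether ELKI's sum also includes the sample is not stated above.

```java
import java.util.Set;

// Hypothetical sketch mirroring assignRemainingToNearestCluster on plain arrays.
class AssignRemainingSketch {
  interface Dist {
    double d(int a, int b);
  }

  // Assigns every object 0..n-1 that is not in sampleIds to the index of its
  // nearest medoid and returns the sum of those objects' distances to it.
  static double assignRemaining(int[] medoids, int n, Set<Integer> sampleIds,
      int[] assignment, Dist dist) {
    double distsum = 0;
    for (int i = 0; i < n; i++) {
      if (sampleIds.contains(i)) continue; // already assigned when clustering the sample
      double mindist = Double.POSITIVE_INFINITY;
      int minIndex = 0;
      for (int c = 0; c < medoids.length; c++) {
        double dd = dist.d(i, medoids[c]);
        if (dd < mindist) {
          mindist = dd;
          minIndex = c;
        }
      }
      assignment[i] = minIndex; // cluster index, not the medoid's object id
      distsum += mindist;
    }
    return distsum;
  }
}
```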