## Class CBLOF<O extends NumberVector>

• elki.outlier.clustering.CBLOF<O>
• Type Parameters:
O - the type of data objects handled by this algorithm
• All Implemented Interfaces:
Algorithm, OutlierAlgorithm

@Title("Discovering cluster-based local outliers")
@Reference(authors="Z. He, X. Xu, S. Deng",
title="Discovering cluster-based local outliers",
booktitle="Pattern Recognition Letters 24(9-10)",
url="https://doi.org/10.1016/S0167-8655(03)00003-5",
bibkey="DBLP:journals/prl/HeXD03")
public class CBLOF<O extends NumberVector>
implements OutlierAlgorithm
Cluster-based local outlier factor (CBLOF).

Reference:

Z. He, X. Xu, S. Deng
Discovering cluster-based local outliers
Pattern Recognition Letters 24(9-10)

Implementation note: this algorithm is hard to implement generically, that is, with support for arbitrary clustering algorithms and distances, because it is not trivial to ensure that the clustering algorithm and the outlier method use compatible data types and distances.

Since:
0.7.5
Author:
Patrick Kostjens

protected double alpha
The ratio of the size that separates the large clusters from the small clusters.
protected double beta
The minimal ratio between two consecutive clusters (when ordered descending by size) at which the boundary between the large and small clusters is set.
protected ClusteringAlgorithm<Clustering<MeanModel>> clusteringAlgorithm
The clustering algorithm to use.
protected NumberVectorDistance<? super O> distance
Distance function used.
private static Logging LOG
The logger for this class.
CBLOF​(NumberVectorDistance<? super O> distance, ClusteringAlgorithm<Clustering<MeanModel>> clusteringAlgorithm, double alpha, double beta)
Constructor.
private void computeCBLOFs​(Relation<O> relation, WritableDoubleDataStore cblofs, DoubleMinMax cblofMinMax, java.util.List<? extends Cluster<MeanModel>> largeClusters, java.util.List<? extends Cluster<MeanModel>> smallClusters)
Compute the CBLOF scores for all the data.
private double computeLargeClusterCBLOF​(O obj, NumberVectorDistance<? super O> distance, NumberVector clusterMean, Cluster<MeanModel> cluster)
private double computeSmallClusterCBLOF​(O obj, NumberVectorDistance<? super O> distance, java.util.List<NumberVector> largeClusterMeans, Cluster<MeanModel> cluster)
private int getClusterBoundary​(Relation<O> relation, java.util.List<? extends Cluster<MeanModel>> clusters)
Compute the boundary index separating the large clusters from the small clusters.
TypeInformation[] getInputTypeRestriction()
Get the input type restriction used for negotiating the data query.
OutlierResult run​(Database database, Relation<O> relation)
Run CBLOF.
private void storeCBLOFScore​(WritableDoubleDataStore cblofs, DoubleMinMax cblofMinMax, double cblof, DBIDIter iter)
• #### LOG

private static final Logging LOG
The logger for this class.
• #### distance

protected NumberVectorDistance<? super O> distance
Distance function used.
• #### clusteringAlgorithm

protected ClusteringAlgorithm<Clustering<MeanModel>> clusteringAlgorithm
The clustering algorithm to use.
• #### alpha

protected double alpha
The ratio of the size that separates the large clusters from the small clusters. The clusters are ordered descending by size and are taken until the specified ratio of the data is included. For example: a ratio of 0.9 indicates that the large clusters should cover at least 90% of the data points.
• #### beta

protected double beta
The minimal ratio between two consecutive clusters (when ordered descending by size) at which the boundary between the large and small clusters is set. For example: a ratio of 3 means that the boundary is placed between clusters i and i+1 (where i+1 is the next smaller cluster after i) when cluster i is at least 3 times as large as cluster i+1.
• #### CBLOF

public CBLOF​(NumberVectorDistance<? super O> distance,
ClusteringAlgorithm<Clustering<MeanModel>> clusteringAlgorithm,
double alpha,
double beta)
Constructor.
Parameters:
distance - the neighborhood distance function
clusteringAlgorithm - the clustering algorithm
alpha - the ratio of the data that should be included in the large clusters
beta - the ratio of the sizes of the clusters at the boundary between the large and the small clusters
• #### run

public OutlierResult run​(Database database,
Relation<O> relation)
Run CBLOF.
Parameters:
database - Database to run on
relation - Relation to use for CBLOF computation
Returns:
Outlier result
• #### getClusterBoundary

private int getClusterBoundary​(Relation<O> relation,
java.util.List<? extends Cluster<MeanModel>> clusters)
Compute the boundary index separating the large clusters from the small clusters.
Parameters:
relation - Data to process
clusters - All clusters that were found
Returns:
Index of the boundary between the large and the small clusters.
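The interplay of the alpha and beta fields described above can be illustrated with a minimal, self-contained sketch. It assumes that the cluster sizes are already sorted in descending order and that the boundary is placed at the first index where either criterion holds. The names ClusterBoundarySketch and clusterBoundary are hypothetical, and the code is not taken from the ELKI source.

```java
import java.util.Arrays;

public class ClusterBoundarySketch {
  /**
   * Returns the index of the last "large" cluster.
   * Assumption: sizesDescending holds the cluster sizes sorted from largest to smallest.
   */
  public static int clusterBoundary(int[] sizesDescending, double alpha, double beta) {
    int total = Arrays.stream(sizesDescending).sum();
    int cumulative = 0;
    for (int i = 0; i < sizesDescending.length - 1; i++) {
      cumulative += sizesDescending[i];
      // alpha criterion: clusters 0..i already cover the requested share of the data
      if (cumulative >= alpha * total) {
        return i;
      }
      // beta criterion: cluster i is at least beta times as large as cluster i+1
      if (sizesDescending[i] >= beta * sizesDescending[i + 1]) {
        return i;
      }
    }
    // Neither criterion fired: treat every cluster as large
    return sizesDescending.length - 1;
  }

  public static void main(String[] args) {
    // alpha fires at index 1: 60 + 35 = 95 of 100 points, which is at least 90%
    System.out.println(clusterBoundary(new int[] {60, 35, 3, 2}, 0.9, 3.0)); // prints 1
    // beta fires at index 0: 70 is at least 3 * 20
    System.out.println(clusterBoundary(new int[] {70, 20, 6, 4}, 0.95, 3.0)); // prints 0
  }
}
```

In this sketch, clusters at indices 0 up to and including the returned index count as large, and all later clusters count as small.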
• #### computeCBLOFs

private void computeCBLOFs​(Relation<O> relation,
WritableDoubleDataStore cblofs,
DoubleMinMax cblofMinMax,
java.util.List<? extends Cluster<MeanModel>> largeClusters,
java.util.List<? extends Cluster<MeanModel>> smallClusters)
Compute the CBLOF scores for all the data.
Parameters:
relation - Data to process
cblofs - CBLOF scores
cblofMinMax - Minimum/maximum score tracker
largeClusters - Large clusters output
smallClusters - Small clusters output
• #### storeCBLOFScore

private void storeCBLOFScore​(WritableDoubleDataStore cblofs,
DoubleMinMax cblofMinMax,
double cblof,
DBIDIter iter)
• #### computeSmallClusterCBLOF

private double computeSmallClusterCBLOF​(O obj,
NumberVectorDistance<? super O> distance,
java.util.List<NumberVector> largeClusterMeans,
Cluster<MeanModel> cluster)
• #### computeLargeClusterCBLOF

private double computeLargeClusterCBLOF​(O obj,
NumberVectorDistance<? super O> distance,
NumberVector clusterMean,
Cluster<MeanModel> cluster)
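The two private scoring helpers are undocumented here. As a rough guide, the following sketch follows the definition in the referenced paper by He, Xu and Deng: an object in a large cluster is scored by the cluster size times its distance to its own cluster, and an object in a small cluster by the cluster size times its distance to the nearest large cluster. Using cluster means as representatives, plain double[] vectors, and Euclidean distance are assumptions for illustration; CblofScoreSketch and its methods are hypothetical names, not ELKI API.

```java
import java.util.Arrays;
import java.util.List;

public class CblofScoreSketch {
  /** Euclidean distance between two vectors of equal length (assumed distance). */
  public static double euclidean(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Score for an object in a large cluster: size times distance to its own mean. */
  public static double largeClusterScore(double[] obj, double[] ownMean, int clusterSize) {
    return clusterSize * euclidean(obj, ownMean);
  }

  /** Score for an object in a small cluster: size times distance to the nearest large-cluster mean. */
  public static double smallClusterScore(double[] obj, List<double[]> largeMeans, int clusterSize) {
    double nearest = Double.POSITIVE_INFINITY;
    for (double[] mean : largeMeans) {
      nearest = Math.min(nearest, euclidean(obj, mean));
    }
    return clusterSize * nearest;
  }

  public static void main(String[] args) {
    double[] obj = {3.0, 4.0};
    // Distance to the origin is 5, cluster size is 10
    System.out.println(largeClusterScore(obj, new double[] {0.0, 0.0}, 10)); // prints 50.0
    // Nearest large-cluster mean is (3, 0) at distance 4, cluster size is 2
    List<double[]> largeMeans = Arrays.asList(new double[] {0.0, 0.0}, new double[] {3.0, 0.0});
    System.out.println(smallClusterScore(obj, largeMeans, 2)); // prints 8.0
  }
}
```

Because the score is weighted by cluster size, objects far from the mean of a big cluster can outscore objects in tiny clusters, which is worth keeping in mind when interpreting the resulting ranking.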
• #### getInputTypeRestriction

public TypeInformation[] getInputTypeRestriction()
Description copied from interface: Algorithm
Get the input type restriction used for negotiating the data query.
Specified by:
getInputTypeRestriction in interface Algorithm
Returns:
Type restriction