Class CBLOF<O extends NumberVector>

  • Type Parameters:
    O - the type of data objects handled by this algorithm
    All Implemented Interfaces:
    Algorithm, OutlierAlgorithm

    @Title("Discovering cluster-based local outliers")
    @Reference(authors="Z. He, X. Xu, S. Deng",
               title="Discovering cluster-based local outliers",
               booktitle="Pattern Recognition Letters 24(9-10)",
    public class CBLOF<O extends NumberVector>
    extends java.lang.Object
    implements OutlierAlgorithm
    Cluster-based local outlier factor (CBLOF).

    Reference:
    Z. He, X. Xu, S. Deng
    Discovering cluster-based local outliers
    Pattern Recognition Letters 24(9-10)

    Implementation note: this algorithm is hard to implement generically, that is, with support for arbitrary clustering algorithms and distances, because it is not trivial to ensure that the clustering algorithm and the outlier method use compatible data types and distances.

    Author:
    Patrick Kostjens
    • Field Detail

      • LOG

        private static final Logging LOG
        The logger for this class.
      • alpha

        protected double alpha
        The fraction of the data that separates the large clusters from the small clusters. Clusters are ordered by descending size and counted as large until the specified fraction of the data is covered. For example, a value of 0.9 means that the large clusters should cover at least 90% of the data points.
      • beta

        protected double beta
        The minimal size ratio between two consecutive clusters (when ordered by descending size) at which the boundary between the large and small clusters is set. For example, a value of 3 means that the boundary is placed between cluster i and cluster i+1 (the next smaller cluster) when cluster i is at least 3 times as large as cluster i+1.
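The interaction of alpha and beta can be illustrated with a minimal sketch. It assumes that the cluster sizes are already sorted in descending order and that the boundary is placed at the first index where either criterion holds; this page does not state how the two criteria are combined, so that rule is an assumption. The class `ClusterBoundarySketch` and the method `boundary` are hypothetical names and are not part of the documented API.

```java
import java.util.Arrays;

public class ClusterBoundarySketch {
  /**
   * Returns the index of the last "large" cluster.
   *
   * @param sizes cluster sizes, sorted in descending order (assumed)
   * @param alpha fraction of the data the large clusters should cover
   * @param beta minimal size ratio between consecutive clusters at the boundary
   */
  public static int boundary(int[] sizes, double alpha, double beta) {
    int total = Arrays.stream(sizes).sum();
    int cumulative = 0;
    for (int i = 0; i < sizes.length - 1; i++) {
      cumulative += sizes[i];
      // alpha criterion: clusters 0..i already cover the requested fraction
      if (cumulative >= alpha * total) {
        return i;
      }
      // beta criterion: cluster i is at least beta times as large as cluster i+1
      if (sizes[i] / (double) sizes[i + 1] >= beta) {
        return i;
      }
    }
    // Neither criterion fired: treat every cluster as large
    return sizes.length - 1;
  }

  public static void main(String[] args) {
    int[] sizes = {50, 30, 8, 7, 5};
    // beta set very high so that only the alpha criterion can fire
    System.out.println(boundary(sizes, 0.85, 100.0));
    // alpha set to 1.0 so that the beta criterion fires first
    System.out.println(boundary(sizes, 1.0, 3.0));
  }
}
```

With the sizes above, the first call stops once the cumulative size reaches 88 of 100 points, and the second call stops where a cluster of 30 is followed by a cluster of 8.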
    • Constructor Detail

      • CBLOF

        public CBLOF​(NumberVectorDistance<? super O> distance,
                     ClusteringAlgorithm<Clustering<MeanModel>> clusteringAlgorithm,
                     double alpha,
                     double beta)
        Parameters:
        distance - the neighborhood distance function
        clusteringAlgorithm - the clustering algorithm
        alpha - the ratio of the data that should be included in the large clusters
        beta - the ratio of the sizes of the clusters at the boundary between the large and the small clusters
    • Method Detail

      • run

        public OutlierResult run​(Database database,
                                 Relation<O> relation)
        Run CBLOF.
        Parameters:
        database - Database to run on
        relation - Relation to use for CBLOF computation
        Returns:
        Outlier result
      • getClusterBoundary

        private int getClusterBoundary​(Relation<O> relation,
                                       java.util.List<? extends Cluster<MeanModel>> clusters)
        Compute the boundary index separating the large clusters from the small clusters.
        Parameters:
        relation - Data to process
        clusters - All clusters that were found
        Returns:
        Index of the boundary between the large and small clusters.
      • computeCBLOFs

        private void computeCBLOFs​(Relation<O> relation,
                                   WritableDoubleDataStore cblofs,
                                   DoubleMinMax cblofMinMax,
                                   java.util.List<? extends Cluster<MeanModel>> largeClusters,
                                   java.util.List<? extends Cluster<MeanModel>> smallClusters)
        Compute the CBLOF scores for all the data.
        Parameters:
        relation - Data to process
        cblofs - CBLOF scores
        cblofMinMax - Minimum/maximum score tracker
        largeClusters - Large clusters output
        smallClusters - Small clusters output
      • getInputTypeRestriction

        public TypeInformation[] getInputTypeRestriction()
        Description copied from interface: Algorithm
        Get the input type restriction used for negotiating the data query.
        Specified by:
        getInputTypeRestriction in interface Algorithm
        Returns:
        Type restriction
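The score computed by computeCBLOFs can be sketched as follows, based on the definition in the cited paper as commonly stated: a point in a large cluster is scored by its cluster's size times its distance to that cluster, and a point in a small cluster by its cluster's size times its distance to the nearest large cluster. This is a sketch under assumptions, not the class's actual code: cluster means stand in for clusters (suggested by the MeanModel type in the signatures), Euclidean distance stands in for the configurable distance function, and `CblofScoreSketch`, `cblof`, and `euclidean` are hypothetical names.

```java
public class CblofScoreSketch {
  /** Euclidean distance; an assumption standing in for the configured distance. */
  public static double euclidean(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /**
   * CBLOF score of a single point.
   *
   * @param point the data point
   * @param ownSize size of the cluster that contains the point
   * @param ownMean mean of the cluster that contains the point
   * @param ownIsLarge whether that cluster is a large cluster
   * @param largeMeans means of all large clusters
   */
  public static double cblof(double[] point, int ownSize, double[] ownMean,
      boolean ownIsLarge, double[][] largeMeans) {
    if (ownIsLarge) {
      // Large cluster: distance to the point's own cluster mean
      return ownSize * euclidean(point, ownMean);
    }
    // Small cluster: distance to the nearest large cluster mean
    double min = Double.POSITIVE_INFINITY;
    for (double[] mean : largeMeans) {
      min = Math.min(min, euclidean(point, mean));
    }
    return ownSize * min;
  }

  public static void main(String[] args) {
    double[] p = {3.0, 4.0};
    double[][] largeMeans = {{0.0, 0.0}, {3.0, 0.0}};
    // Point in a large cluster of 10 points with mean (0, 0)
    System.out.println(cblof(p, 10, largeMeans[0], true, largeMeans));
    // Point in a small cluster of 2 points; nearest large mean is (3, 0)
    System.out.println(cblof(p, 2, new double[] {3.0, 5.0}, false, largeMeans));
  }
}
```

In this sketch the small-cluster branch ignores the point's own cluster mean entirely, which is why the same point receives a different score depending on which side of the boundary its cluster falls.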