Class COF<O>

  • Type Parameters:
    O - Object type
    All Implemented Interfaces:
    Algorithm, OutlierAlgorithm

    @Title("COF: Connectivity-based Outlier Factor")
    @Reference(authors="J. Tang, Z. Chen, A. W. C. Fu, D. W. Cheung",
               title="Enhancing effectiveness of outlier detections for low density patterns",
               booktitle="In Advances in Knowledge Discovery and Data Mining",
    public class COF<O>
    extends java.lang.Object
    implements OutlierAlgorithm
    Connectivity-based Outlier Factor (COF).

    Reference:
    J. Tang, Z. Chen, A. W. C. Fu, D. W. Cheung
    Enhancing effectiveness of outlier detections for low density patterns.
    Advances in Knowledge Discovery and Data Mining.

    Author:
    Erich Schubert
    • Field Detail

      • LOG

        private static final Logging LOG
        The logger for this class.
      • distance

        protected Distance<? super O> distance
        Distance function used.
      • k

        protected int k
        The number of neighbors to query (including the query point!)
    • Constructor Detail

      • COF

        public COF(Distance<? super O> distance,
                   int k)
        Parameters:
        distance - the neighborhood distance function
        k - the number of neighbors to use for comparison (excluding the query point)
    • Method Detail

      • run

        public OutlierResult run(Relation<O> relation)
        Runs the COF algorithm on the given database.
        Parameters:
        relation - Data to process
        Returns:
        COF outlier result
      • computeAverageChainingDistances

        protected void computeAverageChainingDistances(KNNSearcher<DBIDRef> knnq,
                                                       DistanceQuery<O> dq,
                                                       DBIDs ids,
                                                       WritableDoubleDataStore acds)
        Computes the average chaining distance: the average length of a path through the given set of points to each target. The authors of COF approximate this value with a weighted mean that assumes every object is reached from the previous point. In practice, every point could be best reachable from the first point, in which case this approximation does not make much sense.

        TODO: can we accelerate this by using the kNN of the neighbors?

        Parameters:
        knnq - KNN query
        dq - Distance query
        ids - IDs to process
        acds - Storage for average chaining distances
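The weighted mean described above can be sketched in plain Java, independent of the ELKI API. This is a minimal illustration under stated assumptions, not the library's implementation: the class name `AverageChainingDistanceSketch`, the method names, the Euclidean distance, and the array layout (`points[0]` is the query point, the remaining rows are its neighbors) are all hypothetical. The chain is grown one point at a time by always adding the point closest to the points already in the chain, and the i-th link is weighted by 2(r - i) / (r(r - 1)), where r is the number of points including the query point.

```java
final class AverageChainingDistanceSketch {
  /** Euclidean distance between two vectors of equal length (assumed metric). */
  static double euclidean(double[] a, double[] b) {
    double sum = 0.0;
    for (int d = 0; d < a.length; d++) {
      double diff = a[d] - b[d];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  /**
   * Average chaining distance of points[0] with respect to points[1..].
   * Hypothetical helper: points[0] is the query point, the rest its neighbors.
   */
  static double averageChainingDistance(double[][] points) {
    final int r = points.length;
    if (r < 2) {
      return 0.0; // no links to average
    }
    double[] best = new double[r];    // cheapest known link from the chain to each point
    boolean[] done = new boolean[r];  // points already in the chain
    done[0] = true;
    for (int i = 1; i < r; i++) {
      best[i] = euclidean(points[0], points[i]);
    }
    double weighted = 0.0;
    for (int step = 1; step < r; step++) {
      // Pick the unprocessed point that is closest to the current chain.
      int next = -1;
      for (int i = 1; i < r; i++) {
        if (!done[i] && (next < 0 || best[i] < best[next])) {
          next = i;
        }
      }
      weighted += (r - step) * best[next]; // earlier links get larger weights
      done[next] = true;
      // The new chain member may offer a cheaper link to the remaining points.
      for (int i = 1; i < r; i++) {
        if (!done[i]) {
          best[i] = Math.min(best[i], euclidean(points[next], points[i]));
        }
      }
    }
    return weighted / (0.5 * r * (r - 1)); // weights sum to r(r-1)/2
  }
}
```

For four evenly spaced points on a line at 0, 1, 2 and 3, every link has length 1, so the weighted mean is 1 as well. The sketch recomputes distances on demand; whether the kNN lists of the neighbors could be reused here is the open question raised in the TODO above.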
      • computeCOFScores

        private void computeCOFScores(KNNSearcher<DBIDRef> knnq,
                                      DBIDs ids,
                                      DoubleDataStore acds,
                                      WritableDoubleDataStore cofs,
                                      DoubleMinMax cofminmax)
        Compute Connectivity outlier factors.
        Parameters:
        knnq - KNN query
        ids - IDs to process
        acds - Average chaining distances
        cofs - Connectivity outlier factor storage
        cofminmax - Score minimum/maximum tracker
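As a rough illustration of this step, the sketch below computes each score as the ratio of a point's average chaining distance to the mean average chaining distance of its neighbors, and tracks the smallest and largest score seen. It is plain Java and does not use the ELKI data stores. The class `CofScoreSketch`, the `Scores` holder, the index-based neighbor lists, and the handling of a zero denominator are assumptions made for this example only.

```java
final class CofScoreSketch {
  /** Hypothetical result holder: per-point scores plus the observed extremes. */
  static final class Scores {
    final double[] cof;
    final double min;
    final double max;

    Scores(double[] cof, double min, double max) {
      this.cof = cof;
      this.min = min;
      this.max = max;
    }
  }

  /**
   * @param acd average chaining distance of each point
   * @param neighbors for each point, the indices of its neighbors (without the point itself)
   */
  static Scores cofScores(double[] acd, int[][] neighbors) {
    double[] cof = new double[acd.length];
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;
    for (int p = 0; p < acd.length; p++) {
      double sum = 0.0;
      for (int o : neighbors[p]) {
        sum += acd[o];
      }
      double score;
      if (sum > 0.0) {
        // acd[p] divided by the mean acd of the neighbors
        score = acd[p] * neighbors[p].length / sum;
      } else {
        // Assumed convention for duplicates: 1 if p also has zero distance, else infinity.
        score = acd[p] > 0.0 ? Double.POSITIVE_INFINITY : 1.0;
      }
      cof[p] = score;
      min = Math.min(min, score);
      max = Math.max(max, score);
    }
    return new Scores(cof, min, max);
  }
}
```

A score near 1 means the point is connected about as tightly as its neighbors; larger values indicate a point that is harder to reach than the points around it. The minimum and maximum are the kind of summary a score tracker such as `cofminmax` would record.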
      • getInputTypeRestriction

        public TypeInformation[] getInputTypeRestriction()
        Description copied from interface: Algorithm
        Get the input type restriction used for negotiating the data query.
        Specified by:
        getInputTypeRestriction in interface Algorithm
        Returns:
        Type restriction