Module ojalgo

Class FeatureBasedClusterer

    • Method Summary

      Modifier and Type Method Description
      (package private) java.util.function.Function<java.util.Collection<Point>,​Point> centroid()
      Returns a function that computes the centroid of a collection of points.
      <T> java.util.List<java.util.Map<T,​float[]>> cluster​(java.util.Collection<T> input, java.util.function.Function<T,​float[]> extractor)
      Clusters arbitrary items by first extracting their float feature representation.
      (package private) java.util.function.ToDoubleBiFunction<Point,​Point> distance()
      Returns a function that computes the distance between two points.
      (package private) double distance​(Point point1, Point point2)
      Returns the distance between two points.
      (package private) double getThreshold()
      Returns the median distance threshold used for greedy clustering and initialisation.
      (package private) java.util.function.Function<java.util.Collection<Point>,​java.util.List<Point>> initialiser()
      Returns a function that generates an initial set of centroids from the input points.
      (package private) boolean isSquared()
      Returns true if the configured distance measure is squared Euclidean.
      static FeatureBasedClusterer newAutomatic()
      Returns a new automatic clusterer using squared Euclidean distance.
      static FeatureBasedClusterer newAutomatic​(DistanceMeasure measure)
      Returns a new automatic clusterer using the specified distance measure.
      static FeatureBasedClusterer newGreedy​(double threshold)
      Returns a new greedy, single-pass clusterer using squared Euclidean distance and the given threshold.
      static FeatureBasedClusterer newGreedy​(DistanceMeasure measure, double threshold)
      Returns a new greedy, single-pass clusterer using the supplied distance measure and threshold.
      static FeatureBasedClusterer newKMeans​(int k)
      Returns a new k-means–style clusterer using squared Euclidean distance and the given number of clusters.
      static FeatureBasedClusterer newKMeans​(DistanceMeasure measure, int k)
      Returns a new k-means–style clusterer using the supplied distance measure and number of clusters.
      static FeatureBasedClusterer newSpectral​(int k)
      Returns a new spectral clusterer using squared Euclidean distance and the given number of clusters.
      static FeatureBasedClusterer newSpectral​(DistanceMeasure measure, int k)
      Returns a new spectral clusterer using the supplied distance measure and number of clusters.
      (package private) void setup​(java.util.Collection<Point> input)
      Prepares the internal distance cache for the given input points and distance measure.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • newAutomatic

        public static FeatureBasedClusterer newAutomatic​(DistanceMeasure measure)
        Returns a new automatic clusterer using the specified distance measure.

        The algorithm:

        1. Extracts features
        2. Caches all pairwise distances
        3. Performs statistical analysis to determine a distance threshold
        4. Performs greedy clustering to get initial centroids
        5. Filters out very small clusters (determining k)
        6. Performs k-means clustering to refine clusters and centroids
        Parameters:
        measure - the distance measure to use
        Returns:
        a new automatic clusterer
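
        Steps 2 and 3 can be pictured with the following minimal sketch. It assumes that the statistical analysis amounts to taking the median of all pairwise squared Euclidean distances; the documentation above does not confirm this, and the class and method names are hypothetical, not part of this API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not the library's implementation.
final class MedianThresholdSketch {

    /** Squared Euclidean distance between two equal-length feature vectors. */
    static double squaredDistance(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Median of all pairwise distances; returns 0 when there are fewer than two points. */
    static double medianPairwiseDistance(List<float[]> points) {
        int n = points.size();
        if (n < 2) {
            return 0.0;
        }
        // One entry per unordered pair (i, j) with i < j.
        double[] distances = new double[n * (n - 1) / 2];
        int index = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                distances[index++] = squaredDistance(points.get(i), points.get(j));
            }
        }
        Arrays.sort(distances);
        int mid = distances.length / 2;
        // Odd count: the middle value. Even count: the mean of the two middle values.
        return distances.length % 2 == 1
                ? distances[mid]
                : (distances[mid - 1] + distances[mid]) / 2.0;
    }
}
```

        Computing every pair costs O(n^2) time and memory, which is one reason a clusterer might cache the distances once and reuse them in later steps.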
      • newGreedy

        public static FeatureBasedClusterer newGreedy​(DistanceMeasure measure,
                                                      double threshold)
        Returns a new greedy, single-pass clusterer using the supplied distance measure and threshold.

        Items are processed in a single pass. Each item joins the cluster with the nearest existing centroid if its distance to that centroid is <= threshold; otherwise it starts a new cluster. The threshold must be expressed in the same units as the chosen distance measure (for example, squared units for squared Euclidean distance).

        Parameters:
        measure - the distance measure
        threshold - the maximum allowed distance to join an existing cluster
        Returns:
        a new greedy clusterer
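
        The assignment rule described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: it uses squared Euclidean distance, and it updates each centroid as a running mean when an item joins, which the documentation does not specify. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not the library's implementation.
final class GreedyClusteringSketch {

    /** Squared Euclidean distance between a point and a centroid of equal length. */
    static double squaredDistance(float[] point, double[] centroid) {
        double sum = 0.0;
        for (int i = 0; i < point.length; i++) {
            double diff = point[i] - centroid[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Returns, for each input point in order, the index of the cluster it was assigned to. */
    static int[] assign(List<float[]> points, double threshold) {
        List<double[]> centroids = new ArrayList<>();
        List<Integer> counts = new ArrayList<>();
        int[] labels = new int[points.size()];
        for (int p = 0; p < points.size(); p++) {
            float[] point = points.get(p);
            // Find the nearest existing centroid, if any.
            int nearest = -1;
            double best = Double.POSITIVE_INFINITY;
            for (int c = 0; c < centroids.size(); c++) {
                double d = squaredDistance(point, centroids.get(c));
                if (d < best) {
                    best = d;
                    nearest = c;
                }
            }
            if (nearest >= 0 && best <= threshold) {
                // Join the nearest cluster and update its centroid as a running mean (assumption).
                int count = counts.get(nearest) + 1;
                double[] centroid = centroids.get(nearest);
                for (int i = 0; i < centroid.length; i++) {
                    centroid[i] += (point[i] - centroid[i]) / count;
                }
                counts.set(nearest, count);
                labels[p] = nearest;
            } else {
                // Too far from every centroid: start a new cluster at this point.
                double[] centroid = new double[point.length];
                for (int i = 0; i < point.length; i++) {
                    centroid[i] = point[i];
                }
                centroids.add(centroid);
                counts.add(1);
                labels[p] = centroids.size() - 1;
            }
        }
        return labels;
    }
}
```

        Because the pass is greedy, the result depends on the order of the input, and the number of clusters is determined by the threshold rather than chosen up front.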
      • newGreedy

        public static FeatureBasedClusterer newGreedy​(double threshold)
        Returns a new greedy, single-pass clusterer using squared Euclidean distance and the given threshold.
        Parameters:
        threshold - the maximum allowed squared Euclidean distance to join an existing cluster
        Returns:
        a new greedy clusterer
      • newKMeans

        public static FeatureBasedClusterer newKMeans​(DistanceMeasure measure,
                                                      int k)
        Returns a new k-means–style clusterer using the supplied distance measure and number of clusters.
        Parameters:
        measure - the distance measure to use
        k - the number of clusters (k >= 1)
        Returns:
        a new k-means clusterer
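
        For orientation, a k-means-style loop alternates between assigning points to the nearest centroid and recomputing centroids. The sketch below is a generic version of that loop, not this class's implementation. It assumes squared Euclidean distance and seeds the centroids with the first k points; both choices, the maxIterations parameter, and all names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not the library's implementation.
final class KMeansSketch {

    /** Returns the cluster index (0 to k-1) of each point after convergence or maxIterations. */
    static int[] cluster(List<float[]> points, int k, int maxIterations) {
        int n = points.size();
        if (k < 1 || k > n || maxIterations < 1) {
            throw new IllegalArgumentException("need 1 <= k <= number of points and maxIterations >= 1");
        }
        int dim = points.get(0).length;
        // Assumption: seed the centroids with the first k points.
        double[][] centroids = new double[k][dim];
        for (int c = 0; c < k; c++) {
            for (int i = 0; i < dim; i++) {
                centroids[c][i] = points.get(c)[i];
            }
        }
        int[] labels = new int[n];
        Arrays.fill(labels, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: each point goes to its nearest centroid.
            boolean changed = false;
            for (int p = 0; p < n; p++) {
                float[] point = points.get(p);
                int best = 0;
                double bestDistance = Double.POSITIVE_INFINITY;
                for (int c = 0; c < k; c++) {
                    double d = 0.0;
                    for (int i = 0; i < dim; i++) {
                        double diff = point[i] - centroids[c][i];
                        d += diff * diff;
                    }
                    if (d < bestDistance) {
                        bestDistance = d;
                        best = c;
                    }
                }
                if (labels[p] != best) {
                    labels[p] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // Assignments are stable, so the centroids would not move either.
            }
            // Update step: each centroid becomes the mean of its assigned points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int p = 0; p < n; p++) {
                counts[labels[p]]++;
                for (int i = 0; i < dim; i++) {
                    sums[labels[p]][i] += points.get(p)[i];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) { // An empty cluster keeps its previous centroid.
                    for (int i = 0; i < dim; i++) {
                        centroids[c][i] = sums[c][i] / counts[c];
                    }
                }
            }
        }
        return labels;
    }
}
```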
      • newKMeans

        public static FeatureBasedClusterer newKMeans​(int k)
        Returns a new k-means–style clusterer using squared Euclidean distance and the given number of clusters.
        Parameters:
        k - the number of clusters (k >= 1)
        Returns:
        a new k-means clusterer
      • newSpectral

        public static FeatureBasedClusterer newSpectral​(DistanceMeasure measure,
                                                        int k)
        Returns a new spectral clusterer using the supplied distance measure and number of clusters.

        Builds the affinity matrix by applying a Gaussian kernel to the pairwise distances, then uses the symmetric normalised Laplacian of that matrix.

        Parameters:
        measure - the distance measure for the kernel
        k - the number of clusters (k >= 1)
        Returns:
        a new spectral clusterer
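
        The two ingredients named above can be sketched as follows: a Gaussian affinity W[i][j] = exp(-d[i][j] / (2 * sigma^2)) and the symmetric normalised Laplacian L = I - D^(-1/2) W D^(-1/2), where D is the diagonal degree matrix. The sketch assumes the input holds squared distances, sets the diagonal of W to zero, and takes a hypothetical bandwidth parameter sigma; none of this is confirmed by the documentation, and the eigen-decomposition that follows in spectral clustering is omitted.

```java
// Hypothetical illustration, not the library's implementation.
final class NormalisedLaplacianSketch {

    /** Gaussian kernel affinity from a symmetric matrix of squared distances; the diagonal is left at 0. */
    static double[][] affinity(double[][] squaredDistances, double sigma) {
        int n = squaredDistances.length;
        double[][] w = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i != j) {
                    w[i][j] = Math.exp(-squaredDistances[i][j] / (2.0 * sigma * sigma));
                }
            }
        }
        return w;
    }

    /** Symmetric normalised Laplacian L = I - D^(-1/2) W D^(-1/2) of an affinity matrix W. */
    static double[][] symmetricNormalisedLaplacian(double[][] w) {
        int n = w.length;
        double[] invSqrtDegree = new double[n];
        for (int i = 0; i < n; i++) {
            double degree = 0.0;
            for (int j = 0; j < n; j++) {
                degree += w[i][j];
            }
            // A node with zero degree contributes nothing to the off-diagonal terms.
            invSqrtDegree[i] = degree > 0.0 ? 1.0 / Math.sqrt(degree) : 0.0;
        }
        double[][] laplacian = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double identity = (i == j) ? 1.0 : 0.0;
                laplacian[i][j] = identity - invSqrtDegree[i] * w[i][j] * invSqrtDegree[j];
            }
        }
        return laplacian;
    }
}
```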
      • newSpectral

        public static FeatureBasedClusterer newSpectral​(int k)
        Returns a new spectral clusterer using squared Euclidean distance and the given number of clusters.
        Parameters:
        k - the number of clusters (k >= 1)
        Returns:
        a new spectral clusterer
      • cluster

        public final <T> java.util.List<java.util.Map<T,​float[]>> cluster​(java.util.Collection<T> input,
                                                                                java.util.function.Function<T,​float[]> extractor)
        Clusters arbitrary items by first extracting their float feature representation.

        Each item is wrapped as a Point using the extractor output. Clustering is then performed by ClusteringAlgorithm.cluster(Collection). The result contains one map per internal cluster, keyed by the original items, with each item's feature vector as the value.

        Type Parameters:
        T - the item type
        Parameters:
        input - the items to cluster (not null)
        extractor - a function that returns a non-null float[] feature vector for an item
        Returns:
        a list of clusters, each as a map from the original item to its feature vector, sorted by decreasing size
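
        The shape of the return value can be illustrated with a small sketch that groups items by a precomputed cluster label and sorts the groups by decreasing size. The labels array stands in for whatever the clustering step produces; it and all names here are hypothetical and not part of this API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration, not the library's implementation.
final class ClusterResultSketch {

    /** Groups items by label, mapping each item to its feature vector, largest cluster first. */
    static <T> List<Map<T, float[]>> group(List<T> items, Function<T, float[]> extractor, int[] labels) {
        Map<Integer, Map<T, float[]>> byLabel = new LinkedHashMap<>();
        for (int i = 0; i < items.size(); i++) {
            T item = items.get(i);
            byLabel.computeIfAbsent(labels[i], key -> new LinkedHashMap<>())
                    .put(item, extractor.apply(item));
        }
        List<Map<T, float[]>> clusters = new ArrayList<>(byLabel.values());
        // List.sort is stable, so clusters of equal size keep their first-seen order.
        clusters.sort(Comparator.comparingInt((Map<T, float[]> m) -> m.size()).reversed());
        return clusters;
    }
}
```

        Since each cluster is a map keyed by item, two items that are equal according to equals and hashCode would occupy a single entry in this sketch.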
      • centroid

        java.util.function.Function<java.util.Collection<Point>,​Point> centroid()
        Returns a function that computes the centroid of a collection of points.
        Returns:
        centroid function
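
        As a point of reference, a centroid is commonly computed as the coordinate-wise mean of the points. The sketch below shows that computation on plain float[] vectors; it assumes all vectors have the same length and does not use the library's Point type. The names are hypothetical.

```java
import java.util.Collection;

// Hypothetical illustration, not the library's implementation.
final class CentroidSketch {

    /** Coordinate-wise mean of a non-empty collection of equal-length vectors. */
    static float[] centroid(Collection<float[]> points) {
        if (points.isEmpty()) {
            throw new IllegalArgumentException("cannot compute the centroid of no points");
        }
        int dim = points.iterator().next().length;
        // Accumulate in double to limit rounding error before narrowing to float.
        double[] sums = new double[dim];
        for (float[] point : points) {
            for (int i = 0; i < dim; i++) {
                sums[i] += point[i];
            }
        }
        float[] mean = new float[dim];
        for (int i = 0; i < dim; i++) {
            mean[i] = (float) (sums[i] / points.size());
        }
        return mean;
    }
}
```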
      • distance

        java.util.function.ToDoubleBiFunction<Point,​Point> distance()
        Returns a function that computes the distance between two points.
        Returns:
        distance function
      • distance

        double distance​(Point point1,
                        Point point2)
        Returns the distance between two points.
        Parameters:
        point1 - first point
        point2 - second point
        Returns:
        distance between point1 and point2
      • getThreshold

        double getThreshold()
        Returns the median distance threshold used for greedy clustering and initialisation.
        Returns:
        median distance threshold
      • initialiser

        java.util.function.Function<java.util.Collection<Point>,​java.util.List<Point>> initialiser()
        Returns a function that generates an initial set of centroids from the input points.
        Returns:
        initialiser function
      • isSquared

        boolean isSquared()
        Returns true if the configured distance measure is squared Euclidean.
        Returns:
        true if squared Euclidean, false otherwise
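
        Whether the measure is squared matters mainly for thresholds. Squared Euclidean distance ranks neighbours in the same order as Euclidean distance but on a different scale, so a Euclidean threshold of t corresponds to a squared threshold of t * t. A minimal sketch with hypothetical names:

```java
// Hypothetical illustration, not the library's implementation.
final class SquaredDistanceSketch {

    /** Sum of squared coordinate differences; avoids the square root. */
    static double squaredEuclidean(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Ordinary Euclidean distance, the square root of the squared form. */
    static double euclidean(float[] a, float[] b) {
        return Math.sqrt(squaredEuclidean(a, b));
    }
}
```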
      • setup

        void setup​(java.util.Collection<Point> input)
        Prepares the internal distance cache for the given input points and distance measure.
        Parameters:
        input - the points to cache distances for
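
        One simple way to picture a distance cache is a symmetric matrix filled once, so that each pair is measured a single time and later lookups are array reads. The sketch below assumes points are addressed by their index in a list; how this class actually keys its cache is not documented here, and all names are hypothetical.

```java
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Hypothetical illustration, not the library's implementation.
final class DistanceCacheSketch {

    private final double[][] cache;

    /** Evaluates the measure once per unordered pair and stores the result symmetrically. */
    DistanceCacheSketch(List<float[]> points, ToDoubleBiFunction<float[], float[]> measure) {
        int n = points.size();
        cache = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double d = measure.applyAsDouble(points.get(i), points.get(j));
                cache[i][j] = d;
                cache[j][i] = d;
            }
        }
    }

    /** Cached distance between the points at indices i and j; 0 when i equals j. */
    double distance(int i, int j) {
        return cache[i][j];
    }
}
```

        The matrix needs O(n^2) memory, so this approach suits inputs small enough for that to be acceptable.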