Class IntersectionSimilarity<T>

java.lang.Object
org.apache.commons.text.similarity.IntersectionSimilarity<T>
Type Parameters:
T - the type of the elements extracted from the character sequence
All Implemented Interfaces:
SimilarityScore<IntersectionResult>

public class IntersectionSimilarity<T> extends Object implements SimilarityScore<IntersectionResult>
Measures the intersection of two sets created from a pair of character sequences.

It is assumed that the type T correctly conforms to the requirements for storage within a Set or HashMap. Ideally the type is immutable and implements Object.equals(Object) and Object.hashCode().

Since:
1.7

  • Constructor Details

    • IntersectionSimilarity

      public IntersectionSimilarity(Function<CharSequence,Collection<T>> converter)
      Create a new intersection similarity using the provided converter.

      If the converter returns a Set, the intersection result will not include duplicates. Any other Collection produces a result that counts duplicates in both the intersection and the union.

      Parameters:
      converter - the converter used to create the elements from the characters
      Throws:
      IllegalArgumentException - if the converter is null
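The difference between a Set-returning converter and any other Collection can be sketched with plain JDK types. The class and field names below (ConverterSketch, UNIQUE_CHARS, ALL_CHARS) are hypothetical and not part of the library; they only illustrate two functions whose type matches the constructor parameter with T = Character.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical converters; the names are illustrative, not library API.
final class ConverterSketch {

    // Returns a Set, so repeated characters collapse into one element.
    static final Function<CharSequence, Collection<Character>> UNIQUE_CHARS = cs -> {
        Set<Character> set = new HashSet<>();
        for (int i = 0; i < cs.length(); i++) {
            set.add(cs.charAt(i));
        }
        return set;
    };

    // Returns a List, so every occurrence of a character is kept.
    static final Function<CharSequence, Collection<Character>> ALL_CHARS = cs -> {
        List<Character> list = new ArrayList<>();
        for (int i = 0; i < cs.length(); i++) {
            list.add(cs.charAt(i));
        }
        return list;
    };
}
```

For the input "aab", UNIQUE_CHARS yields two elements and ALL_CHARS yields three. With the library on the classpath, either function could presumably be passed as the converter argument, for example new IntersectionSimilarity<>(ConverterSketch.UNIQUE_CHARS).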
  • Method Details

    • getIntersection

      private static <T> int getIntersection(Set<T> setA, Set<T> setB)
      Computes the intersection between two sets. This is the count of all the elements that are within both sets.
      Type Parameters:
      T - the type of the elements in the set
      Parameters:
      setA - the set A
      setB - the set B
      Returns:
      The intersection (the number of elements common to both sets)
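The count described above can be sketched as follows. This is a minimal illustration using only java.util, not the library's private implementation; the class name SetIntersectionSketch and the choice to iterate over the smaller set are assumptions made for the example.

```java
import java.util.Set;

// Illustrative stand-in for the private helper; not the library's code.
final class SetIntersectionSketch {

    // Counts the elements that are present in both sets.
    static <T> int intersectionSize(Set<T> setA, Set<T> setB) {
        // Iterate over the smaller set to keep the number of lookups low.
        Set<T> small = setA.size() <= setB.size() ? setA : setB;
        Set<T> large = small == setA ? setB : setA;
        int count = 0;
        for (T element : small) {
            if (large.contains(element)) {
                count++;
            }
        }
        return count;
    }
}
```

For the sets {1, 2, 3} and {2, 3, 4, 5}, the shared elements are 2 and 3, so the count is 2.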
    • apply

      public IntersectionResult apply(CharSequence left, CharSequence right)
      Calculates the intersection of two character sequences passed as input.
      Specified by:
      apply in interface SimilarityScore<IntersectionResult>
      Parameters:
      left - first character sequence
      right - second character sequence
      Returns:
      The intersection result
      Throws:
      IllegalArgumentException - if either input sequence is null
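The overall flow of apply (reject null input, convert both sequences, then measure the overlap) might look roughly like the sketch below. It is a simplification under stated assumptions: ApplySketch is a hypothetical name, the int array {sizeA, sizeB, intersection} stands in for the IntersectionResult object, and a single counting pass replaces whatever separate Set and bag handling the library uses internally.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Simplified, JDK-only illustration; not the library's implementation.
final class ApplySketch {

    // Returns {sizeA, sizeB, intersection} as a stand-in for a result object.
    static <T> int[] apply(Function<CharSequence, Collection<T>> converter,
                           CharSequence left, CharSequence right) {
        if (left == null || right == null) {
            throw new IllegalArgumentException("Input cannot be null");
        }
        Collection<T> a = converter.apply(left);
        Collection<T> b = converter.apply(right);

        // Count how often each element occurs in the first collection.
        Map<T, Integer> remaining = new HashMap<>();
        for (T element : a) {
            remaining.merge(element, 1, Integer::sum);
        }

        // Each element of the second collection consumes one matching
        // occurrence, which yields the sum of the minimum counts.
        int intersection = 0;
        for (T element : b) {
            Integer count = remaining.get(element);
            if (count != null && count > 0) {
                remaining.put(element, count - 1);
                intersection++;
            }
        }
        return new int[] {a.size(), b.size(), intersection};
    }
}
```

With a converter that splits on spaces into a List, the inputs "a a b" and "a b b c" give sizes 3 and 4 and an intersection of 2 (one shared "a" and one shared "b").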
    • getIntersection

      private int getIntersection(IntersectionSimilarity<T>.TinyBag bagA, IntersectionSimilarity<T>.TinyBag bagB)
      Computes the intersection between two bags. This is the sum of the minimum count of each element that is within both bags.
      Parameters:
      bagA - the bag A
      bagB - the bag B
      Returns:
      The intersection (the sum of the minimum counts of the shared elements)
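The "sum of the minimum count" rule can be illustrated with a Map from element to occurrence count. TinyBag is a private inner class whose internals are not described here, so the Map<T, Integer> representation and the name BagIntersectionSketch are assumptions made purely for this sketch.

```java
import java.util.Map;

// Illustrative only: a Map<T, Integer> stands in for the private TinyBag.
final class BagIntersectionSketch {

    // Sums, over every element present in both bags, the smaller of its two counts.
    static <T> int intersectionSize(Map<T, Integer> bagA, Map<T, Integer> bagB) {
        // Iterate over the bag with fewer distinct elements.
        Map<T, Integer> small = bagA.size() <= bagB.size() ? bagA : bagB;
        Map<T, Integer> large = small == bagA ? bagB : bagA;
        int total = 0;
        for (Map.Entry<T, Integer> entry : small.entrySet()) {
            Integer other = large.get(entry.getKey());
            if (other != null) {
                total += Math.min(entry.getValue(), other);
            }
        }
        return total;
    }
}
```

For bag A = {a: 2, b: 1} and bag B = {a: 1, b: 3, c: 1}, the result is min(2, 1) + min(1, 3) = 2.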
    • toBag

      private IntersectionSimilarity<T>.TinyBag toBag(Collection<T> objects)
      Converts the collection to a bag. The bag will contain the count of each element in the collection.
      Parameters:
      objects - the objects
      Returns:
      The bag
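Counting the occurrences of each element, as described for toBag, can be sketched with a HashMap. As before, the Map<T, Integer> is an assumed stand-in for the private TinyBag type, and ToBagSketch is a hypothetical name.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: builds an element-to-count map in place of a TinyBag.
final class ToBagSketch {

    // Records how many times each element occurs in the collection.
    static <T> Map<T, Integer> toBag(Collection<T> objects) {
        Map<T, Integer> bag = new HashMap<>();
        for (T object : objects) {
            // Start at 1 for a new element, otherwise add 1 to its count.
            bag.merge(object, 1, Integer::sum);
        }
        return bag;
    }
}
```

The list ["a", "b", "a"] becomes a bag with two distinct elements, where "a" has count 2 and "b" has count 1. This relies on T implementing equals and hashCode consistently, as the class description requires.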