Class CommonTermsQuery

java.lang.Object
org.apache.lucene.search.Query
org.apache.lucene.queries.CommonTermsQuery

public class CommonTermsQuery extends Query
A query that executes high-frequency terms in an optional sub-query to prevent slow queries caused by "common" terms such as stopwords. This query builds two queries from the added terms: low-frequency terms are added to a required boolean clause, and high-frequency terms are added to an optional boolean clause. The optional clause is only executed if the required "low-frequency" clause matches. In most cases, high-frequency terms are unlikely to contribute significantly to the document score unless at least one of the low-frequency terms is matched. Where this holds, this query can improve query execution times significantly.

CommonTermsQuery has several advantages over stopword filtering at index or query time: a term can be "classified" based on its actual document frequency in the index, so slow queries can be prevented even across domains without specialized stopword files.

Note: if the query contains only high-frequency terms, it is rewritten into a plain conjunction query, i.e., all high-frequency terms must match in order for a document to match.
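The matching behaviour described above can be illustrated with a small, self-contained simulation. This is a sketch under stated assumptions, not Lucene's implementation: it models a document as a set of strings, assumes both the low- and high-frequency clauses use `SHOULD` with no minimum-should-match setting, and replaces the document-frequency classification with a hypothetical `isHighFreq` predicate. The class and method names are illustrative only.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical simulation of which documents a CommonTermsQuery would match. */
final class CommonTermsMatchSketch {
  private CommonTermsMatchSketch() {}

  /**
   * Assumes lowFreqOccur and highFreqOccur are both SHOULD and no minimum-should-match
   * values are set. isHighFreq stands in for the docFreq-based classification.
   */
  static boolean matches(List<String> queryTerms, Set<String> docTerms, Predicate<String> isHighFreq) {
    List<String> low = queryTerms.stream().filter(isHighFreq.negate()).collect(Collectors.toList());
    List<String> high = queryTerms.stream().filter(isHighFreq).collect(Collectors.toList());
    if (!low.isEmpty()) {
      // Required low-frequency group of SHOULD clauses: at least one of them must match.
      // The high-frequency group is optional here, so it only influences scoring.
      return low.stream().anyMatch(docTerms::contains);
    }
    // Only high-frequency terms: treated as a conjunction, so every one must match.
    // With no terms at all, this sketch matches nothing.
    return !high.isEmpty() && high.stream().allMatch(docTerms::contains);
  }
}
```

For example, with `the` and `of` classified as high-frequency, the terms `the`, `lord`, `of` match a document containing only `lord`, but not a document containing only `the` and `of`.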

  • Field Details

    • terms

      protected final List<Term> terms
    • maxTermFrequency

      protected final float maxTermFrequency
    • lowFreqOccur

      protected final BooleanClause.Occur lowFreqOccur
    • highFreqOccur

      protected final BooleanClause.Occur highFreqOccur
    • lowFreqBoost

      protected float lowFreqBoost
    • highFreqBoost

      protected float highFreqBoost
    • lowFreqMinNrShouldMatch

      protected float lowFreqMinNrShouldMatch
    • highFreqMinNrShouldMatch

      protected float highFreqMinNrShouldMatch
  • Constructor Details

  • Method Details

    • add

      public void add(Term term)
      Adds a term to the CommonTermsQuery.
      Parameters:
      term - the term to add
    • rewrite

      public Query rewrite(IndexSearcher indexSearcher) throws IOException
      Description copied from class: Query
      Expert: called to re-write queries into primitive queries. For example, a PrefixQuery will be rewritten into a BooleanQuery that consists of TermQuerys.

      Callers are expected to call rewrite multiple times if necessary, until the rewritten query is the same as the original query.

      The rewrite process may be able to make use of IndexSearcher's executor and be executed in parallel if the executor is provided.

      Overrides:
      rewrite in class Query
      Throws:
      IOException
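The contract "call rewrite repeatedly until the result is the same as the input" is a fixed-point loop. The sketch below is a hypothetical, Lucene-free illustration: the generic type `Q` stands in for `Query`, a `UnaryOperator` stands in for `rewrite(IndexSearcher)`, and "the same" is assumed to mean the identical instance.

```java
import java.util.function.UnaryOperator;

/** Hypothetical sketch of the "rewrite until nothing changes" contract. */
final class RewriteLoopSketch {
  private RewriteLoopSketch() {}

  /**
   * Applies rewrite repeatedly until it returns the very instance it was given,
   * which signals that no further rewriting is possible.
   */
  static <Q> Q rewriteFully(Q query, UnaryOperator<Q> rewrite) {
    Q current = query;
    for (Q next = rewrite.apply(current); next != current; next = rewrite.apply(current)) {
      current = next;
    }
    return current;
  }
}
```

The identity check (`!=`) rather than `equals` reflects the assumption that a query with nothing left to rewrite returns itself.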
    • visit

      public void visit(QueryVisitor visitor)
      Description copied from class: Query
      Recurse through the query tree, visiting any child queries.
      Specified by:
      visit in class Query
      Parameters:
      visitor - a QueryVisitor to be called by each query in the tree
    • calcLowFreqMinimumNumberShouldMatch

      protected int calcLowFreqMinimumNumberShouldMatch(int numOptional)
    • calcHighFreqMinimumNumberShouldMatch

      protected int calcHighFreqMinimumNumberShouldMatch(int numOptional)
    • minNrShouldMatch

      private final int minNrShouldMatch(float minNrShouldMatch, int numOptional)
    • buildQuery

      protected Query buildQuery(int maxDoc, TermStates[] contextArray, Term[] queryTerms)
    • collectTermStates

      public void collectTermStates(IndexReader reader, List<LeafReaderContext> leaves, TermStates[] contextArray, Term[] queryTerms) throws IOException
      Throws:
      IOException
    • setLowFreqMinimumNumberShouldMatch

      public void setLowFreqMinimumNumberShouldMatch(float min)
      Specifies a minimum number of the optional low-frequency BooleanClauses that must be satisfied in order to produce a match on the low-frequency terms query part. This method accepts a float value in the range [0..1) as a fraction of the actual query terms in the low-frequency clause, or a number >=1 as an absolute number of clauses that need to match.

      By default no optional clauses are necessary for a match (unless there are no required clauses). If this method is used, then the specified number of clauses is required.

      Parameters:
      min - the number of optional clauses that must match
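The dual interpretation of the float argument (fraction below 1, absolute count from 1 upward) can be sketched as a small conversion function. This is an illustrative sketch, not the library's code; in particular, rounding fractions to the nearest integer and truncating absolute values are assumptions, and the names are hypothetical.

```java
/** Hypothetical sketch of turning the float setting into a clause count. */
final class MinShouldMatchSketch {
  private MinShouldMatchSketch() {}

  /**
   * min in (0, 1): fraction of numOptional, rounded to the nearest int (assumed).
   * min >= 1: absolute number of clauses, fractional part truncated (assumed).
   * min == 0: no optional clause is required.
   */
  static int minNrShouldMatch(float min, int numOptional) {
    if (min >= 1.0f || min == 0.0f) {
      return (int) min;
    }
    return Math.round(min * numOptional);
  }
}
```

For instance, `0.5f` with four low-frequency terms requires two of them to match, while `3f` requires three regardless of how many terms there are.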
    • getLowFreqMinimumNumberShouldMatch

      public float getLowFreqMinimumNumberShouldMatch()
      Gets the minimum number of the optional low-frequency BooleanClauses that must be satisfied.
    • setHighFreqMinimumNumberShouldMatch

      public void setHighFreqMinimumNumberShouldMatch(float min)
      Specifies a minimum number of the optional high-frequency BooleanClauses that must be satisfied in order to produce a match on the high-frequency terms query part. This method accepts a float value in the range [0..1) as a fraction of the actual query terms in the high-frequency clause, or a number >=1 as an absolute number of clauses that need to match.

      By default no optional clauses are necessary for a match (unless there are no required clauses). If this method is used, then the specified number of clauses is required.

      Parameters:
      min - the number of optional clauses that must match
    • getHighFreqMinimumNumberShouldMatch

      public float getHighFreqMinimumNumberShouldMatch()
      Gets the minimum number of the optional high-frequency BooleanClauses that must be satisfied.
    • getTerms

      public List<Term> getTerms()
      Gets the list of terms.
    • getMaxTermFrequency

      public float getMaxTermFrequency()
      Gets the maximum threshold of a term's document frequency for the term to be considered a low-frequency term.
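The sketch below shows one way such a threshold can classify a term. It assumes, by analogy with the minimum-should-match setters, that a value below 1 is a fraction of the total document count (`maxDoc`) and a value of 1 or more is an absolute document count; the class and method names are hypothetical.

```java
/** Hypothetical sketch of the low/high-frequency cut-off. */
final class FrequencyThresholdSketch {
  private FrequencyThresholdSketch() {}

  /** Returns true if a term with this docFreq would be treated as high-frequency. */
  static boolean isHighFrequency(int docFreq, int maxDoc, float maxTermFrequency) {
    if (maxTermFrequency >= 1f) {
      // Absolute threshold: the term appears in more than maxTermFrequency documents.
      return docFreq > maxTermFrequency;
    }
    // Relative threshold: the term appears in more than this fraction of all documents.
    return docFreq > (int) Math.ceil(maxTermFrequency * (float) maxDoc);
  }
}
```

With 1,000 documents and a threshold of `0.1f`, a term found in 500 documents is high-frequency, while one found in 5 documents stays low-frequency.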
    • getLowFreqOccur

      public BooleanClause.Occur getLowFreqOccur()
      Gets the BooleanClause.Occur used for low frequency terms.
    • getHighFreqOccur

      public BooleanClause.Occur getHighFreqOccur()
      Gets the BooleanClause.Occur used for high frequency terms.
    • getLowFreqBoost

      public float getLowFreqBoost()
      Gets the boost used for low frequency terms.
    • getHighFreqBoost

      public float getHighFreqBoost()
      Gets the boost used for high frequency terms.
    • toString

      public String toString(String field)
      Description copied from class: Query
      Prints a query to a string, with field assumed to be the default field and omitted.
      Specified by:
      toString in class Query
    • hashCode

      public int hashCode()
      Description copied from class: Query
      Override and implement query hash code properly in a subclass. This is required so that QueryCache works properly.
      Specified by:
      hashCode in class Query
    • equals

      public boolean equals(Object other)
      Description copied from class: Query
      Override and implement query instance equivalence properly in a subclass. This is required so that QueryCache works properly.

      Typically a query will be equal to another only if it's an instance of the same class and its document-filtering properties are identical to those of the other instance. Utility methods are provided for certain repetitive code.

      Specified by:
      equals in class Query
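The equals/hashCode requirement can be sketched with a plain Java value class that follows the same shape as the `equals` plus private `equalsTo` pair listed here: check the concrete class first, then compare every field that defines the query, and derive `hashCode` from the same fields. This is a hypothetical illustration, not Lucene code; a real `Query` subclass would typically use the helper methods `Query` provides (such as `sameClassAs` and `classHash`) instead of the raw `getClass()` checks used below.

```java
import java.util.List;
import java.util.Objects;

/** Hypothetical value class showing the equals/hashCode contract a Query subclass needs. */
final class TermsQuerySketch {
  private final List<String> terms;
  private final float maxTermFrequency;
  private final float lowFreqBoost;

  TermsQuerySketch(List<String> terms, float maxTermFrequency, float lowFreqBoost) {
    this.terms = List.copyOf(terms);
    this.maxTermFrequency = maxTermFrequency;
    this.lowFreqBoost = lowFreqBoost;
  }

  @Override
  public boolean equals(Object other) {
    // Same concrete class first, then field-by-field comparison.
    return other != null && getClass() == other.getClass() && equalsTo((TermsQuerySketch) other);
  }

  private boolean equalsTo(TermsQuerySketch other) {
    return Float.compare(maxTermFrequency, other.maxTermFrequency) == 0
        && Float.compare(lowFreqBoost, other.lowFreqBoost) == 0
        && terms.equals(other.terms);
  }

  @Override
  public int hashCode() {
    // Uses the same fields as equals, so equal instances always share a hash code.
    return Objects.hash(getClass(), terms, maxTermFrequency, lowFreqBoost);
  }
}
```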
    • equalsTo

      private boolean equalsTo(CommonTermsQuery other)
    • newTermQuery

      protected Query newTermQuery(Term term, TermStates termStates)
      Builds a new TermQuery instance.

      This is intended for subclasses that wish to customize the generated queries.

      Parameters:
      term - term
      termStates - the TermStates to be used to create the low level term query. Can be null.
      Returns:
      new TermQuery instance