Class BM25NBClassifier

java.lang.Object
org.apache.lucene.classification.BM25NBClassifier
All Implemented Interfaces:
Classifier<BytesRef>

public class BM25NBClassifier extends Object implements Classifier<BytesRef>
A classifier approximating a Naive Bayes classifier by using pure queries on BM25.
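As a rough illustration of the idea, this sketch applies the naive Bayes decision rule in log space. Positive retrieval scores (such as BM25 scores) stand in for the class prior and the per-token likelihoods. It assumes the caller already has one prior-like score per class and one score per input token per class. The class name Bm25NaiveBayesSketch, its methods, the handling of non-positive scores, and the score values are all hypothetical. This is not Lucene's implementation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Illustrative sketch only (hypothetical, not Lucene's BM25NBClassifier code):
 * a naive Bayes style decision rule in log space, where per-class
 * "probabilities" are replaced by positive retrieval scores such as BM25.
 */
public final class Bm25NaiveBayesSketch {

  /**
   * Returns log(prior) + sum of log(tokenScore). Non-positive scores are
   * treated as 1.0 (an assumption of this sketch) so they add log(1) = 0.
   */
  static double logScore(double priorScore, List<Double> tokenScores) {
    double result = Math.log(priorScore > 0 ? priorScore : 1.0);
    for (double s : tokenScores) {
      result += Math.log(s > 0 ? s : 1.0);
    }
    return result;
  }

  /** Returns the class label with the highest combined log score. */
  static String bestClass(Map<String, Double> priors, Map<String, List<Double>> tokenScores) {
    String best = null;
    double bestScore = Double.NEGATIVE_INFINITY;
    for (Map.Entry<String, Double> e : priors.entrySet()) {
      double s = logScore(e.getValue(), tokenScores.getOrDefault(e.getKey(), List.of()));
      if (best == null || s > bestScore) {
        best = e.getKey();
        bestScore = s;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // Hypothetical scores for two classes and a two-token input.
    Map<String, Double> priors = new LinkedHashMap<>();
    priors.put("sports", 2.0);
    priors.put("politics", 1.5);
    Map<String, List<Double>> tokenScores = Map.of(
        "sports", List.of(3.2, 0.8),
        "politics", List.of(1.1, 0.9));
    System.out.println(bestClass(priors, tokenScores)); // prints "sports"
  }
}
```

Running main prints sports, since 2.0 × 3.2 × 0.8 = 5.12 outweighs 1.5 × 1.1 × 0.9 = 1.485. Summing logarithms instead of multiplying raw scores avoids floating-point underflow when the input has many tokens.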
  • Field Details

    • indexReader

      private final IndexReader indexReader
      IndexReader used to access the Classifier's index
    • textFieldNames

      private final String[] textFieldNames
      names of the fields to be used as input text
    • classFieldName

      private final String classFieldName
      name of the field to be used as a class / category output
    • analyzer

      private final Analyzer analyzer
      Analyzer to be used for tokenizing unseen input text
    • indexSearcher

      private final IndexSearcher indexSearcher
      IndexSearcher to run searches on the index for retrieving frequencies
    • query

      private final Query query
      Query used to optionally filter the document set used for classification
  • Constructor Details

    • BM25NBClassifier

      public BM25NBClassifier(IndexReader indexReader, Analyzer analyzer, Query query, String classFieldName, String... textFieldNames)
      Creates a new BM25-based Naive Bayes classifier.
      Parameters:
      indexReader - the reader on the index to be used for classification
      analyzer - an Analyzer used to analyze unseen text
      query - a Query to optionally filter the docs used for training the classifier, or null if all the indexed docs should be used
      classFieldName - the name of the field used as the output for the classifier. NOTE: must not be heavily analyzed, as the returned class will be a token indexed for this field
      textFieldNames - the names of the fields used as the inputs for the classifier; per-field boosting is not supported
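The NOTE on classFieldName can be illustrated without Lucene. This sketch contrasts a keyword-style field, which keeps one token per value (as Lucene's StringField does), with a tokenizing, lowercasing analyzer. The classifier returns a token indexed in the class field, so an analyzed label such as "Science Fiction" could only come back as "science" or "fiction". The helpers toyAnalyze and keyword are hypothetical stand-ins, not Lucene APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Illustrative sketch only: why a heavily analyzed class field is a problem
 * when the classifier returns an indexed token as the class.
 */
public final class ClassFieldAnalysisSketch {

  /** Hypothetical analyzer: lowercases and splits on runs of non-letter/non-digit characters. */
  static List<String> toyAnalyze(String text) {
    List<String> tokens = new ArrayList<>();
    for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
      if (!t.isEmpty()) { // a leading separator yields an empty first element
        tokens.add(t);
      }
    }
    return tokens;
  }

  /** Keyword-style (not analyzed): the whole label is kept as a single token. */
  static List<String> keyword(String text) {
    return List.of(text);
  }

  public static void main(String[] args) {
    String label = "Science Fiction";
    System.out.println(keyword(label));    // [Science Fiction]: the label survives intact
    System.out.println(toyAnalyze(label)); // [science, fiction]: neither token is the label
  }
}
```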
  • Method Details