bm25 {superml} | R Documentation |
Best Matching(BM25) - Deprecated
Description
Computes the BM25 distance between sentences/documents.
Details
BM25 stands for Best Matching 25. It is widely used for ranking documents and is often preferred over TF*IDF scores. Given a new document, it finds the most similar documents in a corpus, and it is popular in information retrieval systems. This implementation can use multiple cores for faster, parallel computation when use_parallel is TRUE.
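To make the ranking idea concrete, here is a minimal base R sketch of the standard Okapi BM25 score. It is not the package's internal code: the function name bm25_scores, the whitespace tokenizer, the smoothed IDF variant, and the defaults k1 = 1.2 and b = 0.75 are all assumptions chosen for illustration.

```r
# Illustrative Okapi BM25 in base R (assumed parameters; not superml's code).
bm25_scores <- function(corpus, query, k1 = 1.2, b = 0.75) {
  # Lower-case and split on whitespace (a simplifying assumption).
  tokenise <- function(x) strsplit(tolower(x), "\\s+")[[1]]
  docs <- lapply(corpus, tokenise)
  q_terms <- unique(tokenise(query))

  n_docs <- length(docs)
  doc_len <- vapply(docs, length, integer(1))
  avg_len <- mean(doc_len)

  scores <- numeric(n_docs)
  for (term in q_terms) {
    # Frequency of the query term in each document.
    tf <- vapply(docs, function(d) sum(d == term), integer(1))
    # Number of documents that contain the term.
    df <- sum(tf > 0)
    # Smoothed IDF that stays non-negative for common terms.
    idf <- log((n_docs - df + 0.5) / (df + 0.5) + 1)
    # Length-normalised term contribution, accumulated per document.
    scores <- scores + idf * (tf * (k1 + 1)) /
      (tf + k1 * (1 - b + b * doc_len / avg_len))
  }
  scores
}

example <- c('white audi 2.5 car', 'black shoes from office',
             'new mobile iphone 7', 'audi tyres audi a3',
             'nice audi bmw toyota corolla')
scores <- bm25_scores(example, 'white toyota corolla')

# Rank documents from most to least similar and keep the top two.
ranked <- example[order(scores, decreasing = TRUE)]
head(ranked, 2)
```

Documents that share no term with the query score zero, so only documents with overlapping vocabulary can rank highly. Scores from this sketch may differ from the package's own values.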
Public fields
corpus
a list containing sentences
use_parallel
enables parallel computation, defaults to FALSE
Methods
Public methods
Method new()
Usage
bm25$new(corpus, use_parallel)
Arguments
corpus
list, a list containing sentences
use_parallel
logical, enables parallel computation, defaults to FALSE. If TRUE, uses n - 1 cores.
Details
Create a new 'bm25' object.
Returns
A 'bm25' object.
# Build a small corpus and create a bm25 object from it
example <- c('white audi 2.5 car', 'black shoes from office', 'new mobile iphone 7', 'audi tyres audi a3', 'nice audi bmw toyota corolla')
obj <- bm25$new(example, use_parallel = FALSE)
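The "n - 1 cores" convention for use_parallel can be sketched with the base parallel package. This assumes n is the core count reported by parallel::detectCores(); how the package actually schedules its work is not documented here, and the token-counting task is only a stand-in.

```r
# Assumption: "n" is the number of cores reported by parallel::detectCores().
n <- parallel::detectCores()
# detectCores() can return NA on some platforms; fall back to one worker.
n_cores <- if (is.na(n)) 1L else max(1L, n - 1L)

docs <- c('white audi 2.5 car', 'black shoes from office', 'new mobile iphone 7')

# Start a local cluster, count tokens per document in parallel, then clean up.
cl <- parallel::makeCluster(n_cores)
token_counts <- unlist(parallel::parLapply(cl, docs, function(d) {
  length(strsplit(d, "\\s+")[[1]])
}))
parallel::stopCluster(cl)

token_counts
```

Leaving one core free keeps the machine responsive while the workers run. For a corpus this small, the cost of starting workers outweighs any speed-up.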
Method most_similar()
Usage
bm25$most_similar(document, topn = 1)
Arguments
document
character, the document for which the most similar sentences are found.
topn
integer, number of top-ranked sentences to retrieve
Details
Finds the topn sentences in the corpus that are most similar to the given document.
Returns
a vector of most similar documents
# Build the corpus, then retrieve the two sentences closest to the input
example <- c('white audi 2.5 car', 'black shoes from office', 'new mobile iphone 7', 'audi tyres audi a3', 'nice audi bmw toyota corolla')
get_bm <- bm25$new(example, use_parallel = FALSE)
input_document <- c('white toyota corolla')
get_bm$most_similar(document = input_document, topn = 2)
Method clone()
The objects of this class are cloneable with this method.
Usage
bm25$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.