class Ai::Nlp::Languages

Manages the known languages and guesses the language of an input by comparing n-gram data.

Public Class Methods

new()

Initializes the instance with a new NGram calculator.

# File lib/ai/nlp/languages.rb, line 14
def initialize
  @n_gram = NGram.new
end

Public Instance Methods

all()

Returns the currently known languages. @return An array of Language.

# File lib/ai/nlp/languages.rb, line 21
def all
  @languages = Language.all
end

create_one(name, input)

Creates a new language. @param string name The language name. @param string input The initial data set. @return The created language.

# File lib/ai/nlp/languages.rb, line 40
def create_one(name, input)
  language = Language.new(name: name)
  language.update(map: @n_gram.calculate(input).to_h)
end
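
The NGram class is not shown on this page, so the following is only a sketch of the kind of data `@n_gram.calculate(input)` might produce. The helper name `ngram_profile`, the trigram size, and the `[gram, count]` output shape are assumptions made for illustration, not the library's actual implementation.

```ruby
# Hypothetical stand-in for NGram#calculate (the real class is not shown here).
# Returns [gram, count] pairs, most frequent first, ties broken alphabetically.
def ngram_profile(text, size: 3)
  grams = text.downcase.chars.each_cons(size).map(&:join)
  grams.tally.sort_by { |gram, count| [-count, gram] }
end

profile = ngram_profile("banana")
# Trigrams of "banana": ban, ana, nan, ana
# profile => [["ana", 2], ["ban", 1], ["nan", 1]]

# create_one stores the hash form of the calculated data as the language map.
map = profile.to_h
# map => {"ana"=>2, "ban"=>1, "nan"=>1}
```

Converting the pairs with `to_h` mirrors what `create_one` does before saving the map on the Language.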

guess(input)

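Ranks the available languages by how closely they match the input, lowest score first. Languages with a score of zero are dropped. @param string input The data set. @return An array of Language, empty when no language is known.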

# File lib/ai/nlp/languages.rb, line 28
def guess(input)
  all
  return [] if @languages.empty?
  hash = @languages.map { |language| [language, score(input, language)] }.to_h
  sort(hash)
end

Private Instance Methods

add_frequency(input_gram_freq, pos, point)

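Adds the absolute difference between the gram's frequency and its position to the points, if the gram has a frequency. @param integer input_gram_freq The input gram frequency (nil when the gram is unknown) @param integer pos The position in the max_compare @param integer point The current calculated points @return The updated points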

# File lib/ai/nlp/languages.rb, line 89
def add_frequency(input_gram_freq, pos, point)
  point += (input_gram_freq - pos).abs if input_gram_freq
  point
end

calculate_point(max_compare, ngram, input_gram)

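Calculates the score of the input grams against a language's n-gram map. @param integer max_compare The number of input grams to compare @param hash ngram The language's n-gram map @param array input_gram The input grams, indexed by position @return The score (points)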

# File lib/ai/nlp/languages.rb, line 74
def calculate_point(max_compare, ngram, input_gram)
  point = 0
  # Exclusive range: compare positions 0 to max_compare - 1, i.e. at most max_compare grams
  (0...max_compare).each do |pos|
    position = input_gram[pos]
    next unless position
    point = add_frequency(ngram[position[0]], pos, point)
  end
  point
end
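
The scoring loop can be restated in condensed form to make the arithmetic easier to follow. This is a sketch only: `out_of_place_points` and its `limit` keyword are hypothetical names, and the sample data is invented for illustration.

```ruby
# Condensed restatement of the scoring idea: for each input gram at position
# pos, add |stored value - pos| when the language map knows the gram,
# and add nothing when it does not.
def out_of_place_points(language_map, input_gram, limit: 400)
  input_gram.first(limit).each_with_index.sum do |(gram, _count), pos|
    stored = language_map[gram]
    stored ? (stored - pos).abs : 0
  end
end

# Illustrative data only.
language_map = { "the" => 5, "and" => 1 }
input_gram   = [["the", 3], ["xyz", 2], ["and", 1]]

points = out_of_place_points(language_map, input_gram)
# pos 0: "the" -> |5 - 0| = 5
# pos 1: "xyz" -> unknown, contributes 0
# pos 2: "and" -> |1 - 2| = 1
# points => 6
```

Note that unknown grams add nothing, so an input sharing no grams with a language scores zero for that language.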
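
reject(sorted_languages, hash)

Removes the languages whose score is zero. @param array sorted_languages The sorted list of languages @param hash hash The language hash @return the languages with a non-zero score
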
# File lib/ai/nlp/languages.rb, line 56
def reject(sorted_languages, hash)
  sorted_languages.reject { |language| hash[language].zero? }
end

score(input, language)

Compares a string of characters against a language, based on at most the 400 most commonly used groups of letters in the input. @param string input The data set to compare @param Language language The Language to compare to @return The score (points)

# File lib/ai/nlp/languages.rb, line 65
def score(input, language)
  input_gram = @n_gram.calculate(input)
  ngram = language.map
  calculate_point([input_gram.size, 400].min, ngram, input_gram)
end

sort(hash)

Sorts the known languages by ascending score, then removes those with a zero score. @param hash hash The language hash (Language => score) @return the sorted list of languages

# File lib/ai/nlp/languages.rb, line 51
def sort(hash)
  sorted_languages = @languages.sort_by { |language| hash[language] }
  reject(sorted_languages, hash)
end
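
The sort-then-reject step can be tried on plain data. In this sketch the scores are keyed by language name for readability, whereas the real hash is keyed by Language objects; the names and values are made up.

```ruby
# Illustrative scores only; the real hash maps Language objects to points.
scores = { "french" => 42, "english" => 17, "klingon" => 0 }

ranked = scores.keys
               .sort_by { |language| scores[language] }        # ascending score
               .reject  { |language| scores[language].zero? }  # drop zero scores

# ranked => ["english", "french"]
```

The entry with a score of zero would otherwise sort first, which is why it is filtered out after sorting.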