class Opener::LanguageIdentifier::Backend::LanguageDetection
Constants
- DEFAULT_PRIORITY
The default priority for non-OpeNER languages.
@return [Float]
- DEFAULT_PROFILES_PATH
Path to the directory containing the default profiles.
@return [String]
- DEFAULT_SHORT_PROFILES_PATH
Path to the directory containing the default short profiles.
@return [String]
- PRIORITIES
Prioritizes OpeNER languages over the rest. Languages not covered by this list automatically receive DEFAULT_PRIORITY.
@return [Hash]
- SHORT_THRESHOLD
The number of characters above which the detector switches from the short profiles to the longer profile set. Inputs of at most this length count as short.
@return [Fixnum]
Public Class Methods
# File lib/opener/language_identifier/backend/language_detection.rb, line 62
def initialize
  @factory = com.cybozu.labs.langdetect.DetectorFactory.new
end
Public Instance Methods
@return [String]
# File lib/opener/language_identifier/backend/language_detection.rb, line 81
def detect input
  detector = new_detector input

  detector.detect

# The core Java code raises an exception when it can't detect a language.
# Since this isn't actually fatal, we capture it and return "unknown"
# instead.
rescue com.cybozu.labs.langdetect.LangDetectException
  return 'unknown'
end
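The method-level `rescue` in `detect` turns a detection failure into the string `'unknown'` instead of propagating an error. The sketch below mimics that pattern in plain Ruby so it runs without JRuby or the langdetect JAR; `StubDetector` and `DetectionError` are hypothetical stand-ins for `com.cybozu.labs.langdetect.Detector` and `LangDetectException`, and the stub's behaviour (raising on blank input, otherwise answering `'en'`) is an assumption made only for illustration.

```ruby
# Hypothetical stand-in for com.cybozu.labs.langdetect.LangDetectException.
class DetectionError < StandardError; end

# Hypothetical stand-in for the Java detector. It raises on blank input,
# imitating a detector that finds no usable features in the text.
class StubDetector
  def initialize(input)
    @input = input
  end

  def detect
    raise DetectionError, 'no features in text' if @input.strip.empty?

    'en' # placeholder; a real detector would score the text
  end
end

# Mirrors the documented detect: a method-level rescue converts the
# non-fatal failure into the string "unknown".
def detect(input)
  StubDetector.new(input).detect
rescue DetectionError
  'unknown'
end

puts detect('hello world') # prints "en"
puts detect('   ')         # prints "unknown"
```

Because `rescue` is attached to the method body, no explicit `begin`/`end` block is needed.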
# File lib/opener/language_identifier/backend/language_detection.rb, line 66
def new_detector input
  @factory.load_profile(determine_profiles(input))
  @factory.set_seed 1

  priorities = build_priorities input, @factory.langlist
  detector   = com.cybozu.labs.langdetect.Detector.new @factory

  detector.set_prior_map priorities
  detector.append input.downcase

  detector
end
Protected Instance Methods
Builds a Java HashMap mapping the priorities for all OpeNER and non-OpeNER languages.
If the input is at most SHORT_THRESHOLD characters long, non-OpeNER languages are given a priority of 0.0, which disables them. This ensures that OpeNER languages are detected reliably when analysing only one or two words.
@param [String] input
@param [Array<String>] languages
@return [java.util.HashMap]
# File lib/opener/language_identifier/backend/language_detection.rb, line 106
def build_priorities input, languages
  priorities = java.util.HashMap.new
  priority   = short_input?(input) ? 0.0 : DEFAULT_PRIORITY

  PRIORITIES.each do |lang, val|
    priorities.put(lang, val)
  end

  languages.each do |language|
    unless priorities.contains_key(language)
      priorities.put(language, priority)
    end
  end

  priorities
end
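The priority logic above can be sketched with a plain Ruby Hash in place of `java.util.HashMap`, which makes it easy to see what happens to a language outside PRIORITIES for short and long input. The values of `PRIORITIES`, `DEFAULT_PRIORITY` and `SHORT_THRESHOLD` here are hypothetical placeholders, not the library's actual constants, and `'af'` is used only as an example of a language missing from the hypothetical PRIORITIES list.

```ruby
# Hypothetical values; the real constants are defined on the class.
PRIORITIES       = { 'en' => 1.0, 'nl' => 1.0 }.freeze
DEFAULT_PRIORITY = 0.1
SHORT_THRESHOLD  = 10

# Same comparison as the documented short_input?.
def short_input?(input)
  input.length <= SHORT_THRESHOLD
end

# Plain-Ruby version of build_priorities: languages in PRIORITIES keep
# their configured value; every other language gets DEFAULT_PRIORITY,
# or 0.0 (disabled) when the input is short.
def build_priorities(input, languages)
  priority   = short_input?(input) ? 0.0 : DEFAULT_PRIORITY
  priorities = PRIORITIES.dup

  languages.each do |language|
    priorities[language] = priority unless priorities.key?(language)
  end

  priorities
end

puts build_priorities('hallo', %w[en nl af])['af']               # prints "0.0"
puts build_priorities('a much longer input', %w[en nl af])['af'] # prints "0.1"
```

Entries from PRIORITIES are never overwritten, so a prioritized language keeps its value even when the detector's language list also contains it.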
@param [String] input
@return [String]
# File lib/opener/language_identifier/backend/language_detection.rb, line 127
def determine_profiles input
  if short_input? input
    DEFAULT_SHORT_PROFILES_PATH
  else
    DEFAULT_PROFILES_PATH
  end
end
@param [String] input
@return [TrueClass|FalseClass]
# File lib/opener/language_identifier/backend/language_detection.rb, line 135
def short_input? input
  input.length <= SHORT_THRESHOLD
end
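Because `short_input?` uses `<=`, an input of exactly SHORT_THRESHOLD characters still selects the short profiles. The sketch below reproduces `determine_profiles` and `short_input?` in plain Ruby to show that boundary; the threshold and both paths are hypothetical placeholders, not the library's real values.

```ruby
# Hypothetical values; the real constants are defined on the class.
SHORT_THRESHOLD             = 10
DEFAULT_PROFILES_PATH       = '/path/to/profiles'
DEFAULT_SHORT_PROFILES_PATH = '/path/to/short_profiles'

# Same comparison as the documented short_input?: an input whose length
# equals the threshold still counts as short.
def short_input?(input)
  input.length <= SHORT_THRESHOLD
end

# Same branching as the documented determine_profiles.
def determine_profiles(input)
  short_input?(input) ? DEFAULT_SHORT_PROFILES_PATH : DEFAULT_PROFILES_PATH
end

puts determine_profiles('a' * 10) # prints "/path/to/short_profiles"
puts determine_profiles('a' * 11) # prints "/path/to/profiles"
```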