class EncodingEstimator::Distribution
Public Class Methods
new( language )
Create a new distribution object for a given language
@param [EncodingEstimator::LanguageModel] language Language to load the distribution for
# File lib/encoding_estimator/distribution.rb, line 10
def initialize( language )
  @@distributions[ language.path ] ||= load_language language
  @distribution = @@distributions[ language.path ]
end
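The constructor caches each loaded distribution in a class-level hash keyed by the model path, so constructing a second object for the same language reuses the parsed data. The sketch below isolates that pattern only; `CachedTable`, its load counter, and the placeholder table are hypothetical and are not part of EncodingEstimator.

```ruby
# Hypothetical stand-in for Distribution, showing only the caching pattern.
class CachedTable
  @@tables = {}  # shared across all instances, keyed by path
  @@loads  = 0   # counts how often the expensive load actually runs

  def self.loads
    @@loads
  end

  attr_reader :table

  def initialize(path)
    # ||= evaluates the right-hand side only when no entry exists for this key.
    @@tables[path] ||= load_table(path)
    @table = @@tables[path]
  end

  private

  # Placeholder for reading and parsing a real model file.
  def load_table(path)
    @@loads += 1
    { 'source' => path }
  end
end

first  = CachedTable.new('en.json')
second = CachedTable.new('en.json')
puts CachedTable.loads                 # 1
puts first.table.equal?(second.table)  # true
```

Because an empty Hash is truthy in Ruby, a failed load that yields `{}` is cached as well and is not retried for the same path.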
Public Instance Methods
evaluate( str, penalty )
Calculate the likelihood of a string for the given language
@param [String] str Data to calculate the likelihood for
@param [Float] penalty Value subtracted from every character's score, so characters scoring below it (including unknown characters, which score 0.0) contribute negatively
@return [Float] Total likelihood
# File lib/encoding_estimator/distribution.rb, line 21
def evaluate( str, penalty )
  dist = @distribution
  sum = 0.0
  str.each_char { |c| sum += dist.fetch( c, 0.0 ) - penalty }
  sum
end
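To make the scoring rule concrete, here is a minimal standalone sketch of the same sum. The `score_string` helper and the two-entry `SAMPLE_DISTRIBUTION` are hypothetical; real scores come from the language model file.

```ruby
# Hypothetical character scores, chosen only for illustration.
SAMPLE_DISTRIBUTION = { 'a' => 0.5, 'b' => 0.25 }.freeze

# Sum each character's score minus the penalty.
# Characters missing from the distribution score 0.0 before the penalty.
def score_string(str, dist, penalty)
  str.each_char.sum(0.0) { |c| dist.fetch(c, 0.0) - penalty }
end

known   = score_string('ab', SAMPLE_DISTRIBUTION, 0.125)
unknown = score_string('zz', SAMPLE_DISTRIBUTION, 0.125)
puts known    # 0.5   = (0.5 - 0.125) + (0.25 - 0.125)
puts unknown  # -0.25 = 2 * (0.0 - 0.125)
```

A string made of characters the distribution does not know ends up with a negative total, which is how the penalty separates plausible text from implausible text.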
Private Instance Methods
load_language( language )
Try to load the distribution for a language from the filesystem
@param [EncodingEstimator::LanguageModel] language Language model whose distribution file should be loaded
@return [Hash] Hash representing the distribution for the language; empty if the model is invalid or its file cannot be read or parsed
# File lib/encoding_estimator/distribution.rb, line 35
def load_language( language )
  return {} unless language.valid?
  begin
    distribution = JSON.parse( File.read( language.path, encoding: 'utf-8' ) )
  rescue Exception
    distribution = {}
  end
  distribution
end
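The fallback behaviour can be tried in isolation with the sketch below. The `load_table` helper is hypothetical, and it deliberately differs from the listing above in one respect: it rescues `StandardError` instead of `Exception`, which still covers a missing file and malformed JSON but lets signals and interrupts propagate.

```ruby
require 'json'
require 'tempfile'

# Parse a JSON file into a Hash, falling back to an empty Hash on failure.
# Assumption: rescuing StandardError is sufficient here, since both
# Errno::ENOENT and JSON::ParserError descend from it.
def load_table(path)
  JSON.parse(File.read(path, encoding: 'utf-8'))
rescue StandardError
  {}
end

# Write a small, made-up model file to a temporary location.
file = Tempfile.new(['model', '.json'])
file.write('{"a": 0.5}')
file.close

loaded  = load_table(file.path)
missing = load_table("#{file.path}.missing")  # path that does not exist
file.unlink

p loaded   # {"a"=>0.5}
p missing  # {}
```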