class TextAlignment::GLCSAlignment
An instance of this class holds the results of generalized LCS computation for the two strings str1 and str2. an optional dictionary is used for generalized suffix comparision.
Attributes
the elements that are common in the two strings, str1 and str2
the string of non-mapped characters
The length of GLCS
the elements that are mapped to each other in the two strings, str1 and str2
The mapping function from str1 to str2
The mapping function from str1 to str2
The position initial and final position of matching on str1 and str2
The position initial and final position of matching on str1 and str2
The position initial and final position of matching on str1 and str2
The position initial and final position of matching on str1 and str2
Public Class Methods
It initializes the GLCS table for the given two strings, str1 and str2. When the array, mappings, is given, general suffix comparision is performed based on the mappings. Exception is raised when nil given passed to either str1, str2 or dictionary
# File lib/text_alignment/glcs_alignment.rb, line 32 def initialize(str1, str2, mappings = []) raise ArgumentError, "nil string" if str1 == nil || str2 == nil raise ArgumentError, "nil dictionary" if mappings == nil # index the mappings in hash. @dic = (mappings + mappings.map{|e| e.reverse}).to_h # prefix dictionary @pdic = Dictionary.new(mappings.flatten) @len1 = str1.length @len2 = str2.length # add a final marker to the end of the strings @str1 = str1 + '_' @str2 = str2 + '_' # compute the GLCS table @glcs = _compute_glcs_table @length = @glcs[0][0] _trace_glcs_table end
Public Instance Methods
Returns the character-by-character difference
# File lib/text_alignment/glcs_alignment.rb, line 66 def cdiff cdiff1, cdiff2 = '', '' p1, p2 = 0, 0 begin s1, s2 = _prefix_eq(@str1[p1...@len1], @str2[p2...@len2]) if s1 != nil l1, l2 = s1.length, s2.length cdiff1 += s1; cdiff2 += s2 if l1 > l2 then cdiff2 += ' ' * (l1 - l2) else cdiff1 += ' ' * (l2 - l1) end p1 += s1.length; p2 += s2.length elsif p2 < @len2 && (p1 == @len1 or @glcs[p1][p2 + 1] > @glcs[p1 + 1][p2]) cdiff1 += ' ' cdiff2 += @str2[p2] p2 += 1 elsif p1 < @len1 && (p2 == @len2 or @glcs[p1][p2 + 1] <= @glcs[p1 + 1][p2]) cdiff1 += @str1[p1] cdiff2 += ' ' p1 += 1 end end until p1 == @len1 && p2 == @len2 return [cdiff1, cdiff2] end
Prints the GLCS table
# File lib/text_alignment/glcs_alignment.rb, line 57 def show_glcs puts "\t\t" + @str2.split(//).join("\t") @glcs.each_with_index do |row, i| h = (@str1[i].nil?)? '' : @str1[i] puts i.to_s + "\t" + h + "\t" + row.join("\t") end end
# File lib/text_alignment/glcs_alignment.rb, line 109 def transform_a_span(span) {:begin=>@position_map_begin[span[:begin]], :end=>@position_map_end[span[:end]]} end
# File lib/text_alignment/glcs_alignment.rb, line 113 def transform_spans(spans) spans.map{|span| transform_a_span(span)} end
Private Instance Methods
Computes the GLCS table for the two strings, @str1 and @str2. Unlike normal LCS algorithms, the computation is performed from the end to the beginning of the strings.
# File lib/text_alignment/glcs_alignment.rb, line 122 def _compute_glcs_table glcs = Array.new(@len1 + 1) { Array.new(@len2 + 1) } # initialize the final row and the final column (0..@len1).each {|p| glcs[p][@len2] = 0} (0..@len2).each {|p| glcs[@len1][p] = 0} # compute the GLCS table str1_reverse_iteration = (0...@len1).to_a.reverse str2_reverse_iteration = (0...@len2).to_a.reverse str1_reverse_iteration.each do |p1| str2_reverse_iteration.each do |p2| s1, s2 = _prefix_eq(@str1[p1...@len1], @str2[p2...@len2]) unless s1 == nil glcs[p1][p2] = glcs[p1 + s1.length][p2 + s2.length] + 1 else glcs[p1][p2] = (glcs[p1][p2 + 1] > glcs[p1 + 1][p2])? glcs[p1][p2 + 1] : glcs[p1 + 1][p2] end end end glcs end
General prefix comparison is performed based on the dictionary. The pair of matched suffixes are returned when found. Otherwise, the pair of nil values are returned.
# File lib/text_alignment/glcs_alignment.rb, line 229 def _prefix_eq(str1, str2) return nil, nil if str1.empty? || str2.empty? prefixes1 = @pdic.prefixes(str1) prefixes1.each {|p1| p2 = @dic[p1]; return p1, p2 if str2.start_with?(p2)} return str1[0], str2[0] if (str1[0] == str2[0]) return nil, nil end
Backtrace the GLCS table, computing the mapping function from str1 to str2 As its side effect, it updates four global variables
-
front_overflow: the length of the front part of str1 that cannot fit in str2.
-
rear_overflow: the length of the rear part of str1 that cannot fit in str2.
-
common_elements
: an array which stores the common elements in the two strings. -
mapped_elements
: an array which stores the mapped elements in the two strings.
# File lib/text_alignment/glcs_alignment.rb, line 153 def _trace_glcs_table @front_overflow, @rear_overflow = 0, 0 @common_elements, @mapped_elements = [], [] diff_string1, diff_string2 = '', '' @position_map_begin, @position_map_end = {}, {} addition, deletion = [], [] p1, p2 = 0, 0 while p1 <= @len1 && p2 <= @len2 s1, s2 = _prefix_eq(@str1[p1..@len1], @str2[p2..@len2]) if s1 != nil l1, l2 = s1.length, s2.length @position_map_begin[p1], @position_map_end[p1] = p2, p2 (p1 + 1 ... p1 + l1).each{|i| @position_map_begin[i], @position_map_end[i] = nil, nil} @common_elements << [s1, s2] if !addition.empty? && deletion.empty? # If an addition is found in the front or the rear, it is a case of underflow @str2_match_begin = addition.length if p1 == 0 @str2_match_end = l2 - addition.length if p1 == @len1 if p1 == 0 # leave as it is elsif p1 == @len1 # retract from the end @position_map_begin[p1] = p2 - addition.length @position_map_end[p1] = @position_map_begin[p1] else # correct the position for end @position_map_end[p1] = p2 - addition.length end elsif addition.empty? && !deletion.empty? # If a deletion is found in the front or the rear, it is a case of overflow @str1_match_begin = deletion.length if p1 == deletion.length @str1_match_end = l1 - deletion.length if p1 == @len1 deletion.each{|p| @position_map_begin[p], @position_map_end[p] = p2, p2} elsif !addition.empty? && !deletion.empty? # If an addition and a deletion are both found in the front or the rear, # the overflow/underflow is approximated to the difference. al, dl = addition.length, deletion.length @front_overflow = dl - al if p1 == dl @rear_overflow = dl - al if p1 == @len1 @mapped_elements << [@str1[deletion[0], dl], @str2[addition[0], al]] @position_map_begin[deletion[0]], @position_map_end[deletion[0]] = addition[0], addition[0] deletion[1..-1].each{|p| @position_map_begin[p], @position_map_end[p] = nil, nil} end addition.clear; deletion.clear p1 += l1; p2 += l2 elsif p2 < @len2 && (p1 == @len1 || @glcs[p1][p2 + 1] > @glcs[p1 + 1][p2]) diff_string2 += @str2[p2] addition << p2 p2 += 1 elsif p1 < @len1 && (p2 == @len2 || @glcs[p1][p2 + 1] <= @glcs[p1 + 1][p2]) diff_string1 += @str1[p1] deletion << p1 p1 += 1 end end @common_elements.pop @diff_strings = [diff_string1, diff_string2] end