class BioDSL::Assemble
Class with methods for assembling pair-end reads.
Public Class Methods
Method to initialize an Assembly object.
# File lib/BioDSL/seq/assemble.rb, line 48 def initialize(entry1, entry2, options) @entry1 = entry1 @entry2 = entry2 @overlap = 0 @offset1 = 0 @offset2 = 0 @options = options @options[:mismatches_max] ||= 0 @options[:overlap_min] ||= 1 check_options end
Class method to assemble two Seq
objects.
# File lib/BioDSL/seq/assemble.rb, line 42 def self.pair(entry1, entry2, options = {}) assemble = new(entry1, entry2, options) assemble.match end
Public Instance Methods
Calculate the diff between sequence lengths and return this.
@return [Fixnum] Diff.
# File lib/BioDSL/seq/assemble.rb, line 118 def calc_diff diff = @entry1.length - @entry2.length diff = 0 if diff < 0 diff end
Calculate the overlap to be matched.
# File lib/BioDSL/seq/assemble.rb, line 107 def calc_overlap @overlap = if @options[:overlap_max] [@options[:overlap_max], @entry1.length, @entry2.length].min else [@entry1.length, @entry2.length].min end end
Check option values are sane.
@raise [AssembleError] on bad values.
# File lib/BioDSL/seq/assemble.rb, line 64 def check_options if @options[:mismatches_max] < 0 fail AssembleError, "mismatches_max must be zero or greater - not: \ #{@options[:mismatches_max]}" end if @options[:overlap_max] && @options[:overlap_max] <= 0 fail AssembleError, "overlap_max must be one or greater - not: \ #{@options[:overlap_max]}" end if @options[:overlap_min] <= 0 fail AssembleError, "overlap_min must be one or greater - not: \ #{@options[:overlap_min]}" end end
Method to extract and downcase the left part of an assembled pair.
@return [BioDSL::Seq] Left part.
# File lib/BioDSL/seq/assemble.rb, line 127 def entry_left entry = @entry1[0...@offset1] entry.seq.downcase! entry end
Method to extract and upcase the overlapping part of an assembled pair.
@return [BioDSL::Seq] Overlapping part.
# File lib/BioDSL/seq/assemble.rb, line 150 def entry_overlap if @entry1.qual && @entry2.qual entry_overlap1 = @entry1[@offset1...@offset1 + @overlap] entry_overlap2 = @entry2[@offset2...@offset2 + @overlap] entry = merge_overlap(entry_overlap1, entry_overlap2) else entry = @entry1[@offset1...@offset1 + @overlap] end entry.seq.upcase! entry end
Method to extract and downcase the right part of an assembled pair.
@return [BioDSL::Seq] Right part.
# File lib/BioDSL/seq/assemble.rb, line 136 def entry_right entry = if @entry1.length > @offset1 + @overlap @entry1[@offset1 + @overlap..-1] else @entry2[@offset2 + @overlap..-1] end entry.seq.downcase! entry end
Method to locate overlapping matches between two sequences.
# File lib/BioDSL/seq/assemble.rb, line 82 def match calc_overlap diff = calc_diff @offset1 = @entry1.length - @overlap - diff while @overlap >= @options[:overlap_min] mismatches_max = (@overlap * @options[:mismatches_max] * 0.01).round if (mismatches = match_C(@entry1.seq, @entry2.seq, @offset1, @offset2, @overlap, mismatches_max)) && mismatches >= 0 entry_merged = entry_left + entry_overlap + entry_right entry_merged.seq_name = @entry1.seq_name + ":overlap=#{@overlap}:hamming=#{mismatches}" if @entry1.seq_name return entry_merged end diff > 0 ? diff -= 1 : @overlap -= 1 @offset1 += 1 end end
Method to merge sequence and quality scores in an overlap. The residue with the highest score at mismatch positions is selected. The quality scores of the overlap are the mean of the two sequences.
# File lib/BioDSL/seq/assemble.rb, line 167 def merge_overlap(entry_overlap1, entry_overlap2) na_seq = NArray.byte(entry_overlap1.length, 2) na_seq[true, 0] = NArray.to_na(entry_overlap1.seq.downcase, 'byte') na_seq[true, 1] = NArray.to_na(entry_overlap2.seq.downcase, 'byte') na_qual = NArray.byte(entry_overlap1.length, 2) na_qual[true, 0] = NArray.to_na(entry_overlap1.qual, 'byte') na_qual[true, 1] = NArray.to_na(entry_overlap2.qual, 'byte') mask_xor = na_seq[true, 0] ^ na_seq[true, 1] > 0 mask_seq = ((na_qual * mask_xor).eq((na_qual * mask_xor).max(1))) merged = Seq.new merged.seq = (na_seq * mask_seq).max(1).to_s merged.qual = na_qual.mean(1).round.to_type('byte').to_s merged end