class BioDSL::MergePairSeq
Merge pair-end sequences in the stream.
merge_pair_seq merges paired sequences in the stream if they are interleaved. Sequence names must be in either Illumina 1.3/1.5 format, with a trailing /1 or /2, or Illumina 1.8 format, containing 1: or 2:. The names of the two sequences in a pair must match accordingly for the pair to be merged.
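The naming rule above can be illustrated with a small, self-contained sketch. The helper mate_names? is hypothetical and not part of BioDSL; the real check is performed by BioDSL::Seq.check_name_pair, whose implementation is not shown on this page and may differ in detail.

```ruby
# Hypothetical helper (not BioDSL API): true when two read names look like
# mates under either naming convention described above.
def mate_names?(name1, name2)
  # Illumina 1.3/1.5: identical prefix followed by a trailing /1 and /2.
  old1 = name1.match(%r{\A(.+)/1\z})
  old2 = name2.match(%r{\A(.+)/2\z})
  return old1[1] == old2[1] if old1 && old2

  # Illumina 1.8: identical prefix, a space, then "1:" or "2:".
  new1 = name1.match(/\A(\S+) 1:/)
  new2 = name2.match(/\A(\S+) 2:/)
  !new1.nil? && !new2.nil? && new1[1] == new2[1]
end

puts mate_names?('read7/1', 'read7/2')
puts mate_names?('M01168:16:000000000-A1R9L:1:1101:14862:1868 1:N:0:14',
                 'M01168:16:000000000-A1R9L:1:1101:14862:1868 2:N:0:14')
puts mate_names?('read7/1', 'read8/2')
```

Comparing only the shared prefix is an assumption made for this sketch; it ignores the fields that follow 1: or 2: in the Illumina 1.8 format.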
Usage
merge_pair_seq
Options
Examples
Consider the following FASTQ entries in the file test.fq:

  @M01168:16:000000000-A1R9L:1:1101:14862:1868 1:N:0:14
  TGGGGAATATTGGACAATGG
  +
  <??????BDDDDDDDDGGGG
  @M01168:16:000000000-A1R9L:1:1101:14862:1868 2:N:0:14
  CCTGTTTGCTACCCACGCTT
  +
  ?????BB<-<BDDDDDFEEF
  @M01168:16:000000000-A1R9L:1:1101:13906:2139 1:N:0:14
  TAGGGAATCTTGCACAATGG
  +
  <???9?BBBDBDDBDDFFFF
  @M01168:16:000000000-A1R9L:1:1101:13906:2139 2:N:0:14
  ACTCTTCGCTACCCATGCTT
  +
  ,5<??BB?DDABDBDDFFFF
  @M01168:16:000000000-A1R9L:1:1101:14865:2158 1:N:0:14
  TAGGGAATCTTGCACAATGG
  +
  ?????BBBBBDDBDDBFFFF
  @M01168:16:000000000-A1R9L:1:1101:14865:2158 2:N:0:14
  CCTCTTCGCTACCCATGCTT
  +
  ??,<??B?BB?BBBBBFF?F
To merge these interleaved pair-end sequences, use merge_pair_seq:
  BD.new.
  read_fastq(input: "test.fq", encoding: :base_33).
  merge_pair_seq.
  dump.
  run

  {:SEQ_NAME=>"M01168:16:000000000-A1R9L:1:1101:14862:1868 1:N:0:14",
   :SEQ=>"TGGGGAATATTGGACAATGGCCTGTTTGCTACCCACGCTT",
   :SEQ_LEN=>40,
   :SCORES=>"<??????BDDDDDDDDGGGG?????BB<-<BDDDDDFEEF",
   :SEQ_LEN_LEFT=>20,
   :SEQ_LEN_RIGHT=>20}
  {:SEQ_NAME=>"M01168:16:000000000-A1R9L:1:1101:13906:2139 1:N:0:14",
   :SEQ=>"TAGGGAATCTTGCACAATGGACTCTTCGCTACCCATGCTT",
   :SEQ_LEN=>40,
   :SCORES=>"<???9?BBBDBDDBDDFFFF,5<??BB?DDABDBDDFFFF",
   :SEQ_LEN_LEFT=>20,
   :SEQ_LEN_RIGHT=>20}
  {:SEQ_NAME=>"M01168:16:000000000-A1R9L:1:1101:14865:2158 1:N:0:14",
   :SEQ=>"TAGGGAATCTTGCACAATGGCCTCTTCGCTACCCATGCTT",
   :SEQ_LEN=>40,
   :SCORES=>"?????BBBBBDDBDDBFFFF??,<??B?BB?BBBBBFF?F",
   :SEQ_LEN_LEFT=>20,
   :SEQ_LEN_RIGHT=>20}
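The SEQ_LEN_LEFT and SEQ_LEN_RIGHT keys record where the two mates meet in the merged sequence. As a minimal sketch in plain Ruby, using the first merged record from the output above as a literal hash and no BioDSL calls, the original mates can be recovered like this:

```ruby
# First merged record from the example output, written as a plain Hash.
merged = {
  SEQ_NAME: 'M01168:16:000000000-A1R9L:1:1101:14862:1868 1:N:0:14',
  SEQ: 'TGGGGAATATTGGACAATGGCCTGTTTGCTACCCACGCTT',
  SEQ_LEN: 40,
  SCORES: '<??????BDDDDDDDDGGGG?????BB<-<BDDDDDFEEF',
  SEQ_LEN_LEFT: 20,
  SEQ_LEN_RIGHT: 20
}

left_len  = merged[:SEQ_LEN_LEFT]
right_len = merged[:SEQ_LEN_RIGHT]

# String#[start, length] slices the left mate, then the right mate.
left_seq     = merged[:SEQ][0, left_len]
right_seq    = merged[:SEQ][left_len, right_len]
left_scores  = merged[:SCORES][0, left_len]
right_scores = merged[:SCORES][left_len, right_len]

puts left_seq
puts right_seq
```

The local variable names are chosen for this illustration only.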
Constants
- STATS
Public Class Methods
new(options)
Constructor for MergePairSeq.
@param options [Hash] Options hash.
@return [MergePairSeq] Instance of MergePairSeq.
  # File lib/BioDSL/commands/merge_pair_seq.rb, line 106
  def initialize(options)
    @options = options
    check_options
  end
Public Instance Methods
lmb()
Return the command lambda for merge_pair_seq.
@return [Proc] Command lambda for merge_pair_seq.
  # File lib/BioDSL/commands/merge_pair_seq.rb, line 115
  def lmb
    lambda do |input, output, status|
      status_init(status, STATS)

      input.each_slice(2) do |record1, record2|
        @status[:records_in] += record2 ? 2 : 1

        if record1[:SEQ] && record2[:SEQ]
          output << merge_pair_seq(record1, record2)

          @status[:sequences_in]  += 2
          @status[:sequences_out] += 1
          @status[:records_out]   += 1
        else
          output.puts record1, record2

          @status[:records_out] += 2
        end
      end
    end
  end
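The pairing step relies on each_slice(2), which hands out records two at a time and yields nil as the second element when the stream has an odd number of records. The sketch below reproduces that control flow with plain arrays and hashes instead of BioDSL streams. The record contents are made up, the merge is reduced to a string concatenation, and the explicit record2 guard is a choice made for this sketch rather than something the listing above does.

```ruby
# Made-up records: one sequence pair followed by a lone non-sequence record.
records = [
  { SEQ_NAME: 'r1/1', SEQ: 'ACGT' },
  { SEQ_NAME: 'r1/2', SEQ: 'TTGA' },
  { COMMENT: 'no sequence here' }
]

output = []

records.each_slice(2) do |record1, record2|
  # record2 is nil for the last slice when the record count is odd.
  if record2 && record1[:SEQ] && record2[:SEQ]
    # Stand-in for the real merge: concatenate the two sequences.
    output << { SEQ: record1[:SEQ] + record2[:SEQ] }
  else
    # Pass through whatever is present, unchanged.
    output.concat([record1, record2].compact)
  end
end

output.each { |record| p record }
```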
Private Instance Methods
check_options()
Check options.
  # File lib/BioDSL/commands/merge_pair_seq.rb, line 140
  def check_options
    options_allowed(@options, nil)
  end
merge_pair_seq(record1, record2)
Merge entry pair and return a new BioDSL record with this.
@param record1 [Hash] BioDSL record 1.
@param record2 [Hash] BioDSL record 2.
@return [Hash] BioDSL record.
  # File lib/BioDSL/commands/merge_pair_seq.rb, line 150
  def merge_pair_seq(record1, record2)
    entry1 = BioDSL::Seq.new_bp(record1)
    entry2 = BioDSL::Seq.new_bp(record2)

    BioDSL::Seq.check_name_pair(entry1, entry2)

    @status[:residues_in] += entry1.length + entry2.length

    length1 = entry1.length
    length2 = entry2.length

    entry1 << entry2

    @status[:residues_out] += entry1.length

    new_record(entry1, length1, length2)
  end
new_record(entry1, length1, length2)
  # File lib/BioDSL/commands/merge_pair_seq.rb, line 168
  def new_record(entry1, length1, length2)
    new_record = entry1.to_bp
    new_record[:SEQ_LEN_LEFT]  = length1
    new_record[:SEQ_LEN_RIGHT] = length2
    new_record
  end
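Taken together, merge_pair_seq and new_record append the second mate to the first and annotate the result with both original lengths. The following sketch approximates that result with plain hashes. The function merge_records is hypothetical, it skips the name check and status bookkeeping, and it assumes that appending one entry to another concatenates both the sequence and the scores, which is consistent with the example output on this page.

```ruby
# Hypothetical plain-Hash approximation of the merged record (not BioDSL API).
def merge_records(record1, record2)
  {
    SEQ_NAME: record1[:SEQ_NAME],                # name of the first mate is kept
    SEQ: record1[:SEQ] + record2[:SEQ],          # mates concatenated as given
    SEQ_LEN: record1[:SEQ].length + record2[:SEQ].length,
    SCORES: record1[:SCORES] + record2[:SCORES], # quality strings concatenated
    SEQ_LEN_LEFT: record1[:SEQ].length,
    SEQ_LEN_RIGHT: record2[:SEQ].length
  }
end

mate1 = { SEQ_NAME: 'M01168:16:000000000-A1R9L:1:1101:14862:1868 1:N:0:14',
          SEQ: 'TGGGGAATATTGGACAATGG', SCORES: '<??????BDDDDDDDDGGGG' }
mate2 = { SEQ_NAME: 'M01168:16:000000000-A1R9L:1:1101:14862:1868 2:N:0:14',
          SEQ: 'CCTGTTTGCTACCCACGCTT', SCORES: '?????BB<-<BDDDDDFEEF' }

merged_record = merge_records(mate1, mate2)
p merged_record
```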