class BioDSL::MergePairSeq

Merge paired-end sequences in the stream.

merge_pair_seq merges paired sequences in the stream, provided they are interleaved (each read directly followed by its mate). Sequence names must be either in Illumina 1.3/1.5 format, ending in /1 or /2, or in Illumina 1.8 format, containing 1: or 2:. The two names of a pair must match accordingly for the sequences to be merged.
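The naming rule above can be illustrated with a small plain-Ruby sketch. The helper pair_names_match? is hypothetical and is not part of BioDSL; the actual check is done by BioDSL::Seq.check_name_pair, whose implementation is not shown here and may differ in detail.

```ruby
# Hypothetical helper (not BioDSL API): returns true when two sequence
# names look like mates under either Illumina naming convention.
def pair_names_match?(name1, name2)
  if (m = name1.match(%r{\A(.+)/1\z}))
    # Illumina 1.3/1.5: "<base>/1" pairs with "<base>/2".
    name2 == "#{m[1]}/2"
  elsif (m = name1.match(/\A(\S+) 1:/))
    # Illumina 1.8: "<base> 1:..." pairs with "<base> 2:...".
    name2.start_with?("#{m[1]} 2:")
  else
    false
  end
end

old_style = pair_names_match?('read7/1', 'read7/2')
new_style = pair_names_match?(
  'M01168:16:000000000-A1R9L:1:1101:14862:1868 1:N:0:14',
  'M01168:16:000000000-A1R9L:1:1101:14862:1868 2:N:0:14'
)
mismatch = pair_names_match?('read7/1', 'read8/2')
```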

Usage

merge_pair_seq

Options

Examples

Consider the following FASTQ entries in the file test.fq:

@M01168:16:000000000-A1R9L:1:1101:14862:1868 1:N:0:14
TGGGGAATATTGGACAATGG
+
<??????BDDDDDDDDGGGG
@M01168:16:000000000-A1R9L:1:1101:14862:1868 2:N:0:14
CCTGTTTGCTACCCACGCTT
+
?????BB<-<BDDDDDFEEF
@M01168:16:000000000-A1R9L:1:1101:13906:2139 1:N:0:14
TAGGGAATCTTGCACAATGG
+
<???9?BBBDBDDBDDFFFF
@M01168:16:000000000-A1R9L:1:1101:13906:2139 2:N:0:14
ACTCTTCGCTACCCATGCTT
+
,5<??BB?DDABDBDDFFFF
@M01168:16:000000000-A1R9L:1:1101:14865:2158 1:N:0:14
TAGGGAATCTTGCACAATGG
+
?????BBBBBDDBDDBFFFF
@M01168:16:000000000-A1R9L:1:1101:14865:2158 2:N:0:14
CCTCTTCGCTACCCATGCTT
+
??,<??B?BB?BBBBBFF?F

To merge these interleaved paired-end sequences, use merge_pair_seq:

BD.new.
read_fastq(input: "test.fq", encoding: :base_33).
merge_pair_seq.
dump.
run

{:SEQ_NAME=>"M01168:16:000000000-A1R9L:1:1101:14862:1868 1:N:0:14",
 :SEQ=>"TGGGGAATATTGGACAATGGCCTGTTTGCTACCCACGCTT",
 :SEQ_LEN=>40,
 :SCORES=>"<??????BDDDDDDDDGGGG?????BB<-<BDDDDDFEEF",
 :SEQ_LEN_LEFT=>20,
 :SEQ_LEN_RIGHT=>20}
{:SEQ_NAME=>"M01168:16:000000000-A1R9L:1:1101:13906:2139 1:N:0:14",
 :SEQ=>"TAGGGAATCTTGCACAATGGACTCTTCGCTACCCATGCTT",
 :SEQ_LEN=>40,
 :SCORES=>"<???9?BBBDBDDBDDFFFF,5<??BB?DDABDBDDFFFF",
 :SEQ_LEN_LEFT=>20,
 :SEQ_LEN_RIGHT=>20}
{:SEQ_NAME=>"M01168:16:000000000-A1R9L:1:1101:14865:2158 1:N:0:14",
 :SEQ=>"TAGGGAATCTTGCACAATGGCCTCTTCGCTACCCATGCTT",
 :SEQ_LEN=>40,
 :SCORES=>"?????BBBBBDDBDDBFFFF??,<??B?BB?BBBBBFF?F",
 :SEQ_LEN_LEFT=>20,
 :SEQ_LEN_RIGHT=>20}
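Since the merged record keeps the length of each mate in SEQ_LEN_LEFT and SEQ_LEN_RIGHT, the two original reads can be recovered by slicing. The following is a minimal sketch in plain Ruby that uses the first record from the output above as a literal hash; it does not call BioDSL.

```ruby
# First merged record from the dump above, written as a plain hash.
merged = {
  SEQ_NAME: 'M01168:16:000000000-A1R9L:1:1101:14862:1868 1:N:0:14',
  SEQ: 'TGGGGAATATTGGACAATGGCCTGTTTGCTACCCACGCTT',
  SEQ_LEN: 40,
  SCORES: '<??????BDDDDDDDDGGGG?????BB<-<BDDDDDFEEF',
  SEQ_LEN_LEFT: 20,
  SEQ_LEN_RIGHT: 20
}

left_len  = merged[:SEQ_LEN_LEFT]
right_len = merged[:SEQ_LEN_RIGHT]

# String#[start, length] slices each mate back out of the merged strings.
left_seq     = merged[:SEQ][0, left_len]
right_seq    = merged[:SEQ][left_len, right_len]
left_scores  = merged[:SCORES][0, left_len]
right_scores = merged[:SCORES][left_len, right_len]
```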

Constants

STATS

Public Class Methods

new(options)

Constructor for MergePairSeq.

@param options [Hash] Options hash.

@return [MergePairSeq] Instance of MergePairSeq.

# File lib/BioDSL/commands/merge_pair_seq.rb, line 106
def initialize(options)
  @options = options

  check_options
end

Public Instance Methods

lmb()

Return the command lambda for merge_pair_seq.

@return [Proc] Command lambda for merge_pair_seq.

# File lib/BioDSL/commands/merge_pair_seq.rb, line 115
def lmb
  lambda do |input, output, status|
    status_init(status, STATS)

    input.each_slice(2) do |record1, record2|
      @status[:records_in] += record2 ? 2 : 1

      if record1[:SEQ] && record2[:SEQ]
        output << merge_pair_seq(record1, record2)

        @status[:sequences_in] += 2
        @status[:sequences_out] += 1
        @status[:records_out] += 1
      else
        output.puts record1, record2

        @status[:records_out] += 2
      end
    end
  end
end
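The pairing logic in the lambda relies on each_slice(2), which hands out consecutive records two at a time. The sketch below reproduces that pattern on an ordinary array of hashes standing in for the stream. The records, the merged and passed arrays, and the explicit nil guard for an unpaired trailing record are assumptions of this sketch and are not taken from the listed source.

```ruby
# Stand-in records: plain hashes, not real BioDSL stream records.
records = [
  { SEQ_NAME: 'r1/1', SEQ: 'ATCG' },
  { SEQ_NAME: 'r1/2', SEQ: 'GGCC' },
  { FOO: 'not a sequence' },
  { BAR: 'not a sequence either' },
  { SEQ_NAME: 'r2/1', SEQ: 'TTAA' } # trailing record without a mate
]

merged = []
passed = []

records.each_slice(2) do |record1, record2|
  # record2 is nil when the input holds an odd number of records.
  if record1[:SEQ] && record2 && record2[:SEQ]
    merged << { SEQ_NAME: record1[:SEQ_NAME],
                SEQ: record1[:SEQ] + record2[:SEQ] }
  else
    # Anything that is not a sequence pair is passed through unchanged.
    passed.concat([record1, record2].compact)
  end
end
```

Records are paired purely by position, which is why the input has to be interleaved: a stray record between two mates shifts every following pair.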

Private Instance Methods

check_options()

Check options.

# File lib/BioDSL/commands/merge_pair_seq.rb, line 140
def check_options
  options_allowed(@options, nil)
end

merge_pair_seq(record1, record2)

Merge a pair of entries and return the result as a new BioDSL record.

@param record1 [Hash] BioDSL record 1.
@param record2 [Hash] BioDSL record 2.

@return [Hash] BioDSL record.

# File lib/BioDSL/commands/merge_pair_seq.rb, line 150
def merge_pair_seq(record1, record2)
  entry1 = BioDSL::Seq.new_bp(record1)
  entry2 = BioDSL::Seq.new_bp(record2)

  BioDSL::Seq.check_name_pair(entry1, entry2)

  @status[:residues_in] += entry1.length + entry2.length

  length1 = entry1.length
  length2 = entry2.length

  entry1 << entry2

  @status[:residues_out] += entry1.length

  new_record(entry1, length1, length2)
end

new_record(entry1, length1, length2)

Return a BioDSL record built from the merged entry, with the left and right sequence lengths added.

# File lib/BioDSL/commands/merge_pair_seq.rb, line 168
def new_record(entry1, length1, length2)
  new_record = entry1.to_bp
  new_record[:SEQ_LEN_LEFT]  = length1
  new_record[:SEQ_LEN_RIGHT] = length2
  new_record
end
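Taken together, merge_pair_seq and new_record concatenate the two entries and annotate the result with the original lengths. The following plain-Ruby sketch mirrors that on hashes, without BioDSL::Seq; merge_pair is a hypothetical name, and the sketch assumes that appending one entry to another concatenates both the sequence and the scores, as the example output suggests.

```ruby
# Hypothetical stand-in for merge_pair_seq + new_record on plain hashes.
def merge_pair(record1, record2)
  seq = record1[:SEQ] + record2[:SEQ]

  merged = { SEQ_NAME: record1[:SEQ_NAME], SEQ: seq, SEQ_LEN: seq.length }

  # Scores are only concatenated when both mates carry them.
  if record1[:SCORES] && record2[:SCORES]
    merged[:SCORES] = record1[:SCORES] + record2[:SCORES]
  end

  merged[:SEQ_LEN_LEFT]  = record1[:SEQ].length
  merged[:SEQ_LEN_RIGHT] = record2[:SEQ].length
  merged
end

read1 = { SEQ_NAME: 'M01168:16:000000000-A1R9L:1:1101:14862:1868 1:N:0:14',
          SEQ: 'TGGGGAATATTGGACAATGG', SCORES: '<??????BDDDDDDDDGGGG' }
read2 = { SEQ_NAME: 'M01168:16:000000000-A1R9L:1:1101:14862:1868 2:N:0:14',
          SEQ: 'CCTGTTTGCTACCCACGCTT', SCORES: '?????BB<-<BDDDDDFEEF' }

result = merge_pair(read1, read2)
```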