class BioDSL::SliceSeq

Slice sequences in the stream and obtain subsequences.

Slice subsequences from sequences using either an index position, which yields a single residue, or a range, which yields a stretch of residues.

All positions are 0-based. Negative positions count from the end of the sequence.

If a record also contains quality SCORES, these are sliced with the same index or range.
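
The indexing rules above can be sketched in plain Ruby. This is only an illustration: it assumes that BioDSL slicing follows the same conventions as Ruby's String#[], which agrees with the examples on this page, and slice_pair is a hypothetical helper, not part of BioDSL.

```ruby
# Plain-Ruby sketch; String#[] stands in for the slicing BioDSL performs.
seq    = 'TTGGTCGCTCGCTCCGCGACCTCAGATCAGACGTGGGCGAT'
scores = %q{!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHI}

# Hypothetical helper: apply the same index or range to both strings.
def slice_pair(seq, scores, slice)
  [seq[slice], scores[slice]]
end

third   = slice_pair(seq, scores, 2)       # single 0-based index
last    = slice_pair(seq, scores, -1)      # negative index counts from the end
first_5 = slice_pair(seq, scores, 0...5)   # exclusive range: positions 0 to 4
last_5  = slice_pair(seq, scores, -5..-1)  # inclusive range: the last 5 positions

p third, last, first_5, last_5
```

Note that index 2 selects the third residue, and that the three-dot range 0...5 excludes position 5 while the two-dot range -5..-1 includes both ends.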

Usage

slice_seq(<slice: <index>|<range>>)

Options

* slice: <index> - Slice the single residue at the given position.
* slice: <range> - Slice the stretch of residues covered by the range.

Examples

Consider the following FASTQ entry in the file test.fq:

@HWI-EAS157_20FFGAAXX:2:1:888:434
TTGGTCGCTCGCTCCGCGACCTCAGATCAGACGTGGGCGAT
+
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHI

To slice the third residue from the beginning (index 2, since positions are 0-based) do:

BD.new.read_fastq(input: "test.fq").slice_seq(slice: 2).dump.run

{:SEQ_NAME=>"HWI-EAS157_20FFGAAXX:2:1:888:434",
 :SEQ=>"G",
 :SEQ_LEN=>1,
 :SCORES=>"#"}

To slice the last residue do:

BD.new.read_fastq(input: "test.fq").slice_seq(slice: -1).dump.run

{:SEQ_NAME=>"HWI-EAS157_20FFGAAXX:2:1:888:434",
 :SEQ=>"T",
 :SEQ_LEN=>1,
 :SCORES=>"I"}

To slice the first 5 residues do:

BD.new.read_fastq(input: "test.fq").slice_seq(slice: 0 ... 5).dump.run

{:SEQ_NAME=>"HWI-EAS157_20FFGAAXX:2:1:888:434",
 :SEQ=>"TTGGT",
 :SEQ_LEN=>5,
 :SCORES=>"!\"\#$%"}

To slice the last 5 residues do:

BD.new.read_fastq(input: "test.fq").slice_seq(slice: -5 .. -1).dump.run

{:SEQ_NAME=>"HWI-EAS157_20FFGAAXX:2:1:888:434",
 :SEQ=>"GCGAT",
 :SEQ_LEN=>5,
 :SCORES=>"EFGHI"}

Constants

STATS

Public Class Methods

new(options)

Constructor for SliceSeq.

@param options [Hash] Options hash.
@option options [Range,Integer] :slice

@return [SliceSeq] Class instance.

# File lib/BioDSL/commands/slice_seq.rb, line 101
def initialize(options)
  @options = options

  check_options
end

Public Instance Methods

lmb()

Return lambda for command.

@return [Proc] Command lambda.

# File lib/BioDSL/commands/slice_seq.rb, line 110
def lmb
  lambda do |input, output, status|
    status_init(status, STATS)

    input.each do |record|
      @status[:records_in] += 1

      slice_seq(record) if record.key? :SEQ

      output << record

      @status[:records_out] += 1
    end
  end
end
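
The flow of the lambda above can be tried without BioDSL. The following is a rough sketch under stated assumptions: plain arrays stand in for the input and output streams, a plain hash stands in for the status object, and build_slicer is a hypothetical name. It shows that records without a :SEQ key pass through unchanged while still being counted.

```ruby
# Hypothetical stand-in for the command lambda; arrays replace BioDSL streams.
def build_slicer(slice)
  lambda do |input, output, status|
    status[:records_in]  = 0
    status[:records_out] = 0

    input.each do |record|
      status[:records_in] += 1

      # Only records carrying a sequence are modified.
      if record.key? :SEQ
        record[:SEQ]     = record[:SEQ][slice]
        record[:SEQ_LEN] = record[:SEQ].length
      end

      output << record
      status[:records_out] += 1
    end
  end
end

input  = [{ SEQ: 'TTGGTCGC', SEQ_LEN: 8 }, { COMMENT: 'no sequence here' }]
output = []
status = {}

build_slicer(0...3).call(input, output, status)
```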

Private Instance Methods

check_options()

Check options.

# File lib/BioDSL/commands/slice_seq.rb, line 129
def check_options
  options_allowed(@options, :slice)
  options_required(@options, :slice)
end
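
What the two checks amount to can be sketched as follows. This is an assumption about their intent, not BioDSL's implementation: check_allowed, check_required and check_slice_options are hypothetical names, and the real options_allowed and options_required may raise a different error class or message.

```ruby
# Hypothetical stand-ins; the real helpers live in BioDSL and may differ.
def check_allowed(options, *allowed)
  unknown = options.keys - allowed
  raise ArgumentError, "Disallowed option(s): #{unknown.join(', ')}" unless unknown.empty?
end

def check_required(options, *required)
  missing = required - options.keys
  raise ArgumentError, "Missing required option(s): #{missing.join(', ')}" unless missing.empty?
end

# :slice is the only option accepted, and it must be present.
def check_slice_options(options)
  check_allowed(options, :slice)
  check_required(options, :slice)
  options
end
```

Under this sketch, check_slice_options(slice: 0...5) returns the options hash, while an empty hash or a hash with any extra key raises ArgumentError.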

slice_seq(record)

Slice sequence in given record.

@param record [Hash] BioDSL record.

# File lib/BioDSL/commands/slice_seq.rb, line 137
def slice_seq(record)
  entry = BioDSL::Seq.new_bp(record)

  @status[:sequences_in] += 1
  @status[:residues_in] += entry.length

  entry = entry[@options[:slice]]

  @status[:sequences_out] += 1
  @status[:residues_out] += entry.length

  record.merge! entry.to_bp
end
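
The bookkeeping in slice_seq can be mirrored with plain hashes and strings. This is a minimal sketch, assuming that BioDSL::Seq slicing behaves like String#[] and that to_bp yields the :SEQ, :SEQ_LEN and :SCORES keys shown in the examples. slice_record! is a hypothetical helper, not a BioDSL method.

```ruby
# Hypothetical helper mirroring the bookkeeping above with plain Ruby objects.
def slice_record!(record, slice, status)
  status[:sequences_in] += 1
  status[:residues_in]  += record[:SEQ].length

  sliced = { SEQ: record[:SEQ][slice] }
  sliced[:SCORES]  = record[:SCORES][slice] if record.key? :SCORES
  sliced[:SEQ_LEN] = sliced[:SEQ].length

  status[:sequences_out] += 1
  status[:residues_out]  += sliced[:SEQ].length

  # merge! overwrites the sequence fields and leaves other keys untouched.
  record.merge! sliced
end

status = Hash.new(0) # counters default to zero
record = { SEQ_NAME: 'test', SEQ: 'TTGGTCGC', SEQ_LEN: 8, SCORES: '!"#$%&\'(' }
slice_record!(record, -3..-1, status)
```

Because merge! updates the record in place, keys such as :SEQ_NAME survive the slice, and the residue counters record the sequence length before and after slicing.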